Chinese Movie Dialogue Question Answering Dataset
Authors Shang-Bao Luo, Cheng-Chung Fan, Kuan-Yu Chen, Yu Tsao, Hsin-Min Wang, Keh-Yih Su
This paper presents CMDQA, a Chinese dialogue-based, information-seeking question answering dataset aimed at retrieving information about Chinese movies. It contains 10K QA dialogs (40K turns in total). All questions and background documents are compiled from Wikipedia with a web crawler. The answer to each question is an answer span extracted from the related text passage. Besides requiring retrieval of the related documents, CMDQA also inserts pronouns into the questions to better mimic real dialog. The dataset can therefore be used to evaluate the information retrieval, question answering, and question rewriting modules individually. The paper also provides a baseline system and reports its performance on the dataset. The experiments show that the baseline still falls well short of human performance, so the dataset remains a challenging benchmark for further research.
Pages 7-14
Keywords Information-Seeking Question Answering; Dialogue-based Question Answering Dataset; Chinese Movie QA
Publication ROCLING Proceedings
Issue 202212 (2022 issue)
Publisher The Association for Computational Linguistics and Chinese Language Processing (ACLCLP)
