Chinese Movie Dialogue Question Answering Dataset (中文電影對話問答系統資料集)

Shang-Bao Luo; Cheng-Chung Fan; Kuan-Yu Chen; Yu Tsao; Hsin-Min Wang; Keh-Yih Su

Title	中文電影對話問答系統資料集
Parallel title	Chinese Movie Dialogue Question Answering Dataset
Authors	Shang-Bao Luo, Cheng-Chung Fan, Kuan-Yu Chen, Yu Tsao, Hsin-Min Wang, Keh-Yih Su
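Abstract (translated from Chinese)	This paper constructs CMDQA, a Chinese conversational question answering dataset. It covers multi-turn dialogue scenarios for seeking information about Chinese movies and contains 10,000 dialogues with about 40,000 turns in total. All questions and background documents were compiled from Wikipedia by a web crawler. The answer to each question is a span within its related document. To simulate realistic conversational question answering, pronouns are used in the dialogues. In CMDQA, a question answering model must therefore not only retrieve the relevant documents automatically but also handle pronouns and dialogue history. Besides multi-turn conversational question answering, the dataset can also be used to evaluate models for information retrieval, machine reading comprehension, and question rewriting. In addition to CMDQA, this study provides a baseline system and tests its performance. Experiments show that the baseline system's performance still differs considerably from that of humans, so the dataset offers sufficient challenge for related research.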
Abstract (English)	This paper constructs CMDQA, a Chinese dialogue-based information-seeking question answering dataset aimed mainly at scenarios in which users seek Chinese movie-related information. It contains 10K QA dialogues (40K turns in total). All questions and background documents are compiled from Wikipedia via a web crawler. The answer to each question is a span extracted from the related text passage. In CMDQA, a system must search for the related documents, and pronouns are also added to the questions to better mimic real dialogue scenarios. The dataset can be used to test the individual performance of information retrieval, question answering, and question rewriting modules. This paper also provides a baseline system and reports its performance on the dataset. The experiments show that the baseline still falls well short of human performance, so the dataset poses a sufficient challenge for related research.
Pages	7-14
Keywords	Information-Seeking Question Answering, Dialogue-based Question Answering Dataset, Chinese Movie QA
Journal	ROCLING Proceedings
Issue	202212 (2022 issue)
Publisher	Association for Computational Linguistics and Chinese Language Processing (中華民國計算語言學學會)
