Corpus Cleanup of Mistaken Agreement Using Word Sense Disambiguation

Liang-Chih Yu; Chung-Hsien Wu; Jui-Feng Yeh; Eduard Hovy

Title	Corpus Cleanup of Mistaken Agreement Using Word Sense Disambiguation
Authors	Liang-Chih Yu, Chung-Hsien Wu, Jui-Feng Yeh, Eduard Hovy
Abstract	Word sense annotated corpora are useful resources for many text mining applications. Such corpora are only useful if their annotations are consistent. Most large-scale annotation efforts take special measures to reconcile inter-annotator disagreement. To date, however, nobody has investigated how to automatically determine exemplars in which the annotators agree but are wrong. In this paper, we use OntoNotes, a large-scale corpus of semantic annotations, including word senses, predicate-argument structure, ontology linking, and coreference. To determine the mistaken agreements in word sense annotation, we employ word sense disambiguation (WSD) to select a set of suspicious candidates for human evaluation. We evaluate the performance of WSD along three dimensions: precision, cost-effectiveness ratio, and entropy. The experimental results show that WSD is most effective in identifying erroneous annotations for highly ambiguous words, while a baseline is better for other cases. The two methods can be combined to improve the cleanup process. This procedure allows us to find approximately 2% of the remaining erroneous agreements in the OntoNotes corpus. A similar procedure can be easily defined to check other annotated corpora.
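Pages	405-419
Keywords	Corpus Cleanup, Word Sense Disambiguation, Semantic Analysis, Entropy
Journal	International Journal of Computational Linguistics and Chinese Language Processing (中文計算語言學期刊)
Issue	December 2008 (Vol. 13, No. 4)
Publisher	The Association for Computational Linguistics and Chinese Language Processing (中華民國計算語言學學會)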
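
The abstract's core idea, flagging instances where a WSD prediction disagrees with the agreed annotation and using entropy as a measure of how ambiguous a word is, can be illustrated with a minimal sketch. This is not the authors' implementation: the record does not describe their classifier, features, or baseline. The function names, the `toy_predict` rule, the sense labels, and the sample instances below are all hypothetical, chosen only to make the idea concrete.

```python
import math
from collections import Counter


def sense_entropy(sense_labels):
    """Shannon entropy (in bits) of the sense distribution of one word.

    Higher values mean the word's annotated senses are more evenly
    spread, i.e. the word is more ambiguous.
    """
    counts = Counter(sense_labels)
    total = sum(counts.values())
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def suspicious_candidates(instances, predict):
    """Return instances whose annotated sense differs from the prediction.

    `predict` is any callable mapping an instance to a sense label; here
    it stands in for a trained WSD system (an assumption of this sketch).
    """
    return [inst for inst in instances if predict(inst) != inst["sense"]]


def toy_predict(inst):
    # Hypothetical keyword rule used in place of a real WSD classifier.
    return "bank.finance" if "money" in inst["context"] else "bank.river"


# Hypothetical annotated instances; the third is "agreed but wrong".
instances = [
    {"context": "deposit money at the bank", "sense": "bank.finance"},
    {"context": "fishing on the river bank", "sense": "bank.river"},
    {"context": "the bank lent money to the firm", "sense": "bank.river"},
]

flagged = suspicious_candidates(instances, toy_predict)
ambiguity = sense_entropy([inst["sense"] for inst in instances])
```

In a workflow like the one the abstract describes, the `flagged` instances would go to human evaluators, and the per-word entropy could help decide whether to rely on WSD or on a simpler baseline for that word.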