
Corpus Cleanup of Mistaken Agreement Using Word Sense Disambiguation
Authors: Liang-Chih Yu, Chung-Hsien Wu, Jui-Feng Yeh, Eduard Hovy
Word sense annotated corpora are useful resources for many text mining applications. Such corpora are only useful if their annotations are consistent. Most large-scale annotation efforts take special measures to reconcile inter-annotator disagreement. To date, however, no one has investigated how to automatically identify exemplars on which the annotators agree but are wrong. In this paper, we use OntoNotes, a large-scale corpus of semantic annotations that include word senses, predicate-argument structure, ontology linking, and coreference. To find mistaken agreements in word sense annotation, we employ word sense disambiguation (WSD) to select a set of suspicious candidates for human evaluation. Experiments examine the performance of WSD from three aspects: precision, cost-effectiveness ratio, and entropy. The experimental results show that WSD is most effective in identifying erroneous annotations for highly ambiguous words, while a baseline is better in other cases. The two methods can be combined to improve the cleanup process. This procedure allows us to find approximately 2% of the remaining erroneous agreements in the OntoNotes corpus. A similar procedure can easily be defined to check other annotated corpora.
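The candidate-selection idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`sense_entropy`, `flag_suspicious`, `precision`) are hypothetical, the WSD predictions are assumed to come from some external classifier, and the entropy shown is plain Shannon entropy over a word's sense labels, which is only an assumption about how ambiguity might be measured. An instance is flagged as suspicious when the predicted sense differs from the sense the annotators agreed on; flagged instances would then go to a human for review.

```python
import math
from collections import Counter


def sense_entropy(labels):
    """Shannon entropy (in bits) of the sense labels observed for one word.

    Higher values mean the word's instances are spread over more senses,
    which is used here as a rough proxy for ambiguity (an assumption).
    """
    counts = Counter(labels)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def flag_suspicious(annotated, predicted):
    """Return indices where the predicted sense differs from the agreed annotation."""
    return [i for i, (a, p) in enumerate(zip(annotated, predicted)) if a != p]


def precision(flagged, true_errors):
    """Fraction of flagged instances that a human confirmed as annotation errors."""
    if not flagged:
        return 0.0
    return sum(1 for i in flagged if i in true_errors) / len(flagged)


# Toy data (invented for illustration only).
annotated = ["s1", "s2", "s1", "s2"]   # senses the annotators agreed on
predicted = ["s1", "s1", "s1", "s2"]   # output of a hypothetical WSD system
candidates = flag_suspicious(annotated, predicted)
```

In this toy run, only the second instance is flagged, so a reviewer would inspect one instance instead of four. Whether that trade-off pays off on real data depends on the classifier's accuracy and the word's ambiguity.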
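Pages: 405-419
Keywords: Corpus Cleanup, Word Sense Disambiguation, Semantic Analysis, Entropy
Journal: International Journal of Computational Linguistics and Chinese Language Processing (中文計算語言學期刊)
Issue: December 2008 (Vol. 13, No. 4)
Publisher: The Association for Computational Linguistics and Chinese Language Processing (中華民國計算語言學學會)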



