The Properties and Further Applications of Chinese Frequent Strings

Lin, Yih-jeng; Yu, Ming-shing

Title	The Properties and Further Applications of Chinese Frequent Strings
Authors	Lin, Yih-jeng; Yu, Ming-shing
Abstract	This paper reveals some important properties of Chinese frequent strings (CFSs) and their applications in Chinese natural language processing (NLP). We have previously proposed a method for extracting CFSs that contain unknown words from a Chinese corpus [Lin and Yu 2001]. We found that CFSs contain many 4-character strings, 3-word strings, and longer n-grams. With a traditional language model (LM), such information can only be derived from an extremely large corpus. In contrast, by using CFSs we can achieve high precision and efficiency in Chinese toneless phoneme-to-character conversion and Chinese spelling error correction with only a small training corpus. An accuracy rate of 92.86% was achieved for Chinese toneless phoneme-to-character conversion, and an accuracy rate of 87.32% was achieved for Chinese spelling error correction. We also attempted to assign syntactic categories to CFSs. The accuracy rate for this assignment was 88.53% in outside testing when the syntactic categories of the highest level were used.
Pages	113-128
Keywords	Chinese frequent strings, Unknown words, Chinese toneless phoneme-to-character, Chinese spelling error correction, Language model
Journal	中文計算語言學期刊
Issue	February 2004 (Vol. 9, No. 1)
Publisher	中華民國計算語言學學會