Applying Meaningful Word-Pair Identifier to the Chinese Syllable-to-Word Conversion Problem

Jia-Lin Tsai; Tien-Jien Chiang; Wen-Lian Hsu

Title	Applying Meaningful Word-Pair Identifier to the Chinese Syllable-to-Word Conversion Problem
Authors	Jia-Lin Tsai, Tien-Jien Chiang, Wen-Lian Hsu
Abstract	Syllable-to-word (STW) conversion is a frequently used Chinese input method that is fundamental to syllable/speech understanding. The two major problems with STW conversion are the segmentation of syllable input and the ambiguities caused by homonyms. This paper describes a meaningful word-pair (MWP) identifier that can be used to resolve homonym/segmentation ambiguities and perform STW conversion effectively for Chinese language texts. It is designed as a support system for Chinese input systems. Five types of meaningful word-pairs are investigated: noun-verb (NV), noun-noun (NN), verb-verb (VV), adjective-noun (AN) and adverb-verb (DV). The pre-collected datasets of meaningful word-pairs are based on our previous work on the auto-generation of NVEF knowledge in Chinese (AUTO-NVEF) [30, 32], where NVEF stands for noun-verb event frame. The main purpose of this study is to illustrate that a hybrid approach combining statistical language modeling (SLM) with contextual information, such as meaningful word-pairs, is effective for improving syllable-to-word systems and is important for syllable/speech understanding. Our experiments show the following: (1) the MWP identifier achieves tonal (syllables with four tones) and toneless (syllables without four tones) STW accuracies of 98.69% and 90.7%, respectively, among the identified word-pairs for the test syllables; (2) STW error analysis shows that the most critical problem of tonal STW systems is the failure of homonym disambiguation (52%), while that of toneless STW systems is inadequate syllable segmentation (48%); (3) when the MWP identifier is applied together with the Microsoft input method editor (MSIME 2003) and an optimized bigram model (BiGram), the tonal and toneless STW improvements of the two STW systems are 25.25%/21.82% and 12.87%/15.62%, respectively.
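Pages	1-10
Keywords	syllable-to-word, contextual information, top-down identifier, n-gram model
Journal	Proceedings of ROCLING (ROCLING論文集)
Issue	2004
Publisher	Association for Computational Linguistics and Chinese Language Processing (中華民國計算語言學學會)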
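
The abstract describes using meaningful word-pairs to resolve both homonym and segmentation ambiguity. The following is a minimal sketch of that general idea, not the authors' MWP identifier: it looks up every syllable span in a small lexicon, then picks the non-overlapping pair of candidate words that appears in a word-pair table with the highest count. The lexicon, the word-pair table, the counts, and the names `LEXICON`, `WORD_PAIRS`, `candidate_spans` and `identify_word_pair` are all hypothetical and invented for illustration.

```python
from itertools import combinations

# Hypothetical toy lexicon: toneless pinyin syllable tuples -> candidate words.
LEXICON = {
    ("yi", "jian"): ["意見", "一件"],
    ("fen", "qi"): ["分歧", "分期"],
    ("fen",): ["分"],
}

# Hypothetical word-pair table: (first word, second word) -> made-up count.
WORD_PAIRS = {
    ("意見", "分歧"): 12,
    ("一件", "分"): 1,
}


def candidate_spans(syllables):
    """Return (start, end, word) for every lexicon entry matching a syllable span."""
    spans = []
    n = len(syllables)
    for start in range(n):
        for end in range(start + 1, n + 1):
            for word in LEXICON.get(tuple(syllables[start:end]), []):
                spans.append((start, end, word))
    return spans


def identify_word_pair(syllables):
    """Return (count, first_span, second_span) for the best known pair, or None."""
    best = None
    for a, b in combinations(candidate_spans(syllables), 2):
        # Order the two spans by start position.
        first, second = (a, b) if a[0] <= b[0] else (b, a)
        if first[1] > second[0]:
            continue  # spans overlap, so they cannot both be in one segmentation
        count = WORD_PAIRS.get((first[2], second[2]))
        if count is not None and (best is None or count > best[0]):
            best = (count, first, second)
    return best


result = identify_word_pair(["yi", "jian", "fen", "qi"])
print(result)
```

In this toy input, the pair lookup settles both ambiguities at once: it prefers the two-syllable reading of `fen qi` over the one-syllable `fen` (segmentation) and selects one homonym for each span. A real system would, as the abstract notes, combine such evidence with a statistical language model for the syllables not covered by any pair.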
