
Applying Meaningful Word-Pair Identifier to the Chinese Syllable-to-Word Conversion Problem
Authors: Jia-Lin Tsai, Tien-Jien Chiang, Wen-Lian Hsu
Syllable-to-word (STW) conversion is a frequently used Chinese input method and is fundamental to syllable/speech understanding. The two major problems in STW conversion are the segmentation of the syllable input and the ambiguities caused by homonyms. This paper describes a meaningful word-pair (MWP) identifier that can be used to resolve homonym and segmentation ambiguities and to perform STW conversion effectively for Chinese-language texts. It is designed as a support system for Chinese input systems. Five types of meaningful word-pairs are investigated: noun-verb (NV), noun-noun (NN), verb-verb (VV), adjective-noun (AN) and adverb-verb (DV). The pre-collected datasets of meaningful word-pairs are based on our previous work on the auto-generation of NVEF knowledge in Chinese (AUTO-NVEF) [30, 32], where NVEF stands for noun-verb event frame. The main purpose of this study is to show that a hybrid approach combining statistical language modeling (SLM) with contextual information, such as meaningful word-pairs, is effective for improving syllable-to-word systems and is important for syllable/speech understanding. Our experiments show the following: (1) the MWP identifier achieves tonal (syllables with four tones) and toneless (syllables without four tones) STW accuracies of 98.69% and 90.7%, respectively, among the identified word-pairs for the test syllables; (2) STW error analysis shows that the most critical problem of tonal STW systems is the failure of homonym disambiguation (52%), while that of toneless STW systems is inadequate syllable segmentation (48%); (3) applying the MWP identifier together with the Microsoft input method editor (MSIME 2003) and an optimized bigram model (BiGram) yields tonal and toneless STW improvements for the two STW systems of 25.25%/21.82% and 12.87%/15.62%, respectively.
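The idea of using word-pairs to resolve segmentation and homonym ambiguity can be illustrated with a minimal sketch. It is not the paper's MWP identifier: the lexicon, the word-pair set, and the function names `segmentations` and `convert` are hypothetical, the scoring counts only adjacent word-pairs, and no statistical language model is combined with it.

```python
from itertools import product

# Hypothetical toy lexicon: toneless syllable tuple -> candidate words (homophones).
LEXICON = {
    ("shi", "yan"): ["實驗", "試驗", "食鹽"],
    ("jie", "guo"): ["結果"],
    ("shi",): ["是", "事"],
    ("yan",): ["鹽", "眼"],
}

# Hypothetical meaningful word-pairs; not taken from the paper's datasets.
WORD_PAIRS = {("實驗", "結果")}


def segmentations(syllables):
    """Yield every way to split the syllable list into lexicon entries."""
    if not syllables:
        yield []
        return
    for end in range(1, len(syllables) + 1):
        head = tuple(syllables[:end])
        if head in LEXICON:
            for rest in segmentations(syllables[end:]):
                yield [head] + rest


def convert(syllables):
    """Return the word sequence containing the most known adjacent word-pairs.

    Returns None when the syllables cannot be segmented with the lexicon.
    """
    best, best_score = None, -1
    for seg in segmentations(syllables):
        # Try every homophone choice for this segmentation.
        for words in product(*(LEXICON[s] for s in seg)):
            score = sum((a, b) in WORD_PAIRS for a, b in zip(words, words[1:]))
            if score > best_score:
                best, best_score = list(words), score
    return best


print(convert(["shi", "yan", "jie", "guo"]))
```

In this toy setting the word-pair evidence selects both the two-word segmentation and the homophone 實驗 over 試驗 and 食鹽. A practical system would need a fallback, such as an n-gram model, for input in which no known word-pair is found.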
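Pages: 1-10
Keywords: syllable-to-word, contextual information, top-down identifier, n-gram model
Journal: ROCLING論文集 (ROCLING Proceedings)
Issue: 2004
Publisher: 中華民國計算語言學學會 (The Association for Computational Linguistics and Chinese Language Processing)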



