English Abstract |
Syllable-to-word (STW) conversion is a frequently used Chinese input method that is fundamental to syllable/speech understanding. The two major problems in STW conversion are the segmentation of syllable input and the ambiguities caused by homonyms. This paper describes a meaningful word-pair (MWP) identifier that resolves homonym and segmentation ambiguities and performs STW conversion effectively for Chinese texts. It is designed as a support system for Chinese input systems. Five types of meaningful word-pairs are investigated: noun-verb (NV), noun-noun (NN), verb-verb (VV), adjective-noun (AN) and adverb-verb (DV). The pre-collected datasets of meaningful word-pairs are based on our previous work, auto-generation of NVEF knowledge in Chinese (AUTO-NVEF) [30, 32], where NVEF stands for noun-verb event frame. The main purpose of this study is to show that a hybrid approach combining statistical language modeling (SLM) with contextual information, such as meaningful word-pairs, is effective for improving syllable-to-word systems and is important for syllable/speech understanding. Our experiments show the following: (1) the MWP identifier achieves tonal (syllables with tones) and toneless (syllables without tones) STW accuracies of 98.69% and 90.7%, respectively, among the identified word-pairs for the test syllables; (2) STW error analysis shows that the main problem of tonal STW systems is the failure of homonym disambiguation (52%), while that of toneless STW systems is inadequate syllable segmentation (48%); (3) when the MWP identifier is applied together with the Microsoft Input Method Editor (MSIME 2003) and an optimized bigram model (BiGram), the tonal and toneless STW improvements of the two STW systems are 25.25%/21.82% and 12.87%/15.62%, respectively. |