

Title
STATISTICAL MODELS FOR WORD SEGMENTATION AND UNKNOWN WORD RESOLUTION
Authors: Tung-Hui Chiang, Jing-Shin Chang, Ming-Yu Lin, Keh-Yih Su
Abstract
A Chinese sentence contains no word delimiters, such as blanks, between its 'words'. It is therefore important to identify word boundaries before processing Chinese text. Traditional approaches tend to rely on dictionary lookup, morphological rules and heuristics to identify the word boundaries. Such approaches may not be applicable to a large system because of the complicated linguistic phenomena involved in Chinese morphology and syntax. In this paper, the various features available in a sentence are used to construct a generalized word segmentation model, and various probabilistic models for word segmentation are then derived from the generalized model. In general, the likelihood measure adopted in a probabilistic model does not provide a scoring mechanism that directly indicates the real ranks of the candidate segmentation patterns. To enhance the baseline models, a robust adaptive learning algorithm is proposed that adjusts the parameters of the baseline models so as to increase their discrimination power and robustness. The simulation shows that cost-effective word segmentation can be achieved under various contexts with the proposed models. A word recognition rate of 99.39% and a sentence recognition rate of 97.65% can be achieved on the testing corpus by incorporating word length information into a context-independent word model and applying a robust adaptive learning algorithm in the segmentation process. Since not all lexical items can be found in the system dictionary in real applications, the performance of most word segmentation methods in the literature may degrade significantly when unknown words are encountered. This 'unknown word problem' is also examined in this paper. An error recovery mechanism based on the segmentation model is proposed. Preliminary experiments show that the error rates introduced by unknown words can be reduced significantly.
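To make the idea of a context-independent word model concrete, the following is a minimal sketch and not the authors' model. It scores each candidate segmentation by the product of per-word probabilities and finds the best one with dynamic programming. The lexicon `WORD_PROB`, its probability values, the constant `UNKNOWN_LOG_PROB`, and the limit `MAX_WORD_LEN` are all hypothetical, chosen only for illustration; the word length information and the adaptive learning step described in the abstract are not modeled here.

```python
import math

# Hypothetical toy lexicon: word -> probability (illustrative values, not from the paper).
WORD_PROB = {
    "研究": 0.02,
    "研究生": 0.005,
    "生命": 0.01,
    "命": 0.002,
    "的": 0.05,
    "起源": 0.004,
    "生": 0.003,
}
# Assumed log-probability for a single character that is not in the lexicon.
UNKNOWN_LOG_PROB = math.log(1e-8)
# Assumed maximum word length, in characters.
MAX_WORD_LEN = 4


def segment(sentence):
    """Return the segmentation that maximizes the product of word probabilities."""
    n = len(sentence)
    best = [float("-inf")] * (n + 1)  # best[i]: best log-probability of sentence[:i]
    back = [0] * (n + 1)              # back[i]: start index of the last word ending at i
    best[0] = 0.0
    for end in range(1, n + 1):
        for start in range(max(0, end - MAX_WORD_LEN), end):
            word = sentence[start:end]
            if word in WORD_PROB:
                score = math.log(WORD_PROB[word])
            elif end - start == 1:
                # Fall back to a single unknown character so every sentence is segmentable.
                score = UNKNOWN_LOG_PROB
            else:
                continue
            if best[start] + score > best[end]:
                best[end] = best[start] + score
                back[end] = start
    # Follow the back-pointers from the end of the sentence to recover the words.
    words = []
    i = n
    while i > 0:
        words.append(sentence[back[i]:i])
        i = back[i]
    return words[::-1]


print(segment("研究生命的起源"))
```

With these toy values, the ambiguous prefix 研究生 is resolved in favor of 研究 followed by 生命, because that path has the higher total probability. A character missing from the lexicon is emitted as a one-character word with a low score, which is only a crude stand-in for the unknown word handling the paper examines.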
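Pages: 123-146
Journal: ROCLING Proceedings (ROCLING論文集)
Issue: 1992
Publisher: National Kaohsiung Normal University, Graduate Institute of Guidance and Counseling (國立高雄師範大學輔導與諮商研究所)
Previous article in this issue: Acquisition of Unbounded Dependency Using Explanation-Based Learning
Next article in this issue: A Modular and Statistical Approach to Machine Translation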
 
