月旦知識庫
 


Title
Chinese Word Auto-Confirmation Agent
Parallel title
Chinese Word Auto-Confirmation Agent
Authors: Jia-Lin Tsai, Cheng-Lung Sung, Wen-Lian Hsu
Abstract (English)
In various Asian languages, including Chinese, written text has no spaces between words. Thus, most Chinese NLP systems must perform word segmentation (sentence tokenization). However, successful word segmentation depends on having a sufficiently large lexicon. On average, about 3% of the words in a text are not contained in the lexicon. Therefore, unknown word identification becomes a bottleneck for Chinese NLP systems. In this paper, we present a Chinese word auto-confirmation (CWAC) agent. The CWAC agent uses a hybrid approach that takes advantage of both statistical and linguistic approaches. The task of a CWAC agent is to automatically confirm whether an n-gram input (n≥2) is a Chinese word. We design our CWAC agent to satisfy two criteria: (1) a precision rate greater than 98% and a recall rate greater than 75%, and (2) domain-independent performance (F-measure). These criteria ensure that our CWAC agents can work automatically without human intervention. Furthermore, by combining several CWAC agents designed on different principles, we can construct a multi-CWAC agent through a building-block approach. Three experiments are conducted in this study. The results demonstrate that, for n-grams with frequency ≥4 in a large corpus, our CWAC agent can satisfy the two criteria and achieve 97.82% precision, 77.11% recall, and an 86.24% domain-independent F-measure. No existing system can achieve such high precision and domain-independent F-measure. The proposed method is our first attempt at constructing a CWAC agent. We will continue to develop other CWAC agents and integrate them into a multi-CWAC agent system.
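The abstract does not spell out the agent's statistical and linguistic confirmation rules, so the sketch below covers only the parts it does state: generating n-gram candidates (n≥2) that occur at least 4 times in a corpus, and scoring confirmed candidates by precision, recall, and F-measure. All function names are hypothetical, and the choices of `n_max=4` and of treating any non-CJK character as a boundary that n-grams may not cross are assumptions for illustration, not the authors' method.

```python
import re
from collections import Counter

# Runs of CJK Unified Ideographs; punctuation, Latin letters, digits and
# whitespace act as boundaries that candidate n-grams may not cross.
# (Assumption for this sketch, not taken from the paper.)
CJK_RUN = re.compile(r"[\u4e00-\u9fff]+")


def candidate_ngrams(text: str, n_min: int = 2, n_max: int = 4,
                     min_freq: int = 4) -> dict[str, int]:
    """Return character n-grams (n_min <= n <= n_max) seen at least min_freq times."""
    counts: Counter = Counter()
    for run in CJK_RUN.findall(text):
        for n in range(n_min, n_max + 1):
            for i in range(len(run) - n + 1):
                counts[run[i:i + n]] += 1
    # Keep only candidates that pass the frequency threshold.
    return {gram: c for gram, c in counts.items() if c >= min_freq}


def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def evaluate(confirmed: set[str], gold_words: set[str]) -> tuple[float, float, float]:
    """Precision, recall and F-measure of confirmed n-grams against a gold word list."""
    true_pos = len(confirmed & gold_words)
    precision = true_pos / len(confirmed) if confirmed else 0.0
    recall = true_pos / len(gold_words) if gold_words else 0.0
    return precision, recall, f_measure(precision, recall)
```

For example, `f_measure(0.9782, 0.7711)` gives roughly 0.8624, matching the 86.24% F-measure reported above. A real CWAC agent would sit between the two steps, deciding which frequent candidates to confirm as words.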
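Pages: 1-17
Keywords: natural language processing; word segmentation; unknown word; agent
Publication: ROCLING Proceedings (ROCLING論文集)
Issue: 2003
Publisher: Association for Computational Linguistics and Chinese Language Processing (中華民國計算語言學學會)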