中文計算語言學期刊 (International Journal of Computational Linguistics and Chinese Language Processing)
Title
The Properties and Further Applications of Chinese Frequent Strings
Authors: Lin, Yih-jeng; Yu, Ming-shing
Abstract
This paper reveals some important properties of Chinese frequent strings (CFSs) and their applications in Chinese natural language processing (NLP). We have previously proposed a method for extracting Chinese frequent strings that contain unknown words from a Chinese corpus [Lin and Yu 2001]. We found that CFSs contain many 4-character strings, 3-word strings, and longer n-grams. With a traditional language model (LM), such information can only be derived from an extremely large corpus. In contrast, by using CFSs we can achieve high precision and efficiency with a small training corpus in two tasks: Chinese toneless phoneme-to-character conversion and Chinese spelling error correction. An accuracy rate of 92.86% was achieved for Chinese toneless phoneme-to-character conversion, and an accuracy rate of 87.32% was achieved for Chinese spelling error correction. We also attempted to assign syntactic categories to CFSs. In outside testing, the accuracy rate for this assignment was 88.53% when the highest-level syntactic categories were used.
Pages: 113-128
Keywords: Chinese frequent strings; Unknown words; Chinese toneless phoneme-to-character; Chinese spelling error correction; Language model
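Journal: 中文計算語言學期刊 (International Journal of Computational Linguistics and Chinese Language Processing)
Issue: February 2004 (Vol. 9, No. 1)
Publisher: 中華民國計算語言學學會 (The Association for Computational Linguistics and Chinese Language Processing)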