
Chinese Spelling Checker Based on Statistical Machine Translation
Authors: 邱絢紋, 吳鑑城, 張俊盛
Chinese spell check is an important component of many NLP applications, including word processors, search engines, and automatic essay rating. However, compared to spell checkers for alphabetic languages (e.g., English or French), Chinese spell checkers are more difficult to develop, because the Chinese writing system has no word boundaries and errors may be caused by various Chinese input methods. Chinese spell check involves automatically detecting and correcting typos, which roughly correspond to misspelled words in English. Liu et al. (2011) show that people tend to unintentionally generate typos that sound similar (e.g., *措折 [cuo zhe] and 挫折 [cuo zhe]) or look alike (e.g., *固難 [gu nan] and 困難 [kun nan]). Methods for spell check can be broadly classified into two types: rule-based methods (Ren et al., 2001; Jiang et al., 2012) and statistical methods (Hung & Wu, 2009; Chen, 2010). Rule-based methods use knowledge resources such as a dictionary to identify a word as a typo. Statistical methods tend to use a large monolingual corpus to create a language model that validates the correction hypotheses. Consider the sentence “心是很重要的。” [xin shi hen zhong yao de], which is correct. A rule-based model is nevertheless likely to flag “心” and “是” as an error for the word “心事”, which has the same pronunciation. In statistical methods, “心” and “是” form a bigram with high frequency in a monolingual corpus, so we may determine that “心是” is not a typo after all. In this paper, we propose a model that combines rule-based and statistical approaches. Probable errors, proposed by the rule-based detection module, are verified using a statistical machine translation (SMT) model. Our model treats spell check and correction as a kind of translation, where typos are translated into correctly spelled words according to the translation probability and the language model probability.
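The idea of scoring a correction by combining a translation probability with a language model probability can be illustrated with a small sketch. This is not the authors' implementation: it works on single characters rather than segmented words, and the confusion sets, the probabilities, and the names `CONFUSION`, `BIGRAM_PROB`, `translation_prob`, and `correct` are all hypothetical values chosen only to reproduce the examples in the abstract.

```python
import math
from itertools import product

# Hypothetical confusion sets: characters that sound or look alike.
CONFUSION = {
    "措": ["挫"],  # similar sound (cuo)
    "固": ["困"],  # similar shape
    "是": ["事"],  # identical pronunciation (shi)
}

# Toy bigram language model probabilities (made-up numbers).
BIGRAM_PROB = {
    ("挫", "折"): 0.02,
    ("困", "難"): 0.02,
    ("心", "是"): 0.01,
    ("心", "事"): 0.005,
}
UNSEEN = 1e-6  # assumed floor probability for bigrams not in the table


def translation_prob(observed, candidate):
    """Toy P(candidate | observed): keeping a character is more likely than changing it."""
    if observed == candidate:
        return 0.9
    return 0.1 / len(CONFUSION[observed])


def lm_log_prob(text):
    """Sum of log bigram probabilities over adjacent character pairs."""
    return sum(math.log(BIGRAM_PROB.get(pair, UNSEEN)) for pair in zip(text, text[1:]))


def candidates(text):
    """Every string obtained by keeping each character or swapping in a confusable one."""
    options = [[ch] + CONFUSION.get(ch, []) for ch in text]
    return ["".join(chars) for chars in product(*options)]


def score(observed, candidate):
    """Log translation probability plus log language model probability."""
    tm = sum(math.log(translation_prob(o, c)) for o, c in zip(observed, candidate))
    return tm + lm_log_prob(candidate)


def correct(text):
    """Return the highest-scoring candidate, which may be the input itself."""
    return max(candidates(text), key=lambda cand: score(text, cand))
```

Under these toy numbers, `correct("措折")` selects 挫折, while `correct("心是")` leaves the input unchanged, because the bigram 心是 is frequent enough to outweigh the rule-based suggestion 心事.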
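Pages: 53-55
Journal: ROCLING Proceedings (ROCLING論文集)
Issue: 2013
Publisher: The Association for Computational Linguistics and Chinese Language Processing (中華民國計算語言學學會)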



