Chinese Spelling Checker Based on Statistical Machine Translation

邱絢紋; 吳鑑城; 張俊盛

Title	機器翻譯為本的中文拼字改錯系統
Parallel title	Chinese Spelling Checker Based on Statistical Machine Translation
Authors	邱絢紋; 吳鑑城; 張俊盛
English abstract	Chinese spell checking is an important component of many NLP applications, including word processors, search engines, and automatic essay rating. However, compared to spell checkers for alphabetic languages (e.g., English or French), Chinese spell checkers are more difficult to develop, because the Chinese writing system has no word boundaries and errors may be introduced by various Chinese input methods. Chinese spell checking involves automatically detecting and correcting typos, which roughly correspond to misspelled words in English. Liu et al. (2011) show that people tend to unintentionally produce typos that sound similar (e.g., 措折 [cuo zhe] for 挫折 [cuo zhe]) or look alike (e.g., 固難 [gu nan] for 困難 [kun nan]). Methods for spell checking can be broadly classified into two types: rule-based methods (Ren et al., 2001; Jiang et al., 2012) and statistical methods (Hung & Wu, 2009; Chen, 2010). Rule-based methods use knowledge resources such as a dictionary to identify a word as a typo. Statistical methods tend to use a large monolingual corpus to build a language model that validates the correction hypotheses. Consider the sentence “心是很重要的。” [xin shi hen zhong yao de], which is correct. A rule-based model is nevertheless likely to flag “心” and “是” as an error for the word “心事”, which has the same pronunciation. In statistical methods, “心” and “是” form a bigram with high frequency in a monolingual corpus, so we may determine that “心是” is not a typo after all. In this paper, we propose a model that combines the rule-based and statistical approaches. Probable errors, proposed by the rule-based detection module, are verified using a statistical machine translation (SMT) model. Our model treats spell checking and correction as a kind of translation, in which typos are translated into correctly spelled words according to the translation probability and the language model probability.
Pages	53-55
Journal	ROCLING論文集 (ROCLING Proceedings)
Issue	2013 (2013 issue)
Publisher	中華民國計算語言學學會 (Association for Computational Linguistics and Chinese Language Processing)
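
The abstract describes correction as choosing the candidate that maximizes the product of a translation probability and a language model probability. The following is a minimal sketch of that scoring idea only, not the authors' system: it substitutes a character-level confusion set for the rule-based detection module and a tiny add-one-smoothed bigram model for the language model. The confusion set, the constants P_KEEP and P_TYPO, the toy corpus, and all function names are assumptions made for illustration.

```python
import math
from collections import Counter
from itertools import product

# Hypothetical confusion set: typed character -> characters it may stand for.
CONFUSION = {"措": ["挫"], "固": ["困"], "是": ["事"]}

# Assumed translation probabilities P(typed | intended); not from the paper.
P_KEEP = 0.7   # the typed character is the intended one
P_TYPO = 0.3   # the typed character replaced a confusable one

# Toy monolingual corpus standing in for a large one.
CORPUS = ["他遇到挫折", "他遇到困難", "心是很重要的"]


def train_bigram(corpus):
    """Count character bigrams and their left contexts."""
    contexts, bigrams = Counter(), Counter()
    for sent in corpus:
        chars = ["<s>"] + list(sent) + ["</s>"]
        contexts.update(chars[:-1])
        bigrams.update(zip(chars, chars[1:]))
    vocab_size = len({c for s in corpus for c in s} | {"<s>", "</s>"})
    return contexts, bigrams, vocab_size


def lm_logprob(sent, model):
    """Add-one-smoothed bigram log probability of a sentence."""
    contexts, bigrams, vocab_size = model
    chars = ["<s>"] + list(sent) + ["</s>"]
    return sum(
        math.log((bigrams[(a, b)] + 1) / (contexts[a] + vocab_size))
        for a, b in zip(chars, chars[1:])
    )


def correct(sent, model):
    """Return the candidate with the highest translation + LM log score."""
    # Each position keeps the typed character first, then its alternatives,
    # so the unchanged sentence is scored first and wins any tie.
    options = [[ch] + CONFUSION.get(ch, []) for ch in sent]
    best, best_score = sent, -math.inf
    for cand in product(*options):
        tm = sum(math.log(P_KEEP if c == o else P_TYPO)
                 for c, o in zip(cand, sent))
        score = tm + lm_logprob("".join(cand), model)
        if score > best_score:
            best, best_score = "".join(cand), score
    return best


model = train_bigram(CORPUS)
print(correct("他遇到措折", model))    # 他遇到挫折
print(correct("心是很重要的", model))  # 心是很重要的 (left unchanged)
```

On this toy data the sound-alike typo 措折 is rewritten to 挫折, while 心是 is left alone even though the confusion set proposes 心事, because the bigrams 心是 and 是很 are attested in the corpus and the substitution carries a translation penalty. Enumerating every combination is exponential in the number of confusable characters, so a real system would use a decoder with pruning instead.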