A Study on Chinese Spelling Check Using Confusion Sets and N-gram Statistics

Title	A Study on Chinese Spelling Check Using Confusion Sets and N-gram Statistics
Authors	Chuan-Jie Lin, Wei-Cheng Chu
Abstract	This paper proposes an automatic method for building a Chinese spelling check system. Confusion sets were expanded using two language resources, Shuowen Jiezi and the Four-Corner codes, which improved the coverage of the confusion sets. Nine scoring functions that use the frequency data in the Google Ngram Datasets were proposed, and the idea of smoothing was also adopted. Thresholds were also determined automatically. The final system performed far better than our baseline system in the CSC 2013 Evaluation Task.
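Pages	23-47
Keywords	Chinese Spelling Check, Confusion Set Expansion, Google Ngram Scoring Function
Journal	中文計算語言學期刊 (International Journal of Computational Linguistics and Chinese Language Processing)
Issue	June 2015 (Vol. 20, No. 1)
Publisher	中華民國計算語言學學會 (The Association for Computational Linguistics and Chinese Language Processing)
Previous article in this issue	HANSpeller: A Unified Framework for Chinese Spelling Correction
Next article in this issue	Automatically Detecting Syntactic Errors in Sentences Written by Learners of Chinese as a Foreign Language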
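
Illustrative sketch	The abstract describes the general approach (candidate substitutions drawn from confusion sets, scored with n-gram frequencies under smoothing, and accepted against a threshold) without giving the scoring functions themselves. The Python sketch below shows one way such a pipeline could look. It is not one of the paper's nine scoring functions: the confusion sets, the bigram counts, the add-one smoothing, and the hand-picked threshold are all hypothetical stand-ins chosen for illustration, and the names `CONFUSION_SETS`, `NGRAM_COUNTS`, `score`, and `check` are assumptions introduced here.

```python
import math

# Hypothetical toy data: NOT taken from the paper or from the Google Ngram Datasets.
CONFUSION_SETS = {
    "在": ["再"],
    "再": ["在"],
}
NGRAM_COUNTS = {
    "再見": 900,
    "在見": 2,
    "現在": 1200,
    "現再": 1,
}


def smoothed_log_freq(ngram, counts):
    """Add-one smoothed log frequency; an unseen n-gram scores log(1) = 0.0."""
    return math.log(counts.get(ngram, 0) + 1)


def score(sentence, counts, n=2):
    """Sum of smoothed log frequencies over all character n-grams in the sentence."""
    return sum(
        smoothed_log_freq(sentence[i:i + n], counts)
        for i in range(len(sentence) - n + 1)
    )


def check(sentence, confusion_sets, counts, threshold=1.0):
    """Suggest single-character substitutions that raise the score by more than threshold.

    Returns a list of (position, original_char, replacement_char) tuples.
    The threshold here is a hand-picked assumption; the paper sets its
    thresholds automatically.
    """
    base = score(sentence, counts)
    suggestions = []
    for i, ch in enumerate(sentence):
        for cand in confusion_sets.get(ch, []):
            candidate = sentence[:i] + cand + sentence[i + 1:]
            if score(candidate, counts) - base > threshold:
                suggestions.append((i, ch, cand))
    return suggestions
```

With the toy counts above, `check("在見", CONFUSION_SETS, NGRAM_COUNTS)` flags position 0 and proposes replacing 在 with 再, while `check("現在", CONFUSION_SETS, NGRAM_COUNTS)` returns no suggestions because the substitution lowers the score.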