Title
Reduced N-Grams for Chinese Evaluation
Authors: Ha, Le Quan; Seymour, R.; Hanna, P.; Smith, F. J.
Abstract
Theoretically, a language model improves as the n-gram size increases from 3 to 5 or higher. However, as the n-gram size increases, the number of parameters and calculations and the storage requirement grow very rapidly if we attempt to store all possible combinations of n-grams. To avoid these problems, the reduced n-gram approach previously developed by O'Boyle and Smith [1993] can be applied. A reduced n-gram language model, called a reduced model, can efficiently store the phrase histories of an entire corpus within feasible storage limits. Another advantage of reduced n-grams is that they are usually semantically complete. In our experiments, the reduced n-gram creation method, the O'Boyle-Smith reduced n-gram algorithm, was applied to a large Chinese corpus. The Zipf curves of the Chinese reduced n-grams are presented here and compared with conventional Chinese n-grams obtained previously. The Chinese reduced model reduced perplexity by 8.74% and the language model size by a factor of 11.49. This paper is the first attempt to model Chinese reduced n-grams, and it may provide important insights for Chinese linguistic research.
Pages: 19-34
Keywords: Reduced n-grams; Reduced n-gram algorithm; Reduced n-gram identification; Reduced model; Chinese reduced n-grams; Chinese reduced model
Journal: 中文計算語言學期刊 (International Journal of Computational Linguistics & Chinese Language Processing)
Issue: March 2005 (Vol. 10, No. 1)
Publisher: 中華民國計算語言學學會 (The Association for Computational Linguistics and Chinese Language Processing)
Previous article in this issue: Lightly Supervised and Data-Driven Approaches to Mandarin Broadcast News Transcription
Next article in this issue: Automated Alignment and Extraction of a Bilingual Ontology for Cross-Language Domain-Specific Applications
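The abstract's central idea, storing only complete phrases rather than every fixed-length n-gram, can be sketched as follows. This is an illustrative sketch, not the published O'Boyle-Smith algorithm: the function name, the pruning rule (drop any phrase that is absorbed by a one-word-longer phrase occurring with the same frequency), and the parameters are assumptions made here for demonstration.

```python
from collections import Counter

def reduced_ngrams(tokens, max_n=5, min_count=2):
    """Keep only 'maximal' frequent phrases (illustrative sketch).

    A phrase is dropped when it is a sub-phrase of a phrase one word
    longer that occurs with the same frequency, so each surviving
    entry represents a complete phrase rather than a fragment of one.
    """
    # Count every n-gram up to max_n words long.
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    frequent = {p: c for p, c in counts.items() if c >= min_count}

    # Prune phrases fully explained by a longer phrase with equal count.
    reduced = {}
    for phrase, c in frequent.items():
        absorbed = any(
            len(q) == len(phrase) + 1 and cq == c
            and (q[:-1] == phrase or q[1:] == phrase)
            for q, cq in frequent.items()
        )
        if not absorbed:
            reduced[phrase] = c
    return reduced
```

On the toy token stream "a b c a b c a b", the sketch keeps only the complete phrase "a b c a b" (count 2) and the extra "a b" (count 3), discarding fragments such as "b c" whose counts are fully explained by a longer phrase; this is the sense in which the stored entries are "semantically complete" and the table is much smaller than a full fixed-length n-gram inventory.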
 
