
An Estimation of the Entropy of Chinese - A New Approach to Constructing Class-based n-gram Models
Authors: Jyun-Sheng Chang, Yuh-Juh Lin
This paper describes a new approach to constructing a class-based language model and reports an estimate of the upper bound of the entropy of Chinese obtained with that model. A class-based n-gram model built on an existing machine-readable thesaurus is shown to lower the cross-entropy between the language model and a balanced corpus of 300,000 words. The cross-entropy between the corpus and the proposed language model is 12.66 bits per word, or 3.88 bits per byte, which is 0.6 bits per word lower than that of another class-based language model, the inter-word character bigram model. In the process of estimating the entropy, we found that unknown words account for a disproportionately large share of the entropy and are the major bottleneck to obtaining lower entropy, and hence better language models, for tasks such as OCR and speech recognition.
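To make the quantities in the abstract concrete, the following is a minimal sketch of how the cross-entropy of a class-based bigram model can be measured in bits per word and bits per byte. It is not the authors' implementation. The factorization P(w_i | w_{i-1}) ≈ P(c_i | c_{i-1}) · P(w_i | c_i) is the standard class-based bigram form and is assumed here. The function names, the toy corpus, the word-to-class table (standing in for a thesaurus), and the two-bytes-per-character encoding are all hypothetical. The sketch uses unsmoothed relative frequencies and scores the training text itself, so it does not handle unknown words at all.

```python
import math
from collections import Counter

def train(sentences, word2class):
    """Collect counts for P(class | previous class) and P(word | class)."""
    trans, ctx, emit, cls = Counter(), Counter(), Counter(), Counter()
    for sent in sentences:
        prev = "<s>"  # sentence-start marker (assumed convention)
        for w in sent:
            c = word2class[w]
            trans[(prev, c)] += 1
            ctx[prev] += 1
            emit[w] += 1
            cls[c] += 1
            prev = c
    return trans, ctx, emit, cls

def cross_entropy(sentences, word2class, model, bytes_per_char=2):
    """Return (bits per word, bits per byte) under the class bigram model.

    No smoothing: every word and class transition must have been seen
    in training, otherwise a zero probability or division by zero occurs.
    bytes_per_char=2 is an assumption (a double-byte Chinese encoding).
    """
    trans, ctx, emit, cls = model
    total_bits, n_words, n_bytes = 0.0, 0, 0
    for sent in sentences:
        prev = "<s>"
        for w in sent:
            c = word2class[w]
            p = (trans[(prev, c)] / ctx[prev]) * (emit[w] / cls[c])
            total_bits -= math.log2(p)
            n_words += 1
            n_bytes += bytes_per_char * len(w)
            prev = c
    return total_bits / n_words, total_bits / n_bytes

# Hypothetical toy data: two classes, two words per class.
word2class = {"貓": "N", "狗": "N", "跑": "V", "睡": "V"}
corpus = [["貓", "跑"], ["狗", "睡"]]

model = train(corpus, word2class)
h_word, h_byte = cross_entropy(corpus, word2class, model)
print(h_word, h_byte)
```

In this toy corpus the class sequence is fully predictable and each class contains two equally likely words, so every word costs one bit. Read the same way, the abstract's two figures imply an average word length of roughly 12.66 / 3.88 ≈ 3.26 bytes in the evaluated corpus.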
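Pages: 149-169
Journal: ROCLING論文集 (ROCLING Proceedings)
Issue: 1994
Publisher: 國立高雄師範大學輔導與諮商研究所 (National Kaohsiung Normal University, Graduate Institute of Counseling and Guidance)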