
The Use of Clustering Techniques for Language Modeling--Application to Asian Language
Authors: Gao, Jianfeng; Goodman, Joshua T.; Miao, Jiangbo
Cluster-based n-gram modeling is a variant of normal word-based n-gram modeling. It attempts to make use of the similarities between words. In this paper, we present an empirical study of clustering techniques for Asian language modeling. Clustering is used to improve the performance (i.e. perplexity) of language models as well as to compress language models. Experimental tests are presented for cluster-based trigram models on a Japanese newspaper corpus and on a Chinese heterogeneous corpus. While the majority of previous research on word clustering has focused on how to get the best clusters, we have concentrated our research on the best way to use the clusters. Experimental results show that some novel techniques we present work much better than previous methods, and achieve more than 40% size reduction at the same level of perplexity.
Pages: 27-60
Journal: 中文計算語言學期刊
Issue: February 2001 (Vol. 6, No. 1)
Publisher: 中華民國計算語言學學會
Previous article in this journal: Improving Translation Selection with a New Translation Model Trained by Independent Monolingual Corpora
Next article in this journal: Locating Boundaries for Prosodic Constituents in Unrestricted Mandarin Texts
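
Illustrative note: the abstract describes cluster-based n-gram models as exploiting similarities between words, and measures them by perplexity. A minimal sketch of that general idea follows. It assumes the textbook hard-cluster decomposition P(w_i | w_{i-1}) ≈ P(c_i | c_{i-1}) · P(w_i | c_i), uses a bigram rather than a trigram for brevity, and applies no smoothing. It is not the paper's own technique, which the abstract does not spell out; the toy corpus, the hand-assigned clusters, and the function names are all hypothetical.

```python
import math
from collections import Counter


def train_cluster_bigram(sentences, word2cluster):
    """Maximum-likelihood estimates for a hard-cluster bigram model.

    Assumed decomposition (textbook form, not taken from the paper):
        P(w_i | w_{i-1}) ~= P(c_i | c_{i-1}) * P(w_i | c_i)
    """
    cluster_bigrams = Counter()  # counts of (previous cluster, cluster)
    context_counts = Counter()   # counts of each cluster used as context
    word_counts = Counter()      # counts of each word
    cluster_counts = Counter()   # counts of each cluster
    for sent in sentences:
        prev_c = "<s>"  # sentence-start pseudo-cluster
        for w in sent:
            c = word2cluster[w]
            cluster_bigrams[(prev_c, c)] += 1
            context_counts[prev_c] += 1
            word_counts[w] += 1
            cluster_counts[c] += 1
            prev_c = c

    def prob(word, prev_word=None):
        """Return the model probability of `word` given the previous word."""
        c = word2cluster[word]
        prev_c = "<s>" if prev_word is None else word2cluster[prev_word]
        if context_counts[prev_c] == 0:
            return 0.0  # context never observed; no smoothing in this sketch
        p_cluster = cluster_bigrams[(prev_c, c)] / context_counts[prev_c]
        p_word = word_counts[word] / cluster_counts[c]
        return p_cluster * p_word

    return prob


def perplexity(prob, sentences):
    """Per-word perplexity; raises ValueError if any word gets probability 0."""
    log_sum, n = 0.0, 0
    for sent in sentences:
        prev = None
        for w in sent:
            log_sum += math.log2(prob(w, prev))
            n += 1
            prev = w
    return 2 ** (-log_sum / n)


# Hypothetical toy data: two clusters assigned by hand.
sentences = [["monday", "party"], ["tuesday", "party"], ["monday", "meeting"]]
word2cluster = {"monday": "DAY", "tuesday": "DAY",
                "party": "EVENT", "meeting": "EVENT"}

prob = train_cluster_bigram(sentences, word2cluster)
print(prob("meeting", "tuesday"))   # pair never seen in the toy corpus
print(perplexity(prob, sentences))  # perplexity on the training sentences
```

In this toy setup the pair "tuesday meeting" never occurs in the training sentences, yet it still receives nonzero probability because both words share clusters with observed pairs. That sharing is the kind of word similarity a cluster-based model draws on, and it also means fewer parameters need to be stored than in a word-based model of the same order.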



