月旦知識庫
 
Title
Extension of Zipf's Law to Word and Character N-grams for English and Chinese
Authors: Ha, Le Quan; Sicilia-Garcia, E. I.; Ming, Ji; Smith, F. J.
Abstract
It is shown that for a large corpus, Zipf's law does not hold for all ranks, either for words in English or for characters in Chinese. The frequency falls below the frequency predicted by Zipf's law for English words at ranks greater than about 5,000 and for Chinese characters at ranks greater than about 1,000. However, when single words or characters are combined with n-gram words or characters in one list and put in order of frequency, the frequency of tokens in the combined list follows Zipf's law approximately, with a slope close to -1 on a log-log plot for all n-grams, down to the lowest frequencies in both languages. This behaviour is also found for English 2-byte and 3-byte word fragments. It only happens when all n-grams are used, including semantically incomplete n-grams. Previous theories do not predict this behaviour, possibly because conditional probabilities of tokens have not been properly represented.
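The combined-list construction described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `ngram_counts` and `zipf_slope`, the toy sentence, and the use of an ordinary least-squares fit to estimate the log-log slope are all assumptions made for illustration. For Chinese, the token list would hold characters rather than words.

```python
from collections import Counter
import math


def ngram_counts(tokens, max_n):
    """Count every contiguous n-gram of length 1..max_n in one combined Counter.

    All n-grams are kept, including semantically incomplete ones.
    (Hypothetical helper, not taken from the paper.)
    """
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def zipf_slope(counts):
    """Least-squares slope of log(frequency) against log(rank).

    Entries are ranked by descending frequency; ties are ordered arbitrarily.
    The least-squares fit is an assumed estimator, chosen for simplicity.
    """
    freqs = sorted(counts.values(), reverse=True)
    if len(freqs) < 2:
        raise ValueError("need at least two entries to fit a slope")
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


if __name__ == "__main__":
    # Toy input only; a slope near -1 is expected only for a large corpus.
    tokens = "the cat sat on the mat and the cat ran".split()
    combined = ngram_counts(tokens, max_n=3)
    print(len(combined), "distinct n-grams; fitted slope:", zipf_slope(combined))
```

On a corpus this small the fitted slope says little. The sketch only shows how unigrams and longer n-grams end up in a single ranked list before the slope is measured.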
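Pages: 77-101
Keywords: Zipf's law; Chinese character; Chinese compound word; n-grams; Phrases
Journal: 中文計算語言學期刊 (Computational Linguistics and Chinese Language Processing)
Issue: February 2003 (Vol. 8, No. 1)
Publisher: 中華民國計算語言學學會 (Association for Computational Linguistics and Chinese Language Processing)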
 
