Unsupervised Word Segmentation Without Dictionary

Jason S. Chang; Tracy Lin

Abstract	This prototype system demonstrates a novel method of word segmentation based on corpus statistics. Because the central technique is unsupervised training on a large corpus, we refer to this approach as unsupervised word segmentation. The approach is general in scope and can be applied to both Mandarin Chinese and Taiwanese. In this prototype, we illustrate its use by segmenting the Taiwanese Bible, written in Hanzi and romanized characters. The method involves the following steps:
1. Compute the mutual information (MI) between adjacent characters A and B, each of which may be a Hanzi or a romanized character. If A and B have a relatively high MI, we lean toward treating AB as a word.
2. Use a greedy method to form words of 2 to 4 characters in the input sentences.
3. Build an N-gram model from the results of the first-round word segmentation.
4. Segment words based on the N-gram model.
5. Iterate between steps 3 and 4, alternately rebuilding the N-gram model and re-segmenting.
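Pages	1-5
Publication	ROCLING Proceedings (ROCLING論文集)
Issue	2003
Publisher	The Association for Computational Linguistics and Chinese Language Processing (中華民國計算語言學學會)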
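The five steps in the abstract can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors' implementation. It assumes pointwise MI estimated from unigram and bigram relative frequencies, a hypothetical left-to-right greedy rule that joins a character to the current word while the adjacent MI stays above an arbitrary threshold, a word unigram model (N = 1) searched with Viterbi dynamic programming, and a small floor count for unseen single characters. All names and parameters (`pmi_table`, `greedy_segment`, `viterbi_segment`, `segment_corpus`, `threshold`, `floor`, `rounds`) are hypothetical. Each character of the input string is one token, so romanized Taiwanese would first need to be split into syllable tokens (not shown).

```python
import math
from collections import Counter

MAX_LEN = 4  # longest word considered (the abstract forms words of 2 to 4 characters)


def pmi_table(sentences):
    """Step 1 (assumed estimator): pointwise MI, log2 P(ab) / (P(a) P(b)),
    for every adjacent character pair, from relative frequencies."""
    uni = Counter(ch for s in sentences for ch in s)
    bi = Counter(p for s in sentences for p in zip(s, s[1:]))
    n_uni, n_bi = sum(uni.values()), sum(bi.values())
    return {
        (a, b): math.log2((c / n_bi) / ((uni[a] / n_uni) * (uni[b] / n_uni)))
        for (a, b), c in bi.items()
    }


def greedy_segment(sent, pmi, threshold):
    """Step 2 (hypothetical rule): extend the current word left to right while
    the next adjacent pair has MI >= threshold and the word has < MAX_LEN chars."""
    if not sent:
        return []
    words, cur = [], sent[0]
    for prev, ch in zip(sent, sent[1:]):
        if pmi.get((prev, ch), float("-inf")) >= threshold and len(cur) < MAX_LEN:
            cur += ch
        else:
            words.append(cur)
            cur = ch
    words.append(cur)
    return words


def train_unigram(segmented):
    """Step 3 (assumed N = 1): word counts from the current segmentation."""
    counts = Counter(w for words in segmented for w in words)
    return counts, sum(counts.values())


def viterbi_segment(sent, counts, total, floor=0.5):
    """Step 4: most probable segmentation under the unigram model. Unseen single
    characters get a small floor count; unseen longer strings are not allowed."""
    best = [0.0] + [float("-inf")] * len(sent)
    back = [0] * (len(sent) + 1)
    for i in range(1, len(sent) + 1):
        for j in range(max(0, i - MAX_LEN), i):
            w = sent[j:i]
            c = counts.get(w, floor if len(w) == 1 else 0)
            if c == 0:
                continue
            score = best[j] + math.log(c / total)
            if score > best[i]:
                best[i], back[i] = score, j
    # Follow back-pointers from the end of the sentence to recover the words.
    words, i = [], len(sent)
    while i > 0:
        words.append(sent[back[i]:i])
        i = back[i]
    return words[::-1]


def segment_corpus(sentences, threshold=1.0, rounds=3):
    """Steps 1-5: MI-based greedy first round, then alternate between
    re-estimating the unigram model and re-segmenting the corpus."""
    pmi = pmi_table(sentences)
    seg = [greedy_segment(s, pmi, threshold) for s in sentences]
    for _ in range(rounds):
        counts, total = train_unigram(seg)
        seg = [viterbi_segment(s, counts, total) for s in sentences]
    return seg


if __name__ == "__main__":
    # Toy corpus, invented for illustration only.
    corpus = ["上帝耶穌", "耶穌上帝", "上帝上帝", "耶穌耶穌"]
    print(segment_corpus(corpus))
    # → [['上帝', '耶穌'], ['耶穌', '上帝'], ['上帝', '上帝'], ['耶穌', '耶穌']]
```

In this sketch the Viterbi search only admits multi-character words that already appear in the previous round's segmentation, so re-estimation can keep or drop first-round words but never invent new ones. That is one simple way to realize the alternation in step 5; a higher-order N-gram model would follow the same loop with a richer scoring function.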
