Unsupervised Word Segmentation Without Dictionary
Authors: Jason S. Chang, Tracy Lin
This prototype system demonstrates a novel method of word segmentation based on corpus statistics. Because the central technique is unsupervised training on a large corpus, we refer to the approach as unsupervised word segmentation. The approach is general in scope and can be applied to both Mandarin Chinese and Taiwanese. In this prototype, we illustrate its use in segmenting a Taiwanese Bible written in Hanzi and Romanized characters. The method involves the following steps:
1. Computing the mutual information (MI) between adjacent Hanzi or Romanized characters A and B. If A and B have a relatively high MI, we lean toward treating AB as a word.
2. Using a greedy method to form words of 2 to 4 characters in the input sentences.
3. Building an N-gram model from the results of the first-round word segmentation.
4. Segmenting words based on the N-gram model.
5. Iterating between steps 3 and 4: building the N-gram model and segmenting words.
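The first two steps of the abstract (MI scoring and greedy word formation) can be illustrated with a minimal Python sketch. This is not the authors' implementation: it assumes that MI means pointwise mutual information over adjacent characters, that each sentence is a plain string with one character per unit, and that a fixed cutoff decides when MI is "relatively high". The function names `pointwise_mi` and `greedy_segment`, the `threshold` value, and the toy corpus are all hypothetical. Characters that do not join a neighbour are left as single-character units.

```python
import math
from collections import Counter


def pointwise_mi(sentences):
    """Map each adjacent character pair (a, b) to log2(P(ab) / (P(a) * P(b))).

    Assumption: probabilities are simple relative frequencies in the corpus.
    """
    unigrams = Counter()
    bigrams = Counter()
    for s in sentences:
        unigrams.update(s)
        bigrams.update(zip(s, s[1:]))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    mi = {}
    for (a, b), count in bigrams.items():
        p_ab = count / n_bi
        p_a = unigrams[a] / n_uni
        p_b = unigrams[b] / n_uni
        mi[(a, b)] = math.log2(p_ab / (p_a * p_b))
    return mi


def greedy_segment(sentence, mi, threshold=1.0, max_len=4):
    """Scan left to right, extending the current word while the MI of the
    next adjacent pair is at least `threshold`, up to `max_len` characters.

    Assumption: `threshold` is an illustrative cutoff, not a published value.
    """
    words = []
    i = 0
    while i < len(sentence):
        j = i + 1
        while (
            j < len(sentence)
            and j - i < max_len
            and mi.get((sentence[j - 1], sentence[j]), float("-inf")) >= threshold
        ):
            j += 1
        words.append(sentence[i:j])
        i = j
    return words


# Toy corpus (hypothetical): "ab" and "cd" always occur together.
corpus = ["abcd", "cdab", "abab", "cdcd"]
mi = pointwise_mi(corpus)
print(greedy_segment("abcd", mi))  # ['ab', 'cd']
```

In the toy corpus the pairs "ab" and "cd" score about 2.4 bits, while the cross-boundary pair "bc" scores about 0.4 bits, so the cutoff of 1.0 separates them. The later steps (building an N-gram model from this first-round output and re-segmenting iteratively) are not covered by this sketch.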
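Pages: 1-5
Publication: ROCLING論文集 (ROCLING Proceedings)
Issue: 2003
Publisher: 中華民國計算語言學學會 (The Association for Computational Linguistics and Chinese Language Processing)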



