Title
Chinese Word Segmentation by Classification of Characters
Authors: Goh, Chooi-ling; Asahara, Masayuki; Matsumoto, Yuji
Abstract
During the process of Chinese word segmentation, two main problems occur: segmentation ambiguities and unknown word occurrences. This paper describes a method to solve the segmentation problem. First, we use a dictionary-based approach to segment the text. We apply the Maximum Matching algorithm to segment the text forwards (FMM) and backwards (BMM). Based on the difference between FMM and BMM, and the context, we apply a classification method based on Support Vector Machines to re-assign the word boundaries. In so doing, we use the output of a dictionary-based approach, and then apply a machine-learning-based approach to solve the segmentation problem. Experimental results show that our model can achieve an F-measure of 99.0 for overall segmentation, given the condition that there are no unknown words in the text, and an F-measure of 95.1 if unknown words exist.
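The dictionary-based first pass described in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the record: the toy lexicon, the `max_len` limit, the function names, and the S/B/I/E character tag set. The sketch only locates the characters where FMM and BMM disagree; the SVM classifier that re-assigns word boundaries is not implemented.

```python
# Toy lexicon (hypothetical); a real system would load a full dictionary.
LEXICON = {"研究", "研究生", "生命", "起源"}


def fmm(text, lexicon, max_len=4):
    """Forward maximum matching: take the longest lexicon word from the left."""
    words, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            cand = text[i:i + size]
            # A single character is always accepted as a fallback.
            if size == 1 or cand in lexicon:
                words.append(cand)
                i += size
                break
    return words


def bmm(text, lexicon, max_len=4):
    """Backward maximum matching: take the longest lexicon word from the right."""
    words, j = [], len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            cand = text[j - size:j]
            if size == 1 or cand in lexicon:
                words.append(cand)
                j -= size
                break
    words.reverse()
    return words


def position_tags(words):
    """Tag each character by its position in a word (assumed tag set):
    S = single-character word, B = begin, I = intermediate, E = end."""
    tags = []
    for w in words:
        if len(w) == 1:
            tags.append("S")
        else:
            tags.extend(["B"] + ["I"] * (len(w) - 2) + ["E"])
    return tags


def disagreements(text, lexicon, max_len=4):
    """Return (index, character, FMM tag, BMM tag) where the two passes differ."""
    f_tags = position_tags(fmm(text, lexicon, max_len))
    b_tags = position_tags(bmm(text, lexicon, max_len))
    return [
        (i, ch, ft, bt)
        for i, (ch, ft, bt) in enumerate(zip(text, f_tags, b_tags))
        if ft != bt
    ]


sentence = "研究生命起源"
print(fmm(sentence, LEXICON))  # ['研究生', '命', '起源']
print(bmm(sentence, LEXICON))  # ['研究', '生命', '起源']
print(disagreements(sentence, LEXICON))
```

In this example the two passes disagree on the characters 究, 生 and 命, which is the kind of ambiguous span a character classifier would have to resolve using the surrounding context.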
Pages: 381-396
Keywords: Chinese; Word segmentation; Segmentation ambiguity; Unknown word; Maximum matching algorithm; Support vector machines
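Journal: 中文計算語言學期刊 (International Journal of Computational Linguistics and Chinese Language Processing)
Issue: September 2005 (Vol. 10, No. 3)
Publisher: 中華民國計算語言學學會 (The Association for Computational Linguistics and Chinese Language Processing)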
Previous article in this journal: Modeling Pronunciation Variation for Bi-Lingual Mandarin/Taiwanese Speech Recognition
Next article in this journal: The Design and Construction of the PolyU Shallow Treebank
 
