
Term Selection with Distributional Clustering for Chinese Text Categorization using N-grams
Authors: Jyh-Jong Tsay; Jing-Doo Wang
In this paper we propose an SB-tree approach that extracts significant patterns efficiently: scanning the leaves of the SB-tree determines the boundaries of significant patterns for term extraction. We then reduce the dimension of the term space to a practical level by combining term selection with term clustering. Our current experiment uses one year of CNA news as training data, 73,420 articles in total, which is far more than in previous related research. In the experiment, we compare the performance of four term selection methods (odds ratio, mutual information, information gain, and the χ² statistic) when each is combined with distributional clustering. The results show that the χ² statistic and information gain outperform odds ratio and mutual information in this setting. With the combination of term selection and term clustering, the dimension of the term space can be reduced from 60,000 to 120 while maintaining similar classification accuracy.
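As a rough illustration of the χ² term selection step mentioned in the abstract, the sketch below scores character n-grams against class labels and keeps the top k. It is not the authors' implementation: the SB-tree extraction and the distributional clustering stage are omitted, the function names (`char_ngrams`, `chi_square`, `select_terms`) are hypothetical, and taking the maximum χ² over classes as a term's score is an assumption, since the abstract does not say how per-class scores are combined.

```python
from collections import Counter, defaultdict


def char_ngrams(text, n=2):
    """Return the set of character n-grams in text (hypothetical helper)."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def chi_square(n11, n10, n01, n00):
    """Chi-square statistic for a 2x2 term/class contingency table.

    n11: documents in the class that contain the term
    n10: documents outside the class that contain the term
    n01: documents in the class without the term
    n00: documents outside the class without the term
    """
    n = n11 + n10 + n01 + n00
    denom = (n11 + n01) * (n10 + n00) * (n11 + n10) * (n01 + n00)
    if denom == 0:
        return 0.0  # degenerate table: term or class covers all or no documents
    return n * (n11 * n00 - n10 * n01) ** 2 / denom


def select_terms(docs, labels, k):
    """Rank terms by their maximum chi-square over classes; return the top k.

    docs is a list of term sets, labels the matching list of class labels.
    Using the maximum over classes is an assumption of this sketch.
    """
    n_docs = len(docs)
    class_size = Counter(labels)
    df = Counter()                    # document frequency per term
    df_in_class = defaultdict(Counter)  # document frequency per class and term
    for terms, label in zip(docs, labels):
        for t in terms:
            df[t] += 1
            df_in_class[label][t] += 1

    scores = {}
    for t, d in df.items():
        best = 0.0
        for c, size in class_size.items():
            n11 = df_in_class[c][t]
            n10 = d - n11
            n01 = size - n11
            n00 = n_docs - size - n10
            best = max(best, chi_square(n11, n10, n01, n00))
        scores[t] = best

    # Sort by descending score, breaking ties alphabetically for determinism.
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]
```

In a pipeline like the one the abstract describes, the selected terms would then be grouped by a clustering step before classification; that stage is outside the scope of this sketch.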
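Pages: 151-170
Keywords: Text Categorization; Term Selection; Term Clustering; Naive Bayes Classifier; Information Retrieval
Journal: ROCLING論文集 (ROCLING Proceedings)
Issue: 1999
Publisher: 國立高雄師範大學輔導與諮商研究所 (Graduate Institute of Guidance and Counseling, National Kaohsiung Normal University)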



