
Hierarchical Web Document Classification Based on Hierarchically Trained Domain Specific Words
Author: Jing-Shin Chang
Search engines return thousands of pages per query. Many of them are relevant to the “query words” but not interesting to the “users” because the query terms carry different domain-specific meanings. Re-classifying the returned documents based on the domain-specific meanings of the query terms would therefore be most effective. A cross-domain entropy (CDE) measure is proposed to extract characteristic domain-specific words (DSWs) for each node of existing hierarchical web document trees. Domain-specific class models are built from the respective DSWs. These class models are then used to classify new documents directly into the hierarchy, instead of using hierarchical clustering techniques. High accuracy can be achieved with very few domain-specific words. With only the top 5-10% of DSWs and a maximum-entropy-based classifier, 99% accuracy is observed when classifying documents of a news web site into 63 domains. The precision and recall of the extracted domain-specific words are also higher than those of words extracted with the conventional TF-IDF term weighting method.
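The abstract does not give the CDE formula, so the following is only a rough sketch of the general idea under stated assumptions: a term's length-normalized frequencies across domains are turned into a probability distribution, and the entropy of that distribution is taken as its cross-domain entropy. Terms concentrated in one domain get low entropy and are treated as candidate DSWs. The function names `cross_domain_entropy` and `domain_specific_words`, the normalization, and the `max_entropy` threshold are all hypothetical choices for illustration, not the paper's definitions.

```python
import math
from collections import Counter, defaultdict


def cross_domain_entropy(domain_docs):
    """Return {term: entropy in bits} over domains.

    domain_docs maps a domain name to a list of tokenized documents.
    Assumption: frequencies are normalized by domain size before the
    entropy is computed, so large domains do not dominate.
    """
    rel_freq = defaultdict(dict)  # term -> {domain: relative frequency}
    for domain, docs in domain_docs.items():
        counts = Counter(tok for doc in docs for tok in doc)
        total = sum(counts.values())
        if total == 0:
            continue  # skip empty domains
        for term, count in counts.items():
            rel_freq[term][domain] = count / total

    entropy = {}
    for term, by_domain in rel_freq.items():
        z = sum(by_domain.values())
        # Shannon entropy of the term's distribution across domains.
        entropy[term] = sum(
            (f / z) * math.log2(z / f) for f in by_domain.values()
        )
    return entropy, rel_freq


def domain_specific_words(domain_docs, domain, max_entropy):
    """List terms of `domain` whose entropy is at most `max_entropy`,
    most frequent in that domain first (ties broken alphabetically)."""
    entropy, rel_freq = cross_domain_entropy(domain_docs)
    candidates = [
        term for term, by_domain in rel_freq.items()
        if domain in by_domain and entropy[term] <= max_entropy
    ]
    return sorted(candidates, key=lambda t: (-rel_freq[t][domain], t))
```

In this sketch a word spread evenly over all domains reaches the maximum entropy and is filtered out, while a word seen in a single domain has entropy zero. The selected words could then serve as features for any classifier; the maximum entropy classifier mentioned in the abstract is not reproduced here.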
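Pages: 345-357
Keywords: Domain Specific Words, Hierarchical Classification, Maximum Entropy Classifier, Cross-Domain Entropy
Journal: ROCLING Proceedings (ROCLING論文集)
Issue: 2009
Publisher: The Association for Computational Linguistics and Chinese Language Processing (中華民國計算語言學學會)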



