月旦知識庫
 
  1. 熱門:
 
首頁 臺灣期刊   法律   公行政治   醫事相關   財經   社會學   教育   其他 大陸期刊   核心   重要期刊 DOI文章
中文計算語言學期刊 本站僅提供期刊文獻檢索。
  【月旦知識庫】是否收錄該篇全文,敬請【登入】查詢為準。
最新【購點活動】


篇名
Word Co-occurrence Augmented Topic Model in Short Text
作者 Guan-Bin Chen (Guan-Bin Chen)Hung-Yu Kao (Hung-Yu Kao)
英文摘要
The large amount of text on the Internet cause people hard to understand the meaning in a short limit time. Topic models (e.g. LDA and PLSA) has been proposed to summarize the long text into several topic terms. In the recent years, the short text media such as tweet is very popular. However, directly applies the transitional topic model on the short text corpus usually gating non-coherent topics. Because there is no enough words to discover the word co-occurrence pattern in a short document. The Bi-term topic model (BTM) has been proposed to improve this problem. However, BTM just consider simple bi-term frequency which cause the generated topics are dominated by common words. In this paper, we solve the problem of the frequent bi-term in BTM. Thus, we proposed an improvement of word co-occurrence method to enhance the topic models. We apply the word co-occurrence information to the BTM. The experimental result that show our PMI-β-BTM gets well result in the both of regular short news title text and the noisy tweet text. Moreover, there are two advantages in our method. We do not need any external data and our proposed methods are based on the original topic model that we did not modify the model itself, thus our methods can easily apply to some other existing BTM based models.
起訖頁 45-63
關鍵詞 Short TextTopic ModelDocument ClusteringDocument Classification
刊名 中文計算語言學期刊  
期數 201512 (20:2期)
出版單位 中華民國計算語言學學會
該期刊-上一篇 Explanation Generation for a Math Word Problem Solver
該期刊-下一篇 節錄式語音文件摘要使用表示法學習技術
 

新書閱讀



最新影音


優惠活動




讀者服務專線:+886-2-23756688 傳真:+886-2-23318496
地址:臺北市館前路28 號 7 樓 客服信箱
Copyright © 元照出版 All rights reserved. 版權所有,禁止轉貼節錄