Bayesian Topic Mixture Model for Information Retrieval
Authors 吳孟淞, 許軒睿, 簡仁宗
In automatic text processing, probabilistic topic models are widely used to infer word correlations through latent topic variables. Probabilistic latent semantic analysis (PLSA) is one such model: each word in a document is treated as a sample from a mixture model whose components are multinomial distributions. Although PLSA handles multiple topics, each topic model is quite simple and the word burstiness phenomenon is not taken into account. In this study, we present a new Bayesian topic mixture model (BTMM) to overcome the burstiness problem inherent in the multinomial distribution. Accordingly, we use the Dirichlet distribution to represent topic information beyond the document level. Conceptually, documents in the same class are generated by the associated multinomial distribution. In experiments on the TREC text corpus, we report average precision and model perplexity to demonstrate the superiority of the proposed BTMM method.
Pages 1-15
Keywords Bayesian model; Graphical model; PLSA; Dirichlet prior; Information retrieval
Publication ROCLING Proceedings
Issue 2007
Publisher The Association for Computational Linguistics and Chinese Language Processing (中華民國計算語言學學會)
