English Abstract
In studies of automatic text processing, probabilistic topic models are widely applied to infer word correlations through latent topic variables. Probabilistic latent semantic analysis (PLSA) is such a model, in which each word in a document is treated as a sample from a mixture model whose components are multinomial distributions. Although the PLSA model handles multiple topics, each topic model is quite simple and the word burstiness phenomenon is not taken into account. In this study, we present a new Bayesian topic mixture model (BTMM) to overcome the burstiness problem inherent in the multinomial distribution. Accordingly, we use the Dirichlet distribution to represent topic information beyond the document level. Conceptually, the documents in the same class are generated by the associated multinomial distribution. In experiments on the TREC text corpus, we report average precision and model perplexity to demonstrate the superiority of the proposed BTMM method.
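For reference, the classical PLSA decomposition and a Dirichlet-based sketch of the burstiness remedy can be written as follows. The PLSA formula is standard; the second equation is a minimal sketch assuming the Dirichlet compound multinomial (Polya) construction, with generic symbols $d$, $z$, $w$, $\theta$, and $\boldsymbol{\alpha}$ that are illustrative notation rather than the paper's own, and the exact BTMM likelihood may differ.

% Classical PLSA mixture: a word w in document d is generated
% through a latent topic variable z
P(w, d) = P(d) \sum_{z} P(w \mid z)\, P(z \mid d)

% Burstiness remedy sketch (assumed form, not the paper's exact model):
% instead of a fixed multinomial \theta, draw \theta from a Dirichlet
% prior, giving the Dirichlet compound multinomial marginal for a
% document's word-count vector x with total length n = \sum_w x_w
P(x \mid \boldsymbol{\alpha})
  = \int P(x \mid \theta)\, \mathrm{Dir}(\theta \mid \boldsymbol{\alpha})\, d\theta
  = \frac{n!}{\prod_{w} x_w!}\,
    \frac{\Gamma\!\left(\sum_{w} \alpha_w\right)}{\Gamma\!\left(n + \sum_{w} \alpha_w\right)}
    \prod_{w} \frac{\Gamma(x_w + \alpha_w)}{\Gamma(\alpha_w)}

Integrating out the multinomial parameters in this way lets repeated occurrences of a word reinforce each other within a document, which is what allows a Dirichlet-based model to capture burstiness that a single fixed multinomial cannot.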