An Automatic Blog Text Extraction Mechanism (部落格本文自動萃取機制)

洪智力; 林政輝

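Title	部落格本文自動萃取機制
Parallel title	An Automatic Blog Text Extraction Mechanism
Authors	洪智力, 林政輝
Chinese abstract (translated)	As blogs grow rapidly, the information they carry is increasingly abundant and useful as reference material, and mining the text content of blogs has become an important branch of web mining research. To read blog text automatically, a system must correctly identify the web page tag that encloses the main text. This study proposes a "relative proportion of text to web page tags" method for finding the tag most likely to contain the main text. The technique applies the concept of the document object model (DOM) and extracts the blog's main text automatically through a web crawler. Experiments show that the proposed extraction mechanism correctly filters out noise and locates the main-text tag.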
English abstract	In the era of blogs, more and more useful information is shared on blogs. Mining text on blogs has become one of the important and novel research directions in the field of web mining. An automatic blog text mining system must locate, effectively and efficiently, the tags that enclose the main text of a blog. This research uses the relative proportion of text to tags to find the tag most likely to contain the main blog text. More specifically, we apply the concept of the DOM (document object model) through a Java crawler to analyze the relationship between text and tags. According to our experiments, our automatic blog text extraction mechanism is able to extract the main text of a blog effectively and efficiently.
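Pages	457-472
Keywords	blog text (部落格文章), information extraction (資訊擷取), text mining (文字探勘), document object model (文件物件模型)
Journal	電子商務研究
Issue	December 2010 (Vol. 8, No. 4)
Publisher	Graduate Institute of Information Management, National Taipei University (國立臺北大學資訊管理研究所)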
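
The abstract describes choosing the main-text tag by the relative proportion of text to tags in the DOM, but it does not give the exact formula. The sketch below is therefore only an illustration under stated assumptions, not the authors' implementation: it is written in Python rather than as a Java crawler, it scores each container element by the number of text characters in its subtree divided by the number of tags in that subtree, and it treats only `div`, `td`, `article`, and `section` as candidates. The names `DomBuilder`, `subtree_stats`, and `find_main_text_node` are hypothetical.

```python
from html.parser import HTMLParser

# Assumptions for this sketch (not taken from the paper):
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}
SKIP_TAGS = {"script", "style"}                      # text here is not content
CONTAINER_TAGS = {"div", "td", "article", "section"}  # candidate main-text tags


class Node:
    """One element of a simplified DOM tree."""

    def __init__(self, tag, attrs, parent=None):
        self.tag = tag
        self.attrs = dict(attrs)
        self.parent = parent
        self.children = []
        self.text_chars = 0  # characters of text directly inside this element


class DomBuilder(HTMLParser):
    """Builds a Node tree from HTML using the standard-library parser."""

    def __init__(self):
        super().__init__()
        self.root = Node("#document", [])
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:  # void elements never receive children
            self.current = node

    def handle_endtag(self, tag):
        # Close the nearest open element with this name; ignore stray closes.
        node = self.current
        while node is not None and node.tag != tag:
            node = node.parent
        if node is not None and node.parent is not None:
            self.current = node.parent

    def handle_data(self, data):
        if self.current.tag not in SKIP_TAGS:
            self.current.text_chars += len(data.strip())


def subtree_stats(node):
    """Return (text characters, tag count) for the subtree rooted at node."""
    chars, tags = node.text_chars, 1
    for child in node.children:
        child_chars, child_tags = subtree_stats(child)
        chars += child_chars
        tags += child_tags
    return chars, tags


def iter_nodes(node):
    """Yield node and all of its descendants, depth first."""
    yield node
    for child in node.children:
        yield from iter_nodes(child)


def find_main_text_node(html):
    """Return the container with the highest text-per-tag score, or None."""
    builder = DomBuilder()
    builder.feed(html)
    builder.close()
    best, best_score = None, 0.0
    for node in iter_nodes(builder.root):
        if node.tag not in CONTAINER_TAGS:
            continue
        chars, tags = subtree_stats(node)
        score = chars / tags
        if score > best_score:
            best, best_score = node, score
    return best


SAMPLE_PAGE = """
<html><body>
<div id="nav"><a href="/">Home</a><a href="/about">About</a><a href="/rss">RSS</a></div>
<div id="post"><p>First paragraph of the blog entry, long enough to dominate the page.</p>
<p>Second paragraph with more of the actual article text in it.</p></div>
<div id="footer"><span>Copyright</span><a href="/top">Top</a></div>
</body></html>
"""

if __name__ == "__main__":
    print(find_main_text_node(SAMPLE_PAGE).attrs["id"])  # prints "post"
```

On the sample page, the navigation and footer blocks spread a few characters over several tags, so they score low, while the post block holds long paragraphs under few tags and is selected. A real blog crawler would likely need further rules, for example for comment sections that also contain long runs of text.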