A Study on the Precision of Document Clustering by the Mix Type Technology

王台平; 古祐嘉

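Title	以混合式技術改善文件聚類之精確度 (literally: Improving the Precision of Document Clustering with a Hybrid Technique)
Parallel title	A Study on the Precision of Document Clustering by the Mix Type Technology
Authors	王台平, 古祐嘉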
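Abstract (Chinese, translated)	Text mining is composed of data mining plus some basic linguistics. Almost all text-mining techniques depend on term frequency and on the number of documents in which a term appears, yet these two kinds of information are rarely used in data mining. Research on document clustering has produced many different clustering algorithms, and different approaches yield different clustering effectiveness. The most commonly used is the non-hierarchical K-means algorithm. However, the choice of K in K-means is random, so the algorithm is easily affected by outliers in the data, which leads to poor clustering. In this study, we propose first clustering the experimental data hierarchically to find a suitable number of clusters and suitable initial values. This overcomes the weakness of non-hierarchical K-means, improves clustering effectiveness, and speeds up the convergence of K-means. The study also uses relative comparison to filter out unnecessary feature terms, and uses hierarchical clustering to control clustering quality, so that document clustering achieves good precision.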
Abstract (English)	Text mining is composed of data mining plus some basic linguistics. Text-mining techniques are almost all related to term frequency and to the number of documents containing a term, yet both kinds of information are rarely used in data mining. Many different clustering algorithms have already been developed in document clustering research. The most commonly used non-hierarchical clustering algorithm is K-means, but its K value is chosen at random, so its results are easily degraded by outliers in the data. To overcome this disadvantage of K-means, we propose using hierarchical clustering. We first cluster the experimental data hierarchically, then use the result to find a suitable number of clusters and initial values, which improves both clustering effectiveness and the speed of convergence. In addition, we use a relative-comparison approach to filter out unnecessary keywords, and we use hierarchical clustering to control clustering quality, which yields good precision.
Pages	847-885
Keywords	Text mining, Document clustering, Noise filter, Quality control
Journal	電子商務學報 (Journal of e-Business)
Issue	December 2007, Vol. 9, No. 4
Publisher	中華企業資源規劃學會 (Chinese Enterprise Resource Planning Society)
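
The abstract says that text-mining techniques rely on term frequency and on the number of documents a term appears in. It also says that unnecessary feature terms are filtered out by relative comparison before clustering. The exact filtering rule is not given in this record, so the sketch below substitutes a common stand-in. It keeps only terms whose document frequency lies between a minimum count and a maximum fraction of the collection, then weights the surviving terms by TF-IDF. The function names `filter_terms` and `tfidf_vectors` and the thresholds `min_df` and `max_df_ratio` are illustrative assumptions, not the authors' method.

```python
import math
from collections import Counter


def filter_terms(docs, min_df=2, max_df_ratio=0.8):
    """Keep terms whose document frequency (DF) lies in [min_df, max_df_ratio * N].

    Assumed stand-in for the paper's unspecified "relative comparison" filter.
    docs is a list of token lists; returns the surviving vocabulary, sorted.
    Terms found in almost every document carry little discriminating power,
    and terms found in only one document cannot link documents together.
    """
    n_docs = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # count each term at most once per document
    return sorted(
        term for term, count in df.items()
        if min_df <= count <= max_df_ratio * n_docs
    )


def tfidf_vectors(docs, vocab):
    """Weight each vocabulary term by tf * log(N / df) for every document."""
    n_docs = len(docs)
    doc_sets = [set(tokens) for tokens in docs]
    df = {t: sum(1 for s in doc_sets if t in s) for t in vocab}
    vectors = []
    for tokens in docs:
        tf = Counter(tokens)  # raw term frequency within this document
        vectors.append([tf[t] * math.log(n_docs / df[t]) for t in vocab])
    return vectors


docs = [
    ["stock", "market", "price", "the"],
    ["stock", "price", "rise", "the"],
    ["football", "match", "goal", "the"],
    ["football", "goal", "team", "the"],
]
vocab = filter_terms(docs)
print(vocab)  # ['football', 'goal', 'price', 'stock']
vectors = tfidf_vectors(docs, vocab)
```

On this toy corpus, "the" appears in every document and is dropped by the upper threshold. Terms such as "market" appear in only one document and are dropped by the lower threshold.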
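
The core hybrid idea uses hierarchical clustering to choose the number of clusters and the initial centers for K-means. It can be sketched as follows. The abstract does not specify the linkage, distance measure, or stopping rule, so this sketch makes several assumptions. It uses centroid linkage with Euclidean distance and stops merging once the closest pair of clusters is farther apart than a hypothetical `merge_threshold`. The number of clusters that remain becomes K, and their centroids become the K-means seeds. The helper names are illustrative. The naive merge loop is cubic in the number of documents and is meant only to show the idea on small inputs.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(points):
    """Component-wise mean (centroid) of a non-empty list of vectors."""
    dim = len(points[0])
    return [sum(p[i] for p in points) / len(points) for i in range(dim)]


def agglomerative_seeds(points, merge_threshold):
    """Centroid-linkage agglomerative clustering (assumed linkage).

    Starts with one cluster per point and repeatedly merges the two clusters
    whose centroids are closest, stopping once that distance exceeds
    merge_threshold. Returns the centroids of the remaining clusters; their
    count is used as K.
    """
    clusters = [[p] for p in points]
    while len(clusters) > 1:
        best = None  # (distance, i, j) of the closest pair found so far
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = euclidean(mean(clusters[i]), mean(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        if d > merge_threshold:
            break  # remaining clusters are far apart: stop merging
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return [mean(c) for c in clusters]


def kmeans(points, centroids, max_iter=100):
    """Lloyd's K-means started from the given centroids.

    Returns (labels, centroids, iterations), where iterations counts the
    assignment passes until the labels stop changing.
    """
    centroids = [list(c) for c in centroids]
    labels = None
    for iteration in range(1, max_iter + 1):
        # Assign each point to its nearest centroid.
        new_labels = [
            min(range(len(centroids)), key=lambda k: euclidean(p, centroids[k]))
            for p in points
        ]
        if new_labels == labels:
            return labels, centroids, iteration
        labels = new_labels
        # Move each centroid to the mean of its assigned points.
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the previous centroid if a cluster is empty
                centroids[k] = mean(members)
    return labels, centroids, max_iter


points = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
seeds = agglomerative_seeds(points, merge_threshold=3.0)
labels, centroids, iterations = kmeans(points, seeds)
print(len(seeds), labels, iterations)  # 2 [0, 0, 0, 1, 1, 1] 2
```

Consider two well-separated groups of three points. The hierarchical pass stops at K = 2, and K-means seeded with those centroids reaches a stable assignment on its second pass, because the seeds already equal the final cluster means. Under these assumptions, this illustrates why a hierarchical seed can speed up convergence compared with random initial centers.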
