A k-means Based Clustering Algorithm for Mixed-Attribute Data Sets

黃宇翔; 王品鈞; 方志強

Title	混合型資料集的K-means分群演算法
Parallel title	A k-means Based Clustering Algorithm for Mixed-Attribute Data Sets
Authors	黃宇翔, 王品鈞, 方志強
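Abstract (translated from Chinese)	Cluster analysis is one of the grouping techniques used in data mining. With the rapid development of network environments, the variety and number of data attributes have grown substantially. This sharply degrades the performance of traditional clustering techniques, and the traditional k-means method struggles to cope. Subsequent research has therefore focused on handling numerical, categorical, and ordinal attribute data. Building on the k-means distance measure defined by Ahmad and Dey (2007), this study performs cluster analysis on data sets in which all three attribute types are present at once, using a separate distance definition for each type, and proposes a genetic algorithm to find the combination of cluster centers that gives the best value of the evaluation index. We hope the method can be applied widely to solve the practical clustering difficulties caused by data with three mixed attribute types.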
English abstract	Clustering is one of the most important analysis methods in data mining. With the rapid development of network technology, the growing variety of data attributes and the large number of data items make clustering substantially less efficient. Among clustering approaches, partitioning methods are relatively easy to implement and fast to run. Different types of data attributes, however, complicate clustering. Most of the literature focuses either on numerical and categorical attributes or on ordinal attributes alone, and the results are less than satisfactory in terms of accuracy and execution time. The proposed approach, based on the k-means method of Ahmad and Dey (2007), handles numerical, categorical, and ordinal attributes simultaneously: Euclidean distance defines numerical similarity, the frequency of each value's rank indicates categorical similarity, and a normalized distance measures ordinal similarity. The effectiveness of the approach is evaluated with an essential criterion of clustering, namely minimizing the ratio of within-cluster error to between-cluster error. A genetic algorithm is also developed to reduce the execution time when clustering the three types of attributes at the same time. We hope the proposed method provides a useful clustering technique for practical applications.
Pages	1-28
Keywords	Clustering analysis, k-means, ordinal attribute, distance measure
Journal	電子商務學報
Issue	June 2017 (Vol. 19, No. 1)
Publisher	中華企業資源規劃學會
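
The abstract names three per-type distances and a within-to-between error ratio but gives no formulas. The following is only a minimal sketch of how such a mixed distance and ratio could be put together, not the authors' method. The record layout (`num`, `cat`, `ord` tuples), the function names, and the unweighted sum of the three parts are all assumptions. The categorical part uses a plain mismatch count as a stand-in for the frequency-based measure mentioned in the abstract.

```python
import math


def mixed_distance(a, b, ordinal_levels):
    """Distance between two records with numerical, categorical and ordinal parts.

    Each record is a dict with tuples under the (hypothetical) keys
    'num', 'cat' and 'ord'. ordinal_levels[i] is the number of levels
    of the i-th ordinal attribute and must be at least 2.
    """
    # Numerical part: Euclidean distance.
    d_num = math.sqrt(sum((x - y) ** 2 for x, y in zip(a["num"], b["num"])))
    # Categorical part: simple mismatch count (a stand-in, see lead-in).
    d_cat = sum(1 for x, y in zip(a["cat"], b["cat"]) if x != y)
    # Ordinal part: rank difference normalized to [0, 1].
    d_ord = sum(
        abs(x - y) / (m - 1)
        for x, y, m in zip(a["ord"], b["ord"], ordinal_levels)
    )
    # Assumption: the three parts are simply added with equal weight.
    return d_num + d_cat + d_ord


def within_between_ratio(records, labels, centers, ordinal_levels):
    """Ratio of within-cluster error to between-cluster error (lower is better)."""
    # Within: distance from each record to its assigned cluster center.
    within = sum(
        mixed_distance(r, centers[k], ordinal_levels)
        for r, k in zip(records, labels)
    )
    # Between: distance over every pair of distinct cluster centers.
    between = sum(
        mixed_distance(centers[i], centers[j], ordinal_levels)
        for i in range(len(centers))
        for j in range(i + 1, len(centers))
    )
    return within / between


# Toy data: two numerical, one categorical, one five-level ordinal attribute.
a = {"num": (0.0, 0.0), "cat": ("red",), "ord": (1,)}
a2 = {"num": (0.0, 0.0), "cat": ("red",), "ord": (2,)}
b = {"num": (3.0, 4.0), "cat": ("blue",), "ord": (3,)}
levels = (5,)

print(mixed_distance(a, b, levels))
print(within_between_ratio([a, a2, b], [0, 0, 1], [a, b], levels))
```

A search procedure such as the genetic algorithm mentioned in the abstract would treat a ratio like this as the fitness to minimize over candidate sets of cluster centers. That search is not sketched here.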