Abstract
Text mining combines data mining with some basic linguistics. Most text-mining techniques rely on term frequencies and document counts, yet these two statistics alone provide limited information for the task. Research on document clustering has already produced many kinds of clustering algorithms. The most commonly used non-hierarchical method is k-means, but its value of k is chosen arbitrarily, which makes the results easily distorted by outliers in the data. To overcome this weakness of k-means, we propose using hierarchical clustering. We first apply hierarchical clustering to the experimental data and, from the result, determine a suitable number of clusters and suitable initial values, which improves both clustering effectiveness and convergence speed. In addition, we use a relevance measure to compare keywords and filter out unnecessary ones, and we use hierarchical clustering to control cluster quality, which yields good precision.
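The abstract does not spell out its exact procedure. Below is a minimal Python sketch of the general idea: hierarchical clustering chooses k and the initial centers, and k-means then starts from them. Several details are assumptions for illustration and are not taken from the thesis: Euclidean distance, average linkage, the "stop just before the largest jump in merge distance" rule for picking k, and the hypothetical helper names (`agglomerative`, `hierarchical_kmeans`). Real documents would be term-weight vectors rather than the toy 2-D points used here.

```python
import math

# Sketch only. Assumed (not from the thesis): Euclidean distance, average linkage,
# and a "largest jump in merge distance" rule for choosing k.

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mean(points):
    return [sum(col) / len(points) for col in zip(*points)]

def agglomerative(points):
    """Average-linkage agglomerative clustering (naive version).
    Returns the merge distances and the partition after each number of merges."""
    clusters = [[i] for i in range(len(points))]
    history, merge_dists = [[list(c) for c in clusters]], []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                total = sum(dist(points[a], points[b]) for a in clusters[i] for b in clusters[j])
                d = total / (len(clusters[i]) * len(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        clusters[i] += clusters[j]  # merge the closest pair of clusters
        del clusters[j]
        merge_dists.append(d)
        history.append([list(c) for c in clusters])
    return merge_dists, history

def kmeans(points, centers, max_iter=100):
    """Standard Lloyd-style k-means starting from the given centers."""
    for it in range(1, max_iter + 1):
        labels = [min(range(len(centers)), key=lambda c: dist(p, centers[c])) for p in points]
        new_centers = []
        for c in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            new_centers.append(mean(members) if members else centers[c])
        if new_centers == centers:  # centers unchanged: converged
            return labels, centers, it
        centers = new_centers
    return labels, centers, max_iter

def hierarchical_kmeans(points):
    """Hypothetical helper: seed k-means from an agglomerative clustering."""
    if len(points) < 3:
        raise ValueError("need at least 3 points")
    merge_dists, history = agglomerative(points)
    # Jump between consecutive merge distances; stop just before the largest one.
    jumps = {m: merge_dists[m] - merge_dists[m - 1] for m in range(1, len(merge_dists))}
    m_star = max(jumps, key=jumps.get)
    partition = history[m_star]  # clusters remaining after m_star merges
    centers = [mean([points[i] for i in c]) for c in partition]
    labels, centers, iters = kmeans(points, centers)
    return len(partition), labels, centers, iters

if __name__ == "__main__":
    # Toy 2-D stand-ins for document vectors: two well-separated groups.
    docs = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    k, labels, centers, iters = hierarchical_kmeans(docs)
    print(k, labels, iters)  # prints: 2 [0, 0, 0, 1, 1, 1] 1
```

Because k-means starts from centers that already reflect the structure found by the hierarchical step, it can converge in fewer iterations than with random seeding; here it stops after one pass. The trade-off is that this naive agglomerative step costs roughly O(n^3) distance evaluations, so in this form it suits only modest collections.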