English Abstract |
Numerous studies have relied on CKIP for Chinese term segmentation as a preprocessing step for cluster analysis. Because CKIP strictly limits transmission volume and its output requires further term filtering and merging, this study instead adopted a professional corpus composed of subject headings together with a self-developed Chinese Corpus Segmentation (CCS). The results showed that CCS outperforms CKIP in both performance and term quality when used for cluster analysis, achieving a precision rate of 85%. Furthermore, to provide high-quality news tracking results, we compared SVM, KNN, and Naïve Bayes with regard to classification accuracy. SVM performed best of the three, with a precision rate of 92%. |