Abstract |
In today’s society, people can easily use social media to express their opinions about products and services. These online comments can influence other customers’ purchasing behavior, and negative reviews in particular can damage a company’s image. Consequently, identifying the sentiment of social media users from a large volume of comments is a crucial issue. In recent years, machine learning approaches have been considered a possible solution for recognizing the sentiment of text reviews. When these approaches are applied to sentiment classification, traditional term weighting methods, including Term Presence (TP), Term Frequency (TF), and Term Frequency-Inverse Document Frequency (TF-IDF), are often used to represent the collected textual reviews. However, these conventional term weighting methods do little to improve classification performance on text sentiment data. Therefore, this study proposes two new term weighting methods, Categorical Difference Weights (CDW) and TF-CDW, which integrate class information into the term weights used to construct the Term-Document Matrix (TDM). Support Vector Machines (SVM) are then employed to build classifiers. Finally, several real cases are used to demonstrate the effectiveness of the proposed methods. The results show that the proposed methods outperform TP, TF, and TF-IDF. |
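
For readers unfamiliar with the baseline weights named above, the following is a minimal sketch of how a term-document matrix might be built under TP, TF, and TF-IDF. It is illustrative only: the function name `build_tdm`, the whitespace tokenizer, the toy reviews, and the TF-IDF variant (raw count times the natural log of N/df) are all assumptions, not details taken from the study. CDW and TF-CDW are not shown because the abstract does not state their formulas.

```python
import math
from collections import Counter


def build_tdm(docs, scheme="tf"):
    """Return (vocabulary, matrix) with one row per document.

    Hypothetical helper for illustration; `scheme` is "tp", "tf", or "tfidf".
    """
    # Assumed tokenizer: lowercase, split on whitespace.
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({term for tokens in tokenized for term in tokens})
    n_docs = len(tokenized)

    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))

    matrix = []
    for tokens in tokenized:
        counts = Counter(tokens)
        row = []
        for term in vocab:
            tf = counts[term]
            if scheme == "tp":
                weight = 1.0 if tf > 0 else 0.0  # presence only
            elif scheme == "tf":
                weight = float(tf)  # raw count
            elif scheme == "tfidf":
                # Assumed variant: raw count times ln(N / df).
                weight = tf * math.log(n_docs / df[term])
            else:
                raise ValueError(f"unknown scheme: {scheme}")
            row.append(weight)
        matrix.append(row)
    return vocab, matrix


if __name__ == "__main__":
    reviews = ["good good phone", "bad phone"]  # toy data, not from the study
    for scheme in ("tp", "tf", "tfidf"):
        vocab, tdm = build_tdm(reviews, scheme)
        print(scheme, vocab, tdm)
```

Note that none of these three weights looks at the sentiment label of a review. In the toy data, "phone" appears in every review and receives a TF-IDF weight of zero, but a term's association with the positive or negative class plays no role in any of the three schemes.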