English Abstract
This study used Facebook posts related to the term "retrospective adjustment" in Taiwan as its corpus and manually coded the sentiments of 6,917 posts. We randomly divided the dataset into training (70%) and test (30%) subsets, fine-tuned a Chinese pre-trained BERT model on the training subset, and applied the fine-tuned model to predict the sentiments of the posts in the test subset. We then compared the manual coding with the model predictions and explained the discrepancies from the perspective of linguistic features. The results indicated that the model performed best on posts manually coded as "neutral," with an accuracy of 0.81, whereas the accuracies for posts manually coded as "positive" and "negative" were only 0.64 and 0.63, respectively. Among the misclassifications, posts manually coded as "negative" but predicted as "positive" and posts manually coded as "positive" but predicted as "neutral" ranked highest (0.23) and second highest (0.22), respectively. Examining the linguistic features of these two groups of posts, we identified seven categories of features that, we argue, led to the "negative" coding and four categories that led to the "positive" coding. Moreover, both groups contained posts that could not be coded accurately without knowledge of the relevant news events and of the Facebook account owners' political and social inclinations, which we attribute to the posts' close connection to public life and politics in Taiwan. Considering that the language used on social media differs from the language on which current models are trained, and that Facebook users frequently use punctuation marks and emoticons to express their moods, we argue that a sentiment model tailored to social media language needs to be developed.
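As a rough illustration of the pipeline summarized above, the sketch below fine-tunes a Chinese pre-trained BERT model for three-class sentiment classification on a 70/30 split and then compares the predictions against the manual coding. The checkpoint name (bert-base-chinese), the file and column names, and the hyperparameters are illustrative assumptions rather than the study's exact configuration.

```python
# Minimal sketch: fine-tune a Chinese pre-trained BERT for 3-class sentiment
# classification (0 = negative, 1 = neutral, 2 = positive) on a 70/30 split.
# "posts.csv" and its "text"/"label" columns are hypothetical placeholders.
import pandas as pd
import torch
from torch.utils.data import Dataset
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix
from transformers import (BertTokenizerFast, BertForSequenceClassification,
                          Trainer, TrainingArguments)

class PostDataset(Dataset):
    """Wraps tokenized Facebook posts and their manually coded sentiment labels."""
    def __init__(self, texts, labels, tokenizer):
        self.enc = tokenizer(texts, truncation=True, padding=True, max_length=128)
        self.labels = labels
    def __len__(self):
        return len(self.labels)
    def __getitem__(self, i):
        item = {k: torch.tensor(v[i]) for k, v in self.enc.items()}
        item["labels"] = torch.tensor(self.labels[i])
        return item

df = pd.read_csv("posts.csv")                                   # manually coded corpus
train_df, test_df = train_test_split(df, test_size=0.3, random_state=42)  # 70/30 split

tokenizer = BertTokenizerFast.from_pretrained("bert-base-chinese")
model = BertForSequenceClassification.from_pretrained("bert-base-chinese", num_labels=3)

train_ds = PostDataset(train_df["text"].tolist(), train_df["label"].tolist(), tokenizer)
test_ds = PostDataset(test_df["text"].tolist(), test_df["label"].tolist(), tokenizer)

# Hyperparameters here are assumed, not taken from the paper.
args = TrainingArguments(output_dir="bert-sentiment", num_train_epochs=3,
                         per_device_train_batch_size=16, learning_rate=2e-5)
trainer = Trainer(model=model, args=args, train_dataset=train_ds)
trainer.train()

# Predict on the held-out 30% and inspect per-class agreement with the manual coding,
# analogous to the per-class accuracies and misclassification rates reported above.
preds = trainer.predict(test_ds).predictions.argmax(axis=-1)
print(confusion_matrix(test_df["label"], preds, normalize="true"))
```

The row-normalized confusion matrix printed at the end corresponds to the quantities discussed in the abstract: its diagonal gives the per-class accuracies, and its off-diagonal entries give rates such as "manually coded negative but predicted positive."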