English Abstract
This study used Facebook posts related to the term "retrospective adjustment" in Taiwan as its corpus and manually coded the sentiments of 6,917 posts. We randomly divided the dataset into training (70%) and test (30%) subsets, fine-tuned a Chinese pre-trained BERT model on the training subset, and applied the fine-tuned model to predict the sentiments of the posts in the test subset. We then compared the manual coding with the model predictions and explained the discrepancies from the perspective of linguistic features. The results indicated that the model performed best on posts manually coded as "neutral," with an accuracy of 0.81, whereas the accuracies for posts manually coded as "positive" and "negative" were only 0.64 and 0.63, respectively. Among the misclassifications, posts manually coded as "negative" but predicted as "positive" and posts manually coded as "positive" but predicted as "neutral" ranked highest (0.23) and second highest (0.22), respectively. Examining the linguistic features of these two groups of posts, we identified seven categories of features that, we argue, led to the "negative" coding and four categories that led to the "positive" coding. Moreover, both groups contained posts that could not be coded accurately without knowledge of the relevant news events and of the Facebook account owners' political and social inclinations, which we attribute to the posts' close connection to public life and politics in Taiwan. Considering that the language used on social media differs from the language on which current models are trained, and that Facebook users frequently use punctuation marks and emoticons to express their moods, we argue that a sentiment model tailored to social media language needs to be developed.
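As a rough illustration of the pipeline summarized above, the sketch below fine-tunes a Chinese pre-trained BERT model for three-class sentiment classification on a 70/30 split and then compares the predictions against the manual coding. The checkpoint name (bert-base-chinese), the file and column names, and the hyperparameters are illustrative assumptions rather than the study's exact configuration.

```python
# Minimal sketch: fine-tune a Chinese pre-trained BERT for 3-class sentiment
# classification (0 = negative, 1 = neutral, 2 = positive) on a 70/30 split.
# "posts.csv" and its "text"/"label" columns are hypothetical placeholders.
import pandas as pd
import torch
from torch.utils.data import Dataset
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix
from transformers import (BertTokenizerFast, BertForSequenceClassification,
                          Trainer, TrainingArguments)

class PostDataset(Dataset):
    """Wraps tokenized Facebook posts and their manually coded sentiment labels."""
    def __init__(self, texts, labels, tokenizer):
        self.enc = tokenizer(texts, truncation=True, padding=True, max_length=128)
        self.labels = labels
    def __len__(self):
        return len(self.labels)
    def __getitem__(self, i):
        item = {k: torch.tensor(v[i]) for k, v in self.enc.items()}
        item["labels"] = torch.tensor(self.labels[i])
        return item

df = pd.read_csv("posts.csv")                                   # manually coded corpus
train_df, test_df = train_test_split(df, test_size=0.3, random_state=42)  # 70/30 split

tokenizer = BertTokenizerFast.from_pretrained("bert-base-chinese")
model = BertForSequenceClassification.from_pretrained("bert-base-chinese", num_labels=3)

train_ds = PostDataset(train_df["text"].tolist(), train_df["label"].tolist(), tokenizer)
test_ds = PostDataset(test_df["text"].tolist(), test_df["label"].tolist(), tokenizer)

# Hyperparameters here are assumed, not taken from the paper.
args = TrainingArguments(output_dir="bert-sentiment", num_train_epochs=3,
                         per_device_train_batch_size=16, learning_rate=2e-5)
trainer = Trainer(model=model, args=args, train_dataset=train_ds)
trainer.train()

# Predict on the held-out 30% and inspect per-class agreement with the manual coding,
# analogous to the per-class accuracies and misclassification rates reported above.
preds = trainer.predict(test_ds).predictions.argmax(axis=-1)
print(confusion_matrix(test_df["label"], preds, normalize="true"))
```

The row-normalized confusion matrix printed at the end corresponds to the quantities discussed in the abstract: its diagonal gives the per-class accuracies, and its off-diagonal entries give rates such as "manually coded negative but predicted positive."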