Chinese Abstract |
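Traditional stance analysis often relies on questionnaires, telephone surveys, and similar instruments to learn each person's view on different topics. Because these traditional statistical methods rely on sampling, however, an insufficient sample size can easily lead to poor results. Existing methods include approaches based on sentiment lexicons, Convolutional Neural Networks (CNN), and Recurrent Neural Networks (RNN), but deep neural networks need larger datasets to improve performance. Text features are typically extracted with N-Gram or TF-IDF, which cannot truly capture the semantics of the text. This thesis proposes using the Word2Vec word representation model to obtain word vectors, combined with LDA to obtain text features. For stance detection, we use an SVM as the classifier and apply a two-stage method that first addresses the subjectivity question of whether a person is neutral, and then predicts the user's stance. We use the SemEval-2016 stance detection task as the experimental data source and evaluate with several metrics (F-Measure, Accuracy, Precision, Recall). Compared with the SemEval-2016 baseline and the scores of other teams, the proposed method achieves better results on average (F-Measure: 83.36%). |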
English Abstract |
In traditional stance analysis, questionnaires or telephone surveys are often used to learn each person's opinion on different topics. However, because these traditional statistical methods rely on sampling, the sample size is often too small to give good results. Existing methods are usually based on sentiment lexicons, Convolutional Neural Networks (CNN), or Recurrent Neural Networks (RNN). Text features are typically based on N-Gram or TF-IDF, which do not capture the semantics of the text. This research proposes using Word2Vec for word embedding and combining it with LDA to obtain text features. For stance detection, we use a Support Vector Machine (SVM) to train a classifier that detects the subjectivity of texts and predicts user stances. In the experiments, we used the data from the SemEval-2016 Stance Detection Task and a variety of evaluation metrics (F-Measure, Accuracy, Precision, Recall) to evaluate performance. Compared with the SemEval-2016 official baseline and other teams' scores, our proposed method obtains better results on average (F-Measure: 83.36%). |
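The two-stage idea and the evaluation metrics named in the abstract can be illustrated with a minimal Python sketch. This is not the thesis implementation: the helper names (`concat_features`, `two_stage_predict`, `precision_recall_f1`) are hypothetical, the two classifiers are passed in as plain callables standing in for trained SVMs, and the labels `FAVOR`, `AGAINST`, and `NONE` are assumed to follow the SemEval-2016 task convention. How the word vectors and topic distribution are actually combined is not stated in the abstract; simple concatenation is assumed here.

```python
def concat_features(word_vec_avg, topic_dist):
    """Join an averaged word-embedding vector with a topic distribution.

    Assumption: the text feature is a plain concatenation of the two parts.
    """
    return list(word_vec_avg) + list(topic_dist)


def two_stage_predict(x, is_subjective, polarity):
    """Stage 1 decides whether the text takes a stance at all;
    stage 2 is consulted only for texts judged subjective."""
    if not is_subjective(x):
        return "NONE"
    return polarity(x)


def precision_recall_f1(gold, pred, label):
    """Precision, recall, and F-measure for a single label."""
    pairs = list(zip(gold, pred))
    tp = sum(1 for g, p in pairs if g == label and p == label)
    fp = sum(1 for g, p in pairs if g != label and p == label)
    fn = sum(1 for g, p in pairs if g == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def accuracy(gold, pred):
    """Fraction of examples whose predicted label matches the gold label."""
    return sum(1 for g, p in zip(gold, pred) if g == p) / len(gold)


# Stand-in decision rules (hypothetical; trained SVMs would go here).
toy_is_subjective = lambda x: abs(x[0]) > 0.5
toy_polarity = lambda x: "FAVOR" if x[0] > 0 else "AGAINST"

features = [concat_features([v], [0.5, 0.5]) for v in (0.9, -0.9, 0.1)]
predictions = [two_stage_predict(x, toy_is_subjective, toy_polarity)
               for x in features]
```

Separating the neutral-or-not decision from the polarity decision lets each classifier be trained on a simpler binary problem, which is one plausible motivation for a two-stage design.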