English Abstract
As the volume of text data grows, classifying documents manually becomes prohibitively expensive, so automated text classification has become important for tasks such as spam detection, news categorization, and sentiment analysis. Deep learning models for natural language processing can be roughly divided into two categories: sequential and graph-based. Sequential models typically use RNNs and CNNs, as well as BERT and its variants. In recent years, researchers have begun applying graph-based deep learning models to NLP, building graphs from word co-occurrence statistics and TF-IDF weights in order to learn word and document features for classification.
In our experiments, we evaluate on four datasets: MR, R8, R52, and Ohsumed. Compared with sequential and graph-based baselines, our proposed method achieves an accuracy of 0.79 on MR.