Chinese abstract |
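Class-imbalanced data give rise to the long-tailed label problem. A standalone multi-label classification model predicts all classes at once, which makes it very difficult to optimize for individual labels, and performance on infrequent long-tailed labels is usually poor. This study proposes a response-based knowledge distillation mechanism in which multiple optimized binary models serve as the teacher network and a single multi-label model serves as the student network, improving the classification performance of the multi-label model on datasets with imbalanced labels. The experimental data comprise 2,724 Chinese healthcare texts, manually annotated across 9 categories, for a total of 8,731 labels and an average of 3.2 labels per sample. Using 5-fold cross-validation, we compared the TextRNN, TextCNN, HAN, and GRU-att models with and without the knowledge distillation mechanism. The results show that knowledge distillation significantly improves a single multi-label classification model: micro-F1 by about 2 to 3%, macro-F1 by about 4 to 6%, weighted-F1 by about 3 to 4%, and subset accuracy by about 1 to 2%. |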
English abstract |
It is difficult to optimize per-label performance in multi-label text classification, especially on imbalanced data containing long-tailed labels. Therefore, this study proposes a response-based knowledge distillation mechanism comprising a teacher model made up of optimized binary classifiers, one per label, and a student model that is a standalone multi-label classifier learning from the knowledge distilled by the teacher. A total of 2,724 Chinese healthcare texts were collected and manually annotated across nine defined labels, yielding 8,731 labels in total, an average of 3.2 labels per text. We used 5-fold cross-validation to compare several multi-label models, including TextRNN, TextCNN, HAN, and GRU-att, with and without the knowledge distillation mechanism. Experimental results indicate that the proposed mechanism improved performance regardless of the model used, by about 2-3% in micro-F1, 4-6% in macro-F1, 3-4% in weighted-F1, and 1-2% in subset accuracy. |
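The abstract does not give the exact training objective, so the following is only a minimal sketch of how response-based distillation from per-label binary teachers to one multi-label student might look. It assumes each teacher outputs a probability for its own label, the student outputs one sigmoid probability per label, and the loss is a weighted sum of binary cross-entropy against the hard annotations and against the teachers' soft outputs. The function names, the weight `alpha`, and the example numbers are hypothetical and are not taken from the study.

```python
import math


def binary_cross_entropy(pred, target, eps=1e-12):
    """Mean binary cross-entropy between probabilities and targets in [0, 1]."""
    total = 0.0
    for p, t in zip(pred, target):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= t * math.log(p) + (1.0 - t) * math.log(1.0 - p)
    return total / len(pred)


def distillation_loss(student_probs, teacher_probs, hard_labels, alpha=0.5):
    """Assumed objective: alpha * BCE(student, hard) + (1 - alpha) * BCE(student, teacher).

    teacher_probs[i] is the output of the binary teacher for label i;
    student_probs[i] is the multi-label student's output for the same label.
    """
    hard_term = binary_cross_entropy(student_probs, hard_labels)
    soft_term = binary_cross_entropy(student_probs, teacher_probs)
    return alpha * hard_term + (1.0 - alpha) * soft_term


# Hypothetical single sample with three labels (the study uses nine).
hard_labels = [1, 0, 1]
teacher_probs = [0.9, 0.2, 0.7]   # one binary teacher per label
student_probs = [0.6, 0.4, 0.5]   # single multi-label student
loss = distillation_loss(student_probs, teacher_probs, hard_labels)
```

In this sketch, `alpha=1.0` reduces to ordinary multi-label training on the annotations alone, while smaller values pull the student toward the per-label teachers, which is where rarely occurring labels could benefit from a classifier tuned specifically for them.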