Enhancing Chinese Multi-Label Text Classification Performance with Response-based Knowledge Distillation
Authors: Szu-Chi Huang, Cheng-Fu Cao, Po-Hsun Liao, Lung-Hao Lee, Po-Lei Lee, Kuo-Kai Shyu
It is difficult to optimize the performance of individual labels in multi-label text classification, especially on imbalanced data containing long-tailed labels: a standalone multi-label model predicts all classes at once, so optimizing for any single label is hard, and performance on rarely occurring long-tailed labels is usually poor. This study therefore proposes a response-based knowledge distillation mechanism in which multiple optimized binary classifiers, one per label, serve as the teacher network, and a single standalone multi-label classifier serves as the student network that learns from the knowledge distilled by the teachers, improving classification on label-imbalanced datasets. A total of 2,724 Chinese healthcare texts were collected and manually annotated across nine defined categories, yielding 8,731 labels in total, an average of 3.2 labels per text. Using 5-fold cross-validation, we compared the TextRNN, TextCNN, HAN, and GRU-att models with and without the knowledge distillation mechanism. Experimental results indicate that the proposed mechanism significantly improved the standalone multi-label model regardless of the underlying architecture, by about 2-3% in micro-F1, 4-6% in macro-F1, 3-4% in weighted-F1, and 1-2% in subset accuracy.
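The abstract does not state the exact distillation objective. A minimal sketch, assuming a common response-based formulation: the student's multi-label logits are trained against both the gold multi-hot labels and the temperature-softened sigmoid outputs of the binary teachers, where column k of the teacher matrix holds the logit produced by the binary teacher for label k. The names `distillation_loss`, `alpha`, and `temperature`, and the T² scaling of the soft term, are illustrative assumptions, not the authors' settings.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def bce(probs, targets, eps=1e-7):
    """Mean binary cross-entropy; targets may be hard (0/1) or soft (in [0, 1])."""
    probs = np.clip(probs, eps, 1.0 - eps)
    return float(np.mean(-(targets * np.log(probs) + (1.0 - targets) * np.log(1.0 - probs))))


def distillation_loss(student_logits, teacher_logits, labels, alpha=0.5, temperature=2.0):
    """Hypothetical response-based KD loss for multi-label classification.

    student_logits: (batch, num_labels) logits from the single multi-label student.
    teacher_logits: (batch, num_labels); column k comes from the binary teacher for label k.
    labels:         (batch, num_labels) multi-hot ground truth.
    alpha, temperature: assumed hyperparameters, not taken from the paper.
    """
    student_logits = np.asarray(student_logits, dtype=float)
    teacher_logits = np.asarray(teacher_logits, dtype=float)
    labels = np.asarray(labels, dtype=float)

    # Hard-label term: ordinary multi-label BCE against the annotations.
    hard = bce(sigmoid(student_logits), labels)

    # Soft-label term: match the temperature-softened responses of the binary teachers.
    soft_targets = sigmoid(teacher_logits / temperature)
    soft = bce(sigmoid(student_logits / temperature), soft_targets)

    # Scaling by T^2 keeps the soft-term gradient magnitude roughly independent of T
    # (a common distillation convention, assumed here).
    return alpha * hard + (1.0 - alpha) * (temperature ** 2) * soft
```

With `alpha=1.0` the loss reduces to ordinary multi-label binary cross-entropy; lowering `alpha` shifts weight toward the teacher responses, which is how, in this sketch, per-label knowledge from the separately optimized binary models would reach the student, including on rare long-tailed labels.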
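The four reported metrics can all be computed from multi-hot prediction matrices. The sketch below is a hypothetical NumPy helper, `multilabel_metrics`, following the standard definitions: micro-F1 pools true positives, false positives, and false negatives over all labels; macro-F1 averages per-label F1 with equal weight; weighted-F1 weights each label's F1 by its gold support; and subset accuracy counts a text as correct only when its entire predicted label set matches exactly. Scoring a label's F1 as 0 when it has no gold or predicted positives is an assumed convention.

```python
import numpy as np


def multilabel_metrics(y_true, y_pred):
    """Micro/macro/weighted F1 and subset accuracy for multi-hot arrays (n_samples, n_labels)."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)

    # Per-label confusion counts.
    tp = np.sum(y_true & y_pred, axis=0).astype(float)
    fp = np.sum(~y_true & y_pred, axis=0).astype(float)
    fn = np.sum(y_true & ~y_pred, axis=0).astype(float)

    # Per-label F1 = 2TP / (2TP + FP + FN); assumed 0 where the denominator is 0.
    denom = 2 * tp + fp + fn
    per_label = np.divide(2 * tp, denom, out=np.zeros_like(denom), where=denom > 0)

    # Micro-F1: pool the counts over all labels before computing F1.
    tp_all, fp_all, fn_all = tp.sum(), fp.sum(), fn.sum()
    denom_all = 2 * tp_all + fp_all + fn_all
    micro = float(2 * tp_all / denom_all) if denom_all > 0 else 0.0

    # Macro-F1: every label counts equally, however rare it is.
    macro = float(per_label.mean())

    # Weighted-F1: weight each label's F1 by its number of gold instances.
    support = y_true.sum(axis=0)
    weighted = float(np.sum(per_label * support) / support.sum()) if support.sum() > 0 else 0.0

    # Subset accuracy: the full predicted label set must match exactly.
    subset = float(np.mean(np.all(y_true == y_pred, axis=1)))

    return {"micro_f1": micro, "macro_f1": macro, "weighted_f1": weighted, "subset_accuracy": subset}
```

Because macro-F1 gives every label equal weight, improvements on rare labels move it more than they move micro-F1, and subset accuracy is the strictest of the four because a single wrong label makes the whole text count as incorrect.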
Pages: 25-31
Keywords: multi-label classification, long-tailed labels, binary relevance, knowledge distillation
Publication: ROCLING Proceedings
Issue: 202212 (2022 issue)
Publisher: The Association for Computational Linguistics and Chinese Language Processing (ACLCLP)
