Can Generative Models Be Used to Detect Hate Speech Related to Body Shaming?

蔡元翔; 張瑜芸

Title	生成模型是否能用於偵測身體羞辱仇恨言論?
Parallel title	Can generative models be used to detect hate speech related to body shaming?
Authors	蔡元翔, 張瑜芸
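Chinese abstract	This study examines whether pre-trained language models with Encoder and Decoder architectures can reliably identify body-shaming hate speech. Previous work has mostly discussed and guarded against generative large language models producing discriminatory language, but no study has yet examined whether generative models can be applied to automatically classify discriminatory speech. This study therefore adopts zero-shot classification with a complete definition of body shaming, and observes whether Decoder-based generative models (ChatGLM-6B and Chinese-Alpaca-Plus-7B) are suitable for automatically identifying discriminatory speech. In addition, to better understand how different large language model architectures judge hate speech, the Encoder-based BERT model is also used for classification, and the results of the two architectures are analyzed and compared. The final results show that BERT achieves relatively good performance after fine-tuning on a small amount of data, whereas generative models do have difficulty with zero-shot classification. Prompt template engineering needs further improvement: once sentences with a variety of structures are provided and explained in detail, few-shot classification can be used to observe whether generative model performance improves further.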
English abstract	This study experiments with pre-trained language models of both Encoder and Decoder architectures to determine how effectively they identify hate speech related to body shaming. Previous research has largely focused on discussing and mitigating the automatic generation of discriminatory language by generative models within LLMs. However, no research has yet investigated whether generative models can be applied to automatically classify and identify discriminatory language. Therefore, this study employs a zero-shot classification approach and provides a comprehensive definition of body shaming to examine whether Decoder-focused generative models (ChatGLM-6B and Chinese-Alpaca-Plus-7B) are suitable for automatically identifying discriminatory language. Furthermore, to gain a more comprehensive understanding of how different architectures within LLMs perform in hate speech detection, a BERT model with an Encoder architecture is also employed for classification. The results from both architectures are then further analyzed and compared. BERT shows good performance with minimal fine-tuning data, while generative models struggle with zero-shot classification. Thus, we aim to explore the potential for improving the performance of generative models by providing detailed explanations for sentences with various structures.
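Pages	270-278
Keywords	zero-shot classification, prompt engineering, detection of hate speech, body shaming
Publication	ROCLING Proceedings (ROCLING論文集)
Issue	202310 (2023 issue)
Publisher	The Association for Computational Linguistics and Chinese Language Processing (中華民國計算語言學學會)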