CrowNER at ROCLING 2023 MultiNER-Health Task: Enhancing NER Task with GPT Paraphrase Augmentation on Sparsely Labeled Data

Yin-Chieh Wang; Wen-Hong Wu; Feng-Yu Kuo (郭風裕); Han-Chun Wu; Te-Yu Chi; Te-Lun Yang; Sheh Chen; Jyh-Shing Roger Jang

Title	CrowNER at ROCLING 2023 MultiNER-Health Task: Enhancing NER Task with GPT Paraphrase Augmentation on Sparsely Labeled Data
Authors	Yin-Chieh Wang, Wen-Hong Wu, Feng-Yu Kuo (郭風裕), Han-Chun Wu, Te-Yu Chi, Te-Lun Yang, Sheh Chen, Jyh-Shing Roger Jang
Abstract	In this research, we used the training dataset from the ROCLING 2023 shared task on Chinese Multi-genre Named Entity Recognition in the Healthcare Domain, which comprises the Chinese HealthNER Corpus (Lee and Lu, 2021) and the ROCLING 2022 CHNER Dataset (Lee et al., 2022), along with the test set (Lee et al., 2023). The objective was to address the named entity recognition task within the Chinese healthcare domain. Our first step was to preprocess the training dataset. We identified instances in the training set where sentences with identical structural patterns exhibited ambiguities and errors in named entity definitions. Prioritizing data validation, we manually excluded erroneous entries. In specialized domains such as medicine, domain-specific terminologies and proprietary names are often defined within sentences as merged labels rather than separate ones. We therefore employed the "Entity Relationship Construction and Merging Strategies" approach to consolidate related named entities. Subsequently, we computed the frequencies of sentence and entity occurrences. We extracted sparsely labeled data and applied two data augmentation techniques: GPT paraphrase, and entity replacement that preserves sentence structure. These steps produced an augmented training set. Finally, we conducted fine-tuning experiments on various state-of-the-art BERT-based models to obtain a model suitable for the ROCLING Shared Task.
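Pages	338-348
Keywords	GPT 3.5, Data augmentation, GPT paraphrase, Entity Relationship Construction and Merging Strategies
Journal	ROCLING Proceedings (ROCLING論文集)
Issue	202310 (2023 issue)
Publisher	Association for Computational Linguistics and Chinese Language Processing (中華民國計算語言學學會)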
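
The abstract describes counting entity occurrences, selecting sparsely labeled data, and augmenting it by entity replacement while preserving sentence structure. The following is a minimal sketch of that general idea, not the authors' implementation. The segment-based sentence representation, the function names, the `max_count` threshold, and the toy sentences and labels (`DRUG`, `SYMP`) are all assumptions made for illustration. The GPT paraphrase step is not covered here.

```python
import random
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Set, Tuple

# Assumed representation: a sentence is a list of (text, label) segments,
# where label is None for text outside any entity.
Segment = Tuple[str, Optional[str]]
Sentence = List[Segment]


def entity_type_counts(sentences: List[Sentence]) -> Counter:
    """Count how often each entity label occurs across all sentences."""
    return Counter(
        label for sent in sentences for _, label in sent if label is not None
    )


def build_entity_pool(sentences: List[Sentence]) -> Dict[str, Set[str]]:
    """Collect the distinct entity strings observed for each label."""
    pool: Dict[str, Set[str]] = defaultdict(set)
    for sent in sentences:
        for text, label in sent:
            if label is not None:
                pool[label].add(text)
    return pool


def augment_by_replacement(
    sentences: List[Sentence], max_count: int, rng: random.Random
) -> List[Sentence]:
    """Create new sentences by swapping entities of rare labels.

    A label is treated as rare (sparsely labeled) when it occurs at most
    max_count times. Non-entity segments and labels are left unchanged,
    so the sentence structure is preserved.
    """
    counts = entity_type_counts(sentences)
    pool = build_entity_pool(sentences)
    rare = {label for label, n in counts.items() if n <= max_count}

    augmented: List[Sentence] = []
    for sent in sentences:
        new_sent: Sentence = []
        changed = False
        for text, label in sent:
            # Only rare-label entities are candidates for replacement.
            candidates = sorted(pool[label] - {text}) if label in rare else []
            if candidates:
                new_sent.append((rng.choice(candidates), label))
                changed = True
            else:
                new_sent.append((text, label))
        if changed:
            augmented.append(new_sent)
    return augmented


# Toy data (hypothetical sentences and labels, for illustration only).
sentences: List[Sentence] = [
    [("患者服用", None), ("阿斯匹靈", "DRUG"), ("治療", None), ("頭痛", "SYMP")],
    [("醫師建議使用", None), ("布洛芬", "DRUG")],
    [("他有", None), ("頭痛", "SYMP"), ("和", None), ("發燒", "SYMP"), ("。", None)],
]
augmented = augment_by_replacement(sentences, max_count=2, rng=random.Random(0))
for sent in augmented:
    print("".join(text for text, _ in sent))
```

In this toy run only `DRUG` falls at or below the threshold, so only sentences containing a `DRUG` entity yield augmented copies. The augmented sentences would then be appended to the original training set before fine-tuning.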