RoBERTa-based Traditional Chinese Medicine Named Entity Recognition Model

Ming-Hsiang Su; Chin-Wei Lee; Chi-Lun Hsu; Ruei-Cyuan Su

Title	基於RoBERTa的中藥命名實體識別模型
Parallel title	RoBERTa-based Traditional Chinese Medicine Named Entity Recognition Model
Authors	Ming-Hsiang Su, Chin-Wei Lee, Chi-Lun Hsu, Ruei-Cyuan Su
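Chinese abstract (translated)	This study builds a named entity recognition (NER) model and applies it to recognizing the names of Chinese medicines and diseases; its output can feed a human-machine dialogue system that gives people correct reminders about Chinese medicine use. First, a web crawler was used to compile online resources into a Chinese medicine named entity corpus of 1097 articles, 1412 disease names, and 38714 Chinese medicine names. Next, each article was annotated with the Chinese medicine names using the BIO tagging scheme. Finally, BERT, ALBERT, RoBERTa, and GPT2 were each paired with BiLSTM and CRF for training and evaluation. In the experiments, the NER system that pairs RoBERTa with BiLSTM and CRF performed best, with a precision of 0.96, a recall of 0.96, and an F1-score of 0.96.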
English abstract	In this study, a named entity recognition (NER) model was constructed and applied to the identification of Chinese medicine names and disease names. The results can be further used in a human-machine dialogue system to provide people with correct Chinese medicine medication reminders. First, web crawlers were used to compile web resources into a Chinese medicine named entity corpus containing 1097 articles, 1412 disease names, and 38714 Chinese medicine names. Then, each article was annotated with the TCM names using the BIO tagging method. Finally, BERT, ALBERT, RoBERTa, and GPT2 were each combined with BiLSTM and CRF, then trained and evaluated. The experimental results show that the NER system combining RoBERTa with BiLSTM and CRF achieves the best performance, with a precision of 0.96, a recall of 0.96, and an F1-score of 0.96.
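Pages	61-66
Keywords	Traditional Chinese Medicine, Disease, Named Entity Recognition Model
Journal	ROCLING論文集 (ROCLING Proceedings)
Issue	202212 (2022 issue)
Publisher	中華民國計算語言學學會 (The Association for Computational Linguistics and Chinese Language Processing)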
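Previous article in this issue	Is Character Trigram Overlapping Ratio Still the Best Similarity Measure for Aligning Sentences in a Paraphrased Corpus?
Next article in this issue	運用不同音訊長度於遷移式學習以提升電鋸聲音識別能力之研究 (A Study on Using Different Audio Lengths in Transfer Learning to Improve Chainsaw Sound Recognition)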
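
The abstract mentions two steps that are easy to illustrate: BIO annotation from a list of known names, and scoring by precision, recall, and F1. The sketch below is only an illustration under stated assumptions, not the authors' pipeline. It assumes character-level tags, greedy longest-match labeling against a lexicon, entity-level exact-match scoring, and the hypothetical labels `MED` and `DIS`; the record does not specify any of these. The function names `bio_tag`, `spans`, and `prf` are likewise made up for this example.

```python
from typing import Dict, List, Set, Tuple


def bio_tag(text: str, lexicon: Dict[str, str]) -> List[str]:
    """Assign one BIO tag per character by greedy longest-match lookup.

    `lexicon` maps an entity name to a label (labels here are hypothetical).
    """
    tags = ["O"] * len(text)
    # Longer names are tried first so they win over shorter names they contain.
    names = sorted((n for n in lexicon if n), key=len, reverse=True)
    i = 0
    while i < len(text):
        for name in names:
            if text.startswith(name, i):
                label = lexicon[name]
                tags[i] = f"B-{label}"
                for j in range(i + 1, i + len(name)):
                    tags[j] = f"I-{label}"
                i += len(name)
                break
        else:
            i += 1  # no entity starts at this character
    return tags


def spans(tags: List[str]) -> Set[Tuple[int, int, str]]:
    """Collect (start, end, label) entity spans from a BIO tag sequence."""
    found: Set[Tuple[int, int, str]] = set()
    start, label = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes a trailing entity
        continues = tag.startswith("I-") and label == tag[2:]
        if start is not None and not continues:
            found.add((start, i, label))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return found


def prf(gold: List[str], pred: List[str]) -> Tuple[float, float, float]:
    """Entity-level precision, recall, and F1 using exact span matching."""
    g, p = spans(gold), spans(pred)
    tp = len(g & p)
    precision = tp / len(p) if p else 0.0
    recall = tp / len(g) if g else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Toy sentence and lexicon, invented for illustration only.
    lexicon = {"人參": "MED", "失眠": "DIS"}
    gold = bio_tag("人參可用於治療失眠", lexicon)
    print(gold)
    print(prf(gold, gold))
```

Exact span matching is the stricter of the common scoring choices: a prediction counts only if both its boundaries and its label agree with the gold annotation. Whether the reported 0.96 scores were computed per entity or per token is not stated in this record.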