English Abstract
In this study, we build a named entity recognition (NER) system and apply it to the medical domain. The data are annotated in the BIO format: the first token of an entity mention is labeled B-<type> and each subsequent token of that mention is labeled I-<type>. For example, a body-part mention such as "muscle" is labeled B-BODY followed by I-BODY, and a symptom mention such as "cough" is labeled B-SYMP followed by I-SYMP. Tokens that do not belong to any entity category are labeled O. The Chinese HealthNER Corpus contains 30,692 sentences, of which 2,531 are held out as the validation (dev) set for this evaluation; the organizers later provided a further 3,204 sentences as the test set. We submitted three runs, produced by BLSTM_CRF, RoBERTa+BLSTM_CRF, and a BERT classifier, respectively. The BERT classifier, submitted as RUN3, performed best, with a precision of 80.18%, a recall of 78.3%, and an F1-score of 79.23%.
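The BIO scheme and the precision/recall/F1 figures described above can be illustrated with the following minimal sketch. It is not the system used in this study. The function names (`spans_to_bio`, `bio_to_spans`, `prf1`), the `(start, end, type)` span representation, and the seven-token example are hypothetical. The sketch assumes that each token is a single character, which is common in Chinese NER, and that evaluation uses exact span matching, meaning both boundaries and type must agree. The organizers' official scorer may handle edge cases differently.

```python
from typing import List, Optional, Set, Tuple

# Hypothetical span format: (start index, end index exclusive, entity type).
Span = Tuple[int, int, str]


def spans_to_bio(num_tokens: int, spans: List[Span]) -> List[str]:
    """Convert entity spans into one BIO tag per token."""
    tags = ["O"] * num_tokens  # tokens outside every entity stay "O"
    for start, end, label in spans:
        tags[start] = f"B-{label}"  # first token of the mention
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"  # remaining tokens of the mention
    return tags


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Recover entity spans from BIO tags; stray I- tags are ignored here."""
    spans: Set[Span] = set()
    start: Optional[int] = None
    label: Optional[str] = None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a trailing entity
        if tag.startswith("I-") and label == tag[2:]:
            continue  # the open entity continues
        if label is not None:
            spans.add((start, i, label))  # close the open entity
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]  # open a new entity
    return spans


def prf1(gold: Set[Span], pred: Set[Span]) -> Tuple[float, float, float]:
    """Span-level precision, recall, and F1 under exact matching."""
    tp = len(gold & pred)  # predicted spans that match a gold span exactly
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical 7-token sentence: a BODY mention at tokens 1-2, a SYMP mention at 5-6.
    gold_tags = spans_to_bio(7, [(1, 3, "BODY"), (5, 7, "SYMP")])
    print(gold_tags)  # ['O', 'B-BODY', 'I-BODY', 'O', 'O', 'B-SYMP', 'I-SYMP']

    # A prediction that truncates the symptom mention gets only one span right.
    pred_tags = ["O", "B-BODY", "I-BODY", "O", "O", "B-SYMP", "O"]
    print(prf1(bio_to_spans(gold_tags), bio_to_spans(pred_tags)))  # (0.5, 0.5, 0.5)

    # F1 is the harmonic mean of precision and recall; RUN3's reported figures agree.
    p, r = 80.18, 78.3
    print(round(2 * p * r / (p + r), 2))  # 79.23
```

The truncated prediction shows why span-level scoring is strict. Five of the seven predicted tags match the gold tags, but only one of the two entities is counted as correct. The final line confirms that the reported 79.23 F1 is the harmonic mean of 80.18% precision and 78.3% recall.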