Chinese Abstract |
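In response to international trends, the National Health Insurance Administration of Taiwan's Ministry of Health and Welfare switched from ICD-9-CM to ICD-10-CM/PCS in 2016 as the coding standard that hospital and clinic staff use to classify patients' diagnoses and surgical procedures for statistics, analysis, research, and National Health Insurance reimbursement. The new version, however, contains far more diagnosis and procedure codes and is more complex, so the time required for coding has multiplied; although coding is performed by professional disease classification coders, the risk of manual coding errors may still rise. To address this workload problem and to improve coding consistency and quality, one medical institution has tried predicting codes with KNN; this study tries a different machine learning model to examine whether the results differ. Using data from a medical center that had been coded by professional disease classification coders, the study takes the text of discharge summaries from the medical records as input and applies deep learning with a recurrent neural network (RNN), specifically a bidirectional long short-term memory (bidirectional LSTM) model, as the algorithm for classifying diagnosis codes (ICD-10-CM). After training, the model predicts disease classification codes, and the better-performing configuration is identified. The results show that, for the 500 most frequent diagnoses, prediction accuracy reaches 0.8 and the F score reaches 0.7; for the 100 most frequent disease codes, precision reaches 0.8596, recall is 0.7925, and the F score is 0.82, which is better than the KNN-based code prediction reported in the literature. The results can support physicians in assigning disease codes during hospitalization and can serve as a reference for quality checks on the work of disease classification coders. |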
English Abstract |
In 2016 the Ministry of Health and Welfare in Taiwan replaced ICD-9-CM with ICD-10-CM/PCS as the coding standard that medical professionals at hospitals and clinics use to record diagnoses and surgical procedures, for the purposes of statistical classification, analysis, and research on patient records, as well as for reimbursement claims under the universal health insurance system. The coding process is mandatory. However, the changeover from ICD-9 to ICD-10 created a number of problems. The number of codes increased dramatically, and the complexity of the coding task and the time it requires increased substantially as well. Although coding is carried out by trained professionals, it remains error-prone. To reduce human error and improve the consistency and quality of coding, one medical institution has used a K-nearest-neighbor (KNN) model to assist with the coding task. In this study, we applied a different method to the same task to see whether it could bring further improvement. The study used data collected by Kaohsiung Veterans General Hospital, one of the most prestigious medical centers in Taiwan. The data consisted of discharge summaries of inpatients, which had been coded by trained professionals. A bidirectional long short-term memory (LSTM) network was chosen as the model for predicting ICD-10-CM diagnosis codes. Our method achieved a precision of 0.8596, a recall of 0.7925, and an F score of 0.82 when predicting the 100 most frequent codes, which is somewhat better than the KNN method reported in the literature. The results can serve as a reference for less experienced ICD-10 coders. |
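The reported precision, recall, and F score for the 100 most frequent codes can be checked against one another. The sketch below assumes the F score is the balanced F1 measure (the harmonic mean of precision and recall); the abstract does not state which variant or which averaging scheme was used, and the names `f_score` and `top100_f1` are illustrative only.

```python
def f_score(precision: float, recall: float, beta: float = 1.0) -> float:
    """Return the F-beta score; beta = 1 gives the harmonic mean (F1).

    Assumption: the abstract's "F score" is F1. This is not stated in the source.
    """
    if precision < 0 or recall < 0:
        raise ValueError("precision and recall must be non-negative")
    denominator = beta ** 2 * precision + recall
    if denominator == 0:
        # Both inputs are zero, so the score is defined as zero.
        return 0.0
    return (1 + beta ** 2) * precision * recall / denominator


# Values reported in the abstract for the 100 most frequent ICD-10-CM codes.
top100_f1 = f_score(precision=0.8596, recall=0.7925)
print(f"F1 for the top 100 codes: {top100_f1:.2f}")
```

Under this assumption the computed value rounds to 0.82, which agrees with the F score stated in the abstract.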