| English Abstract |
In 1995, Taiwan's government launched the National Health Insurance (NHI) program to marshal resources and ease the difficulties people may encounter when paying for health care. Under this program, most medical organizations claim medical treatment fees from the Bureau of NHI according to diagnosis-related group (DRG) codes based on the International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM). The claim process requires specialists to assign ICD-9-CM codes from doctors' discharge diagnoses. This process is inefficient, time-consuming, and tedious, especially when performed manually. Automatic classification methods can potentially reduce these problems. To improve the efficiency of ICD-9-CM prediction, we explored three well-known methods: Naive Bayes, support vector machine (SVM), and vector space model (VSM), with term frequency (TF) and TF multiplied by inverse document frequency (TF-IDF) as the respective weighting schemes for feature selection, applied to the discharge diagnoses of six hospital departments. This paper also explores whether the use of an ontology influences prediction accuracy. The experimental results show that the preferred method is SVM without feature weighting, although its performance varies across hospital departments: the mean macro-averaged F-measure (F) is 0.7937, ranging from 0.7374 to 0.9009. Among the selected hospital departments, VSM with TF-IDF at a threshold of 0.1 was appropriate only for the cardiology department, while the models for the other departments were not modified. Regarding the use of an ontology, synonym replacement was not very effective, and TF-IDF showed less improvement than TF. In summary, SVM is recommended for predicting ICD-9-CM codes. |
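
The two quantities the abstract relies on, TF-IDF weighting with a selection threshold and the macro-averaged F-measure, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`tf_idf`, `select_features`, `macro_f1`), the length-normalized TF, the natural-log IDF, and the toy diagnoses are all assumptions made for illustration, and the exact weighting formula used in the study is not given in the abstract.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed formula: (count / doc length) * ln(N / document frequency).
    """
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append(
            {t: (c / len(doc)) * math.log(n / df[t]) for t, c in counts.items()}
        )
    return weights


def select_features(weights, threshold):
    """Keep terms whose weight reaches the threshold in at least one document."""
    return {t for w in weights for t, v in w.items() if v >= threshold}


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    labels = set(y_true) | set(y_pred)
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical tokenized discharge diagnoses (illustrative only).
docs = [
    ["acute", "myocardial", "infarction"],
    ["chronic", "heart", "failure"],
    ["acute", "heart", "failure"],
]
selected = select_features(tf_idf(docs), threshold=0.2)

# Hypothetical gold and predicted ICD-9-CM codes for three records.
score = macro_f1(["410", "428", "428"], ["410", "428", "410"])
```

Macro-averaging gives every code the same weight regardless of how often it occurs, which may be why it suits a setting where some ICD-9-CM codes are rare within a department.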