Purpose
Cancer registries serve as the empirical foundation for improving the quality of cancer care. Current case-finding methods rely on manual review and screening, and only 50.4% of the cases screened prove eligible for reporting. This study applies machine learning and natural language processing to extract key information from medical records, improving precision both in selecting cases for reporting and in classifying cancer types.
Materials and Methods
The training data comprised 3,000 categorized cases from 2017 and 2018, together with 21,994 medical records, imaging reports, and pathology reports from a medical center in southern Taiwan. A multiclass classification model, ML.NET Multiclass Classification SDCA Maximum Entropy, was trained on these data, and keywords were annotated for 30 types of cancer to construct a smart prediction module.
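To illustrate the idea behind a keyword-driven maximum entropy classifier, the following is a minimal Python sketch rather than the study's ML.NET implementation. It fits a softmax (maximum entropy) model over keyword counts using plain stochastic gradient descent, not the SDCA optimizer. The keyword list, the three labels, and the toy reports are all hypothetical, since the abstract does not list the annotated keywords or the 30 cancer types.

```python
import math

# Hypothetical keywords and labels; the study's own annotations are not given.
KEYWORDS = ["adenocarcinoma", "lung", "colon", "benign"]
LABELS = ["lung cancer", "colorectal cancer", "no cancer"]

def featurize(text):
    """Count how often each keyword occurs in a lowercased report."""
    words = text.lower().split()
    return [float(words.count(k)) for k in KEYWORDS]

def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict_proba(weights, bias, x):
    """Return one probability per label for feature vector x."""
    scores = [
        sum(w * xi for w, xi in zip(weights[c], x)) + bias[c]
        for c in range(len(weights))
    ]
    return softmax(scores)

def train(samples, epochs=200, lr=0.5):
    """Fit a maximum entropy model by stochastic gradient descent
    (a stand-in for SDCA, chosen only for brevity)."""
    n_classes, n_feats = len(LABELS), len(KEYWORDS)
    weights = [[0.0] * n_feats for _ in range(n_classes)]
    bias = [0.0] * n_classes
    for _ in range(epochs):
        for text, label in samples:
            x = featurize(text)
            probs = predict_proba(weights, bias, x)
            y = LABELS.index(label)
            for c in range(n_classes):
                # Gradient of the cross-entropy loss w.r.t. the class score.
                grad = probs[c] - (1.0 if c == y else 0.0)
                for j in range(n_feats):
                    weights[c][j] -= lr * grad * x[j]
                bias[c] -= lr * grad
    return weights, bias

# Toy, made-up reports used only to exercise the code.
TOY_REPORTS = [
    ("lung biopsy shows adenocarcinoma", "lung cancer"),
    ("colon biopsy shows adenocarcinoma", "colorectal cancer"),
    ("colon polyp benign", "no cancer"),
    ("lung nodule benign", "no cancer"),
]

weights, bias = train(TOY_REPORTS)
probs = predict_proba(weights, bias, featurize("adenocarcinoma of the lung"))
predicted = LABELS[probs.index(max(probs))]
print(predicted, [round(p, 3) for p in probs])
```

A real pipeline would need proper tokenization, a far larger vocabulary, and held-out evaluation; the sketch only shows how keyword counts map to class probabilities.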
Results
The screening results were categorized into three groups: “to be reported”, “not to be reported”, and “suspected cases.” The intelligent system achieved an average accuracy of 89.7% in case reporting and 89.5% in cancer-type classification.
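One way such a three-group screening output and an accuracy figure could be computed is sketched below in Python. The abstract does not say how cases are assigned to the three groups, so the probability thresholds (`low`, `high`) and both function names are assumptions made purely for illustration, and accuracy is taken here as the plain fraction of matching labels.

```python
def triage(p_reportable, low=0.3, high=0.7):
    """Map a predicted probability of reportability to a screening group.

    The thresholds are hypothetical; the study does not state its rule.
    """
    if not 0.0 <= p_reportable <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if p_reportable >= high:
        return "to be reported"
    if p_reportable <= low:
        return "not to be reported"
    return "suspected case"

def accuracy(predicted, actual):
    """Fraction of predictions that match the reference labels."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("cannot compute accuracy on an empty list")
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return correct / len(actual)

# Made-up probabilities, used only to show the three groups.
groups = [triage(p) for p in (0.92, 0.10, 0.55)]
print(groups)
```

Routing the uncertain middle band to a "suspected" group keeps registry staff in the loop for borderline cases instead of forcing a binary decision.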
Conclusion
This smart predictive system enhances the efficiency of cancer case screening, allowing registry staff to focus on the completeness and accuracy of data extraction. Future iterations could incorporate image and text recognition to strengthen the predictive capabilities of the system, thereby providing higher analytical value to clinical teams.