Utilizing Machine Learning Models for Early-Stage Lung Cancer Risk Assessment

Title (Chinese)	運用機器學習演算法於早期肺癌風險分析
English title	Utilizing Machine Learning Models for Early-Stage Lung Cancer Risk Assessment
Abstract	According to the 2024 cause-of-death statistics for Taiwan released by the Ministry of Health and Welfare on June 16, 2025, malignant neoplasms have ranked first among the ten leading causes of death for many consecutive years, and lung cancer remains the leading cause of cancer death (19.4% of cancer deaths). As lung cancer remains one of the deadliest cancers worldwide, disease prevention and early diagnosis have become increasingly important. However, lung cancer arises from multiple factors, including demographic characteristics, lifestyle, comorbidities, and environmental exposures, and no single indicator currently captures its risk-formation mechanisms comprehensively and accurately. This study examines the associations between factors in early-stage lung cancer patients and model predictive performance, and evaluates the potential of different machine learning methods for early lung cancer risk assessment. Data were drawn from the cancer registry systems of Hsinchu Cathay General Hospital, Taipei Cathay General Hospital, and Sijhih Cathay General Hospital for 2010 to 2023, with confirmed early-stage lung cancer cases as the study population. The analyzed variables comprised demographic data and cancer-related clinical features. Five machine learning models were built and compared: Logistic Regression, Decision Tree, Random Forest, Extreme Gradient Boosting (XGBoost), and the Hybrid model proposed in this study (Monotonic XGBoost). Model performance was evaluated with accuracy, recall, F1-score, ROC-AUC, and confusion matrices, and SHAP (SHapley Additive exPlanations) was used to assess the relative importance of features and improve model interpretability.
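The evaluation metrics listed above can be made concrete with a small standard-library sketch. It is not the authors' code: the toy labels and scores, the 0.5 and 0.3 thresholds, and the function names are hypothetical, and the study does not say which library it used. ROC-AUC is computed here as the probability that a randomly chosen positive case scores higher than a randomly chosen negative case (ties count one half), which equals the area under the ROC curve.

```python
from typing import Sequence


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int]) -> tuple[int, int, int, int]:
    """Return (TN, FP, FN, TP) for binary labels coded 0/1."""
    tn = fp = fn = tp = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 1:
            fn += 1  # a missed positive case, i.e. a potential missed diagnosis
        elif p == 1:
            fp += 1
        else:
            tn += 1
    return tn, fp, fn, tp


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> dict[str, float]:
    """Accuracy, recall (sensitivity), precision and F1 for the positive class."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred)
    accuracy = (tp + tn) / (tn + fp + fn + tp)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "recall": recall, "precision": precision, "f1": f1}


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC-AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical labels (1 = positive) and model scores, for illustration only.
    y_true = [1, 1, 1, 1, 0, 0, 0, 0]
    scores = [0.9, 0.8, 0.4, 0.35, 0.6, 0.3, 0.2, 0.1]
    for threshold in (0.5, 0.3):
        y_pred = [1 if s >= threshold else 0 for s in scores]
        print(threshold, confusion_matrix(y_true, y_pred), classification_metrics(y_true, y_pred))
    print("ROC-AUC:", roc_auc(y_true, scores))  # ROC-AUC: 0.875
```

Lowering the threshold from 0.5 to 0.3 removes both false negatives at the cost of one extra false positive, the kind of trade-off a screening-oriented, high-recall model accepts. ROC-AUC does not depend on the threshold. The pairwise loop is O(n·m) and meant only for illustration; in practice a library routine such as scikit-learn's `roc_auc_score` would normally be used.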
Results indicated that, under this study's data framework and variable settings, the Hybrid (Monotonic XGBoost) model achieved the best overall predictive performance, with a ROC-AUC of 0.819, a recall of 0.872, and few false negatives (FN = 10). This suggests potential clinical value in reducing missed diagnoses and matches the high-sensitivity requirement of clinical screening.

In the feature importance analysis, SHAP results showed that the EGFR-ALK gene interaction term contributed substantially to model predictions in this dataset. This finding reflects how the model uses the molecular-level information available in existing clinical data to improve classification performance; it does not imply a definitive or universal causal role of genetic factors in early-stage lung cancer. Because genetic testing is not performed uniformly in clinical practice, and its results depend on testing timing and care workflows, the importance of the gene variables reported here should be read as a data-driven predictive association, not as a direct basis for lung cancer risk assessment in the general population. By contrast, demographic variables such as age and education level showed consistent auxiliary predictive effects across models, indicating that non-molecular factors still play an important role in risk stratification.

In summary, the Hybrid (Monotonic XGBoost) model achieved an F1-score of 0.814, showing good overall classification performance, and was the best-performing of the compared models for early lung cancer risk assessment. This study aims to establish a high-sensitivity early-warning model to support clinical decision-making and promote early diagnosis.
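Why an interaction can carry a large SHAP attribution can be seen with exact Shapley values for a tiny model. This is a hypothetical sketch, not the study's model: the scoring function `toy_risk`, its coefficients, the feature names `marker_a`, `marker_b` and `age_z`, and the all-zero baseline are invented for illustration, and how the paper encoded its EGFR-ALK interaction term is not stated. Each feature's value is its weighted average marginal contribution over all coalitions of the other features, with absent features set to the baseline; the SHAP library's tree explainers compute attributions with tree-specific algorithms rather than this brute-force enumeration.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Sequence

Model = Callable[[Sequence[float]], float]


def exact_shapley(model: Model, x: Sequence[float], baseline: Sequence[float]) -> list[float]:
    """Exact Shapley values by enumerating every coalition of features.

    Features outside a coalition take their baseline value. The cost grows
    as 2**n, so this is only practical for a handful of features.
    """
    n = len(x)

    def value(coalition: set[int]) -> float:
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_risk(z: Sequence[float]) -> float:
    """Hypothetical risk score with a non-additive marker_a x marker_b term."""
    marker_a, marker_b, age_z = z
    return 0.1 * marker_a + 0.5 * marker_a * marker_b + 0.3 * age_z


if __name__ == "__main__":
    x = [1.0, 1.0, 1.0]         # hypothetical case: both markers present, age_z = 1
    baseline = [0.0, 0.0, 0.0]  # hypothetical reference point
    phi = exact_shapley(toy_risk, x, baseline)
    print([round(p, 3) for p in phi])  # [0.35, 0.25, 0.3]
    # Efficiency property: attributions sum to f(x) - f(baseline).
    print(round(sum(phi), 3), round(toy_risk(x) - toy_risk(baseline), 3))
```

The 0.5 product term's credit is split evenly between `marker_a` and `marker_b`, so each feature's attribution reflects how the model combines it with others. As the abstract cautions, such values describe the model's predictions, not causal effects.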
Pages	1-14
Keywords	Early-stage lung cancer, Machine learning, Risk assessment
Journal	Journal of Health Management (健康管理學刊)
Issue	December 2025 (Vol. 23, No. 2)
Publisher	Taiwan Health Management Association (臺灣健康管理學會)