| Chinese Abstract |
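In the latest ranking of the ten leading causes of death published by Taiwan's Ministry of Health and Welfare, cancer has been the leading cause of death for 40 consecutive years, and lung cancer remains among the top two cancers claiming the most lives in Taiwan (Ministry of Health and Welfare, 2023)[1]. According to statistics from the American Cancer Society, lung cancer has also been the cancer with the highest mortality in the United States for many years, and non-small cell lung cancer (NSCLC) accounts for the majority of those cases. This study is restricted to NSCLC because NSCLC patients make up the great majority of lung cancer patients. It applies deep learning, whose ability to detect key features in complex datasets underlines the value of the approach. The Surveillance, Epidemiology, and End Results (SEER) Program is one of the programs of the U.S. National Cancer Institute. SEER collects cancer incidence and survival data from the participating populations and provides its database free of charge. It currently covers roughly half of the U.S. population, and because the data are extensive and free, researchers often use the database as a source of training data for predictive models. This study uses machine learning and deep learning techniques to analyze the NSCLC records in the SEER database. The NSCLC patient data were split into 80% for training and the remaining 20% for validation, predictive models were built with machine learning and deep learning methods, and the F1 score was used to evaluate model performance. The models were first trained on the training data to predict patient survival status, and the F1 score of every model was then computed on the validation data. The deep learning DNN gave the best result, with an F1 score of 0.73, and this model was used to build the lung cancer survival prediction model. The aim is to support the clinical decisions of medical staff with early predictions of patient survival and thereby improve the quality of care in this field. |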
| English Abstract |
According to the latest report from Taiwan’s Ministry of Health and Welfare, cancer has been the leading cause of death among Taiwanese citizens for 40 consecutive years, and lung cancer consistently ranks among the top two causes of cancer death in Taiwan (Ministry of Health and Welfare, 2023)[1]. Data from the American Cancer Society likewise show that lung cancer has had the highest mortality of any cancer in the United States for many years. Non-small cell lung cancer (NSCLC) accounts for the majority of lung cancer cases, which is why this study is restricted to it. The study applies deep learning, which can extract key features from complex datasets, with the aim of supporting clinical decision-making. The Surveillance, Epidemiology, and End Results (SEER) Program, administered by the U.S. National Cancer Institute, collects cancer incidence and survival data from participating populations and makes its database freely available. Because it covers approximately half of the U.S. population and is free to use, the SEER database is a common source of training data for predictive models. In this study, machine learning and deep learning methods were applied to the NSCLC records in SEER. The dataset was split into 80% for training and 20% for validation, and model performance was assessed with the F1 score. The models were first trained on the training data to predict patient survival status, and each was then scored on the validation data. The deep neural network (DNN) achieved the highest F1 score, 0.73, and was therefore selected as the basis of the NSCLC survival prediction model, which is intended to give medical staff early predictions of patient survival and improve the quality of oncology care. |
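The evaluation protocol described above (an 80%/20% split followed by F1 scoring on the validation portion) can be sketched as follows. This is a minimal illustration and not the study's actual pipeline. The abstract does not say how the split was drawn or which F1 variant was reported, so the shuffled split, the fixed seed, the binary single-class F1, and the function names `train_validation_split` and `f1_score` are all assumptions made for this example. The labels are toy values, not SEER data.

```python
import random


def train_validation_split(records, train_fraction=0.8, seed=0):
    """Shuffle a copy of the records and split it into (train, validation).

    Assumption: a simple shuffled split; the abstract does not state
    whether the original split was random or stratified.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def f1_score(y_true, y_pred, positive=1):
    """Binary F1: the harmonic mean of precision and recall.

    Assumption: F1 for a single positive class; the abstract does not
    say whether a binary, macro, or weighted F1 was reported.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # precision and recall are both zero (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy example: ten hypothetical patient records, split 8 / 2.
train, validation = train_validation_split(range(10))

# Toy labels (1 = survived, 0 = did not survive): 4 true positives,
# 1 false positive, 1 false negative.
y_true = [1, 1, 1, 0, 0, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 1, 1, 0, 1, 0, 0]
score = f1_score(y_true, y_pred)
```

With these toy labels, precision and recall are both 4/5, so the F1 score is 0.8. In practice each trained model would be scored this way on the same validation set and the highest-scoring model retained.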