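Comparing the External Generalization Performance of Three Data Mining Algorithms for Predicting Five-Year Cervical Cancer Survival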

張語恬; 朱基銘; 簡戊鑑; 周雨青; 楊燦; 盧瑜芬; 白健佑; 白璐; Wetter, Thomas; 孫建安; 羅慶徽

Title	Comparing the External Generalization Performance of Three Data Mining Algorithms for Predicting Five-Year Cervical Cancer Survival
Parallel title	Predicting Cervical Cancer Survivability: A Comparison of Three Data Mining Methods
Authors	張語恬; 朱基銘; 簡戊鑑; 周雨青 (You-Ching Chou); 楊燦 (Tsan Yang); 盧瑜芬; 白健佑; 白璐; Wetter, Thomas; 孫建安 (Chien-An Sun); 羅慶徽
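Abstract (translated from Chinese)	This study compares three data mining algorithms, artificial neural network, logistic regression, and decision tree, trained on samples from different diagnosis years, for their performance in predicting five-year survival of cervical cancer, and validates their external generalization. The study used the Cancer Incidence Public-Use Database (CIPUD), the cancer registry data within the Surveillance, Epidemiology, and End Results (SEER) program of the U.S. National Cancer Institute (NCI). From the years 1973 to 2000, 156,502 records with 72 variables were selected. After data cleaning, the 18 variables more relevant to predicting five-year cervical cancer survival were retained, together with 2,022 records of cervical cancer diagnosed in 1988-1996. The sample was divided by diagnosis year into 8 different sets of training and test samples, which were fed into the three algorithms, artificial neural network, decision tree, and logistic regression, to build models. AUC (area under the ROC curve) and accuracy were used to assess predictive ability and to identify model designs that yield good predictions. Results: in internal validation, the best-performing model was artificial neural network model 1, with an AUC of 0.9392 and an accuracy of 0.9474. In external validation, artificial neural network model 7 had the best AUC, 0.6455. In internal validation, the artificial neural network and the decision tree outperformed logistic regression in both AUC and accuracy. In external validation, the artificial neural network and logistic regression had better AUC than the decision tree. Models built with the artificial neural network and logistic regression generalized better externally, whereas models built with the artificial neural network and the decision tree had better accuracy. To obtain better external validation results, the training sample can draw on data from the previous 2-3 years or more.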
English abstract	The purpose of this study was to compare the performance of an artificial neural network (ANN), a decision tree (C5), and logistic regression (LR) in predicting the 5-year survivability of cervical cancer, and to validate their external generalization. The data were drawn from the SEER (Surveillance, Epidemiology, and End Results) program of the NCI (National Cancer Institute) in the United States for the years 1973-2000, comprising 156,502 cases with 72 variables. After data cleaning, 2,022 cases diagnosed in 1988-1996 and 18 variables remained. The dataset was divided into 8 sets of training and test data according to the year in which the patients were diagnosed. The 8 training sets were used with three algorithms, (1) ANN, (2) C5, and (3) LR, to build 8 models. Model performance in predicting the 5-year survivability of cervical cancer patients was measured by accuracy and AUC (area under the ROC curve). ANN achieved the best internal validation results on model 1 (AUC, 0.9392; accuracy, 0.9474) and the best external validation AUC (0.6455) on model 7. ANN and C5 outperformed LR in internal validation. ANN and LR both performed better than C5 in external validation AUC. Overall, ANN and LR generalized better externally, and ANN and C5 classified more accurately.
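Pages	222-238
Keywords	cervical cancer survivability, logistic regression, artificial neural network, decision tree, AUC (area under the ROC curve)
Journal	台灣家庭醫學雜誌 (Taiwan Journal of Family Medicine)
Issue	December 2007 (Vol. 17, No. 4)
Publisher	台灣家庭醫學醫學會 (Taiwan Association of Family Medicine)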
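
The evaluation design described in the abstracts (splitting cases by diagnosis year, then scoring models by AUC and accuracy on internal and external samples) can be illustrated with a minimal sketch. This is not the authors' code: the records, the `score` values (standing in for the output of some already-trained model), the 0.5 decision threshold, and the choice of years for each split are all hypothetical, since the abstract does not specify them.

```python
from typing import Dict, List, Sequence, Set, Tuple

Record = Dict[str, float]


def split_by_year(records: Sequence[Record], internal_years: Set[int],
                  external_years: Set[int]) -> Tuple[List[Record], List[Record]]:
    """Separate cases by diagnosis year into internal and external samples."""
    internal = [r for r in records if r["year"] in internal_years]
    external = [r for r in records if r["year"] in external_years]
    return internal, external


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of cases whose predicted class equals the true class."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive case receives a higher
    score than a random negative case (ties count as 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def evaluate(sample: Sequence[Record], threshold: float = 0.5) -> Tuple[float, float]:
    """Return (AUC, accuracy) for a sample that already carries model scores."""
    y_true = [int(r["survived"]) for r in sample]
    scores = [r["score"] for r in sample]
    y_pred = [1 if s >= threshold else 0 for s in scores]
    return auc(y_true, scores), accuracy(y_true, y_pred)


# Hypothetical toy records; "score" is an assumed predicted survival probability.
records: List[Record] = [
    {"year": 1988, "score": 0.9, "survived": 1},
    {"year": 1988, "score": 0.2, "survived": 0},
    {"year": 1989, "score": 0.7, "survived": 1},
    {"year": 1989, "score": 0.4, "survived": 0},
    {"year": 1990, "score": 0.6, "survived": 0},
    {"year": 1990, "score": 0.8, "survived": 1},
    {"year": 1990, "score": 0.3, "survived": 1},
    {"year": 1990, "score": 0.1, "survived": 0},
]

# Assumed split: earlier years as the internal sample, a later year as external.
internal, external = split_by_year(records, {1988, 1989}, {1990})
print("internal (AUC, accuracy):", evaluate(internal))
print("external (AUC, accuracy):", evaluate(external))
```

In this toy data the external figures come out lower than the internal ones, which mirrors the direction of the gap the abstract reports (internal AUC 0.9392 versus external AUC 0.6455 for ANN), though the toy numbers themselves carry no evidential weight.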