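Chinese Abstract
This study investigates the application of artificial intelligence methods and data mining techniques to prediction models for cervical cancer. Three algorithms, an artificial neural network, a decision tree, and logistic regression, were applied and evaluated by their prediction accuracy and by how interpretable their predictions are. Using data mining techniques, 433,272 records and 72 variables from the U.S. SEER (the Surveillance, Epidemiology, and End Results) 1973-2000 cancer registry database (CIPUD, Cancer Incidence Public-Use Database) were analyzed, and 10-fold cross-validation was then applied to compare the accuracy of the three algorithms in predicting survival. The results showed the following prediction accuracies: 0.8974 for the logistic regression model (sensitivity 0.9047, specificity 0.8830), 0.8732 for the decision tree model (C5) (sensitivity 0.8639, specificity 0.8966), and 0.7406 for the artificial neural network model (sensitivity 0.7394, specificity 0.7473). The logistic regression results included extreme accuracy values of 1.0 (100%) and 0.9942 (99.42%), clearly above its mean prediction accuracy of 0.8981. The decision tree model's prediction results were generally higher than those of logistic regression, but the difference was small. The artificial neural network model had a mean prediction accuracy of 0.7776, clearly lower than that of logistic regression and the decision tree, and its accuracy was also unstable across the 10 folds, with a standard deviation of 0.0786, the highest of the three models. In terms of mean prediction accuracy, logistic regression (0.8981) and the decision tree (0.8926) outperformed the artificial neural network (0.7776), and the neural network model also had the largest standard deviation of accuracy in 10-fold cross-validation (0.0786). This indicates that its predictive ability was poor relative to the logistic regression and decision tree models.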
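The accuracy, sensitivity, and specificity figures reported for each model follow the standard confusion-matrix definitions. The sketch below is a minimal illustration of those definitions, not the study's code. The counts are hypothetical, and treating "survived" as the positive class is an assumption, since the abstract does not state which outcome was coded as positive.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """Counts for a binary outcome; positive = survived is an ASSUMED coding."""
    tp: int  # predicted survived, actually survived
    fn: int  # predicted died, actually survived
    fp: int  # predicted survived, actually died
    tn: int  # predicted died, actually died

    def total(self) -> int:
        return self.tp + self.fn + self.fp + self.tn

    def accuracy(self) -> float:
        # Share of all cases classified correctly.
        return (self.tp + self.tn) / self.total()

    def sensitivity(self) -> float:
        # True positive rate: share of actual positives predicted positive.
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        # True negative rate: share of actual negatives predicted negative.
        return self.tn / (self.tn + self.fp)


if __name__ == "__main__":
    # Hypothetical counts for illustration only; not taken from the SEER analysis.
    cm = ConfusionMatrix(tp=90, fn=10, fp=20, tn=80)
    print(round(cm.accuracy(), 4), round(cm.sensitivity(), 4), round(cm.specificity(), 4))
```

Because accuracy equals the prevalence-weighted average of sensitivity and specificity, it always falls between the two; each reported triple satisfies this (for example, 0.8974 lies between 0.8830 and 0.9047).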
English Abstract
Objective: The purpose of the study was to investigate the use of artificial intelligence methods and data mining technology for predicting cervical cancer survivability. Three models, an artificial neural network, a decision tree, and logistic regression, were investigated, and their accuracy in predicting cervical cancer survivability was evaluated. Methods and materials: The Surveillance, Epidemiology, and End Results (SEER) database, a large dataset, was used to develop the 3 prediction models. Two of them were popular data mining algorithms (artificial neural network and decision tree), and one was a common statistical model (logistic regression). 10-fold cross-validation was used to obtain an unbiased estimate of each model's prediction performance so that the 3 models could be compared. Results: The mean accuracies of the 3 models were 0.8981 for logistic regression, 0.8930 for the decision tree, and 0.7776 for the artificial neural network. Logistic regression also produced extreme accuracy values of 1.0 and 0.9942. In the 10-fold cross-validation analysis, the standard deviation of the artificial neural network's accuracy was 0.0786, the largest among the 3 prediction models. Conclusions: In this research, the artificial neural network performed worse than logistic regression and the decision tree in predicting cervical cancer survivability, with the lowest prediction accuracy and the largest variation of accuracy in the 10-fold cross-validation analysis.
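The comparison above rests on 10-fold cross-validation: the data are split into 10 folds, each fold serves once as the test set while the other 9 train the model, and the 10 test accuracies are summarized by their mean and standard deviation. The following is a minimal stdlib sketch of that procedure under stated assumptions. The `Trainer` interface and the `majority_class` baseline are hypothetical stand-ins (they are not logistic regression, C5, or a neural network), folds are plain shuffled folds rather than stratified ones, and the sample standard deviation (n - 1 denominator) is assumed because the abstract does not say which was used.

```python
import random
import statistics
from typing import Callable, Sequence

# HYPOTHETICAL interface: a trainer takes (X_train, y_train) and returns a predictor.
Predictor = Callable[[Sequence[float]], int]
Trainer = Callable[[Sequence[Sequence[float]], Sequence[int]], Predictor]


def k_fold_indices(n: int, k: int = 10, seed: int = 0) -> list[list[int]]:
    """Shuffle 0..n-1 and deal them into k folds whose sizes differ by at most one."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validate(train: Trainer, X, y, k: int = 10, seed: int = 0) -> list[float]:
    """Return the test accuracy of each of the k folds."""
    scores = []
    for test_idx in k_fold_indices(len(y), k, seed):
        test_set = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in test_set]
        predict = train([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct = sum(predict(X[i]) == y[i] for i in test_idx)
        scores.append(correct / len(test_idx))
    return scores


def majority_class(X_train, y_train) -> Predictor:
    """HYPOTHETICAL baseline model: always predict the most common training label."""
    label = statistics.mode(y_train)
    return lambda _x: label


if __name__ == "__main__":
    rng = random.Random(42)
    # Synthetic data for illustration only (not SEER): one feature, roughly 80% label 1.
    X = [[rng.random()] for _ in range(200)]
    y = [1 if rng.random() < 0.8 else 0 for _ in range(200)]
    scores = cross_validate(majority_class, X, y, k=10)
    # Mean and sample standard deviation of the 10 fold accuracies.
    print(f"mean={statistics.mean(scores):.4f} sd={statistics.stdev(scores):.4f}")
```

Swapping in a real model only requires a function with the same `(X_train, y_train) -> predictor` shape; a large fold-to-fold standard deviation, like the 0.0786 reported for the neural network, signals that a model's accuracy depends heavily on which records it happened to be trained on.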