| English Abstract |
Despite their potential, machine learning models are susceptible to overfitting and are expensive to train; ensemble learning and feature engineering can mitigate these problems by improving generalizability and reducing training costs. Ensemble learning, in which multiple models are trained independently on the same dataset, improves accuracy but increases computational cost. Feature engineering, in which meaningful features are extracted from the training data, reduces data volume and hence the training cost of ensemble models without compromising model quality. In this study, several feature selection methods (filter, wrapper, and embedded methods) were evaluated for their effectiveness as preprocessing steps for ensemble learning and compared against commonly used feature extraction methods such as principal component analysis. Stacking, in which the predictions of the individual learners are integrated in stages to improve model quality, proved to be the best-performing ensemble learning method and was adopted in the proposed approach. To reduce model bias, heterogeneity among the base learners was emphasized, further improving predictive capability. The proposed framework was validated experimentally on the Wisconsin Breast Cancer Dataset, where the proposed diverse stacking ensemble outperformed its state-of-the-art counterparts in precision, recall, F1 score, and accuracy.
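To make the described pipeline concrete, the following is a minimal sketch of feature selection followed by stacking with heterogeneous base learners, assuming scikit-learn. The specific base learners, the filter-method scorer (ANOVA F-score), and k=15 are illustrative assumptions, not the configuration used in the study.

```python
# Hedged sketch: filter-method feature selection feeding a stacking ensemble
# of heterogeneous learners; all hyperparameters here are illustrative.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.ensemble import StackingClassifier, RandomForestClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report

# Wisconsin Breast Cancer Dataset, as used for validation in the study.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Filter-style feature selection as a preprocessing step (reduces data
# volume before the ensemble is trained); k=15 is an assumed value.
select = SelectKBest(f_classif, k=15)

# Heterogeneous base learners, emphasizing diversity to reduce model bias.
base_learners = [
    ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
    ("svc", make_pipeline(StandardScaler(), SVC(probability=True, random_state=0))),
    ("knn", make_pipeline(StandardScaler(), KNeighborsClassifier())),
]

# Stacking: a meta-learner integrates the base learners' predictions in a
# second stage rather than simply averaging them.
model = make_pipeline(
    select,
    StackingClassifier(estimators=base_learners, final_estimator=LogisticRegression()),
)
model.fit(X_train, y_train)
print(classification_report(y_test, model.predict(X_test)))
```

The report printed at the end covers precision, recall, F1 score, and accuracy, the same metrics on which the proposed method was compared against its counterparts.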