| English Abstract |
Despite their potential, machine learning models are susceptible to overfitting and are expensive to train; ensemble learning and feature engineering can mitigate these problems by improving generalizability and reducing training costs. Ensemble learning, in which multiple models are trained independently on the same dataset, improves accuracy but increases computational cost. Feature engineering, in which meaningful features are extracted from the training data, reduces data volume and hence the training cost of ensemble models without compromising model quality. In this study, several feature selection methods (filter, wrapper, and embedded methods) were evaluated for their effectiveness as preprocessing steps for ensemble learning and compared against commonly used feature extraction methods such as principal component analysis. Stacking, in which the predictions of the individual learners are integrated in stages to improve model quality, proved to be the best-performing ensemble learning method and was adopted in the proposed approach. To reduce model bias, heterogeneity among the base learners was emphasized, further improving predictive capability. The proposed framework was validated experimentally on the Wisconsin Breast Cancer Dataset, where the proposed diverse stacking ensemble outperformed its state-of-the-art counterparts in precision, recall, F1 score, and accuracy.
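To make the described pipeline concrete, the following is a minimal sketch of feature selection followed by stacking with heterogeneous base learners, assuming scikit-learn. The specific base learners, the filter-method scorer (ANOVA F-score), and k=15 are illustrative assumptions, not the configuration used in the study.

```python
# Hedged sketch: filter-method feature selection feeding a stacking ensemble
# of heterogeneous learners; all hyperparameters here are illustrative.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.ensemble import StackingClassifier, RandomForestClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report

# Wisconsin Breast Cancer Dataset, as used for validation in the study.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Filter-style feature selection as a preprocessing step (reduces data
# volume before the ensemble is trained); k=15 is an assumed value.
select = SelectKBest(f_classif, k=15)

# Heterogeneous base learners, emphasizing diversity to reduce model bias.
base_learners = [
    ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
    ("svc", make_pipeline(StandardScaler(), SVC(probability=True, random_state=0))),
    ("knn", make_pipeline(StandardScaler(), KNeighborsClassifier())),
]

# Stacking: a meta-learner integrates the base learners' predictions in a
# second stage rather than simply averaging them.
model = make_pipeline(
    select,
    StackingClassifier(estimators=base_learners, final_estimator=LogisticRegression()),
)
model.fit(X_train, y_train)
print(classification_report(y_test, model.predict(X_test)))
```

The report printed at the end covers precision, recall, F1 score, and accuracy, the same metrics on which the proposed method was compared against its counterparts.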