Spam classification problems using support vector machine and grid search

Christine Dewi; Fransiskus Andika Indriawan; Henoch Juli Christanto

Title	Spam classification problems using support vector machine and grid search
Authors	Christine Dewi, Fransiskus Andika Indriawan, Henoch Juli Christanto
Abstract	Spam classification is an important task in identifying unwanted and potentially harmful emails for internet users. The increasing number of internet users highlights the growing importance of handling spam effectively. In this paper, we propose an approach for spam classification using Support Vector Machines (SVM) with grid search hyperparameter optimization. Our research differs from existing studies by specifically focusing on the integration of SVM with grid search to achieve optimal hyperparameter tuning. Additionally, we provide a unique dataset comprising diverse samples of spam emails for evaluation purposes. We also employ pre-processing techniques, including the removal of unnecessary words such as stop words and punctuation marks, as well as word stemming to convert words into their base forms. To optimize the performance of the SVM model, we use Grid Search to determine the optimal values for hyperparameters, including C, gamma, and the kernel. The results of the first experiment using SVM with the first dataset show that grid search yields the optimal parameters {'C': 100, 'gamma': 0.01, 'kernel': 'rbf'}, resulting in an accuracy improvement from 98.02% to 98.47%. In the second experiment using the second dataset, the accuracy obtained is 99.1%, compared to the previous non-optimized parameters which achieved 98.8%. These results indicate a significant improvement in spam classification accuracy. The experimental results demonstrate that our approach outperforms existing methods in terms of accuracy, precision, and recall. The findings of our research have significant implications for improving spam detection systems and enhancing the overall effectiveness of email communication.
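The workflow the abstract describes (text pre-processing, an SVM classifier, and a grid search over C, gamma, and the kernel) might be sketched as follows with scikit-learn. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy corpus, the function name `tune_spam_svm`, the TF-IDF features, and the candidate grid values are all hypothetical, and stemming is omitted to keep the example free of extra dependencies.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Hypothetical toy corpus (1 = spam, 0 = ham); NOT the paper's dataset.
texts = [
    "Win a free prize now, click here",
    "Cheap loans approved instantly, apply today",
    "Congratulations, you won a lottery, claim your cash",
    "Limited offer: free vacation, call now",
    "Earn money fast working from home, click now",
    "Claim your free gift card today",
    "Are we still meeting for lunch tomorrow?",
    "Please review the attached project report",
    "Can you send me the notes from class?",
    "The meeting is moved to Monday morning",
    "Thanks for your help with the report yesterday",
    "Let me know when you arrive at the office",
]
labels = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]


def tune_spam_svm(texts, labels, cv=3):
    """Grid-search SVM hyperparameters on TF-IDF features (illustrative)."""
    pipeline = Pipeline([
        # Lowercases text and drops English stop words; the default
        # tokenizer also discards punctuation.
        ("tfidf", TfidfVectorizer(lowercase=True, stop_words="english")),
        ("svc", SVC()),
    ])
    # Assumed candidate values; the abstract does not list the grid used.
    # Note that gamma has no effect when the kernel is linear.
    param_grid = {
        "svc__C": [0.1, 1, 10, 100],
        "svc__gamma": [0.001, 0.01, 0.1, 1],
        "svc__kernel": ["linear", "rbf"],
    }
    search = GridSearchCV(pipeline, param_grid, cv=cv, scoring="accuracy")
    search.fit(texts, labels)
    return search


search = tune_spam_svm(texts, labels)
print(search.best_params_)  # best combination found on the toy data
print(search.best_score_)   # mean cross-validated accuracy for that combination

# The search refits the best pipeline on all data, so it can predict directly.
predictions = search.predict([
    "Claim your free prize now",
    "See you at the meeting tomorrow",
])
print(predictions)
```

Placing the vectorizer inside the pipeline means it is refit on each training fold, so the cross-validated accuracy is not inflated by vocabulary learned from the held-out fold. The parameters selected on this toy corpus carry no meaning for real data and should not be compared with the values reported in the abstract.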
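Pages	1-10
Keywords	SVM, Spam classification, Grid search, Machine learning
Journal	International Journal of Applied Science and Engineering (國際應用科學與工程學刊)
Issue	December 2023 (Vol. 20, No. 4)
Publisher	College of Science and Engineering, Chaoyang University of Technology (朝陽科技大學理工學院)
Previous article in this issue	The hardware simulation of ratio metric vector iteration algorithm in wireless sensor networks
Next article in this issue	The Bayesian CNN-LSTM classification model to predict and evaluate learner’s performance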