| English Abstract |
Lexical complexity is crucial for reading comprehension. Previous research on lexical complexity prediction has mainly focused on determining which of two words is more complex. Moreover, most previous approaches consider only traditional lexical features. In this paper, we propose a novel supervised approach that uses word embedding features to treat lexical complexity prediction as a single-label multiclass classification problem. We examine four word embedding techniques: Word2Vec, fastText, GloVe, and BERT. We also examine five classification models: k-Nearest Neighbors, Support Vector Machines (SVM), Multilayer Perceptron, Random Forest, and XGBoost. The prediction models are evaluated on three datasets in English, Traditional Chinese, and Japanese. The results show that SVM with fastText achieves the highest accuracy on the English dataset (66.23%), SVM with GloVe achieves the highest accuracy on the Traditional Chinese dataset (53.84%), and SVM with Word2Vec achieves the highest accuracy on the Japanese dataset (49.96%). |
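
The formulation in the abstract, where each word is represented by an embedding vector and assigned exactly one complexity level, can be illustrated with a minimal sketch. This is not the thesis's implementation. It uses k-Nearest Neighbors (one of the five classifiers named above) with cosine similarity, written in plain Python. The 3-dimensional vectors, the three complexity levels, and the names `TRAIN`, `TEST`, `knn_predict`, and `accuracy` are all hypothetical; real embeddings would come from a pretrained model such as Word2Vec, fastText, GloVe, or BERT.

```python
import math
from collections import Counter

# Hypothetical toy data: (embedding vector, complexity level).
# Levels are made up for illustration (1 = easy, 3 = hard).
TRAIN = [
    ([0.9, 0.1, 0.0], 1),
    ([0.8, 0.2, 0.1], 1),
    ([0.1, 0.9, 0.1], 2),
    ([0.2, 0.8, 0.0], 2),
    ([0.0, 0.1, 0.9], 3),
    ([0.1, 0.2, 0.8], 3),
]

TEST = [
    ([0.85, 0.15, 0.05], 1),
    ([0.15, 0.85, 0.05], 2),
    ([0.05, 0.15, 0.85], 3),
]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def knn_predict(vector, train, k=3):
    """Return the majority complexity level among the k most similar training vectors."""
    neighbors = sorted(train, key=lambda item: cosine(vector, item[0]), reverse=True)[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


def accuracy(test, train, k=3):
    """Fraction of test items whose predicted level equals the gold level."""
    correct = sum(knn_predict(vec, train, k) == label for vec, label in test)
    return correct / len(test)


if __name__ == "__main__":
    print("accuracy on toy data:", accuracy(TEST, TRAIN))
```

Accuracy here is the same metric the abstract reports: the share of words whose single predicted level matches the labeled one. Swapping in another classifier changes only `knn_predict`; the embedding-as-feature-vector input stays the same.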