Lexical Complexity Prediction using Word Embeddings

Cheng-Zen Yang; Jin-Jian Li; Shu-Chang Lin

Abstract	Lexical complexity is crucial for reading comprehension. Previous research on lexical complexity prediction has mainly focused on distinguishing the relative complexity of two words, and most previous approaches consider only traditional lexical features. In this paper, we propose a novel supervised approach that uses word embedding features and treats lexical complexity prediction as a single-label multiclass classification problem. We examine four word embedding techniques: Word2Vec, fastText, GloVe, and BERT. We also examine five classification models: k-Nearest Neighbors, Support Vector Machines (SVM), Multilayer Perceptron, Random Forest, and XGBoost. The prediction models are evaluated on three datasets, in English, Traditional Chinese, and Japanese. The results show that SVM with fastText achieves the highest accuracy, 66.23%, on the English dataset; SVM with GloVe achieves the highest accuracy, 53.84%, on the Traditional Chinese dataset; and SVM with Word2Vec achieves the highest accuracy, 49.96%, on the Japanese dataset.
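Pages	279-287
Keywords	Lexical Complexity, Word Embeddings, Classification Models
Journal	ROCLING Proceedings (ROCLING論文集)
Issue	October 2023 (2023 issue)
Publisher	Association for Computational Linguistics and Chinese Language Processing (中華民國計算語言學學會)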
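
The abstract frames lexical complexity prediction as single-label multiclass classification over word embedding features. The sketch below illustrates that framing with k-Nearest Neighbors, one of the five classifiers the abstract lists. It is not the authors' implementation: the three-dimensional vectors, the example words, the three complexity levels, and the names `TRAIN`, `cosine`, `predict_level`, and `accuracy` are all assumptions made for illustration. Real embeddings from Word2Vec, fastText, GloVe, or BERT would have hundreds of dimensions, and the paper's label sets are not given in the abstract.

```python
import math
from collections import Counter

# Hypothetical toy data: (word, embedding vector, complexity level).
# The vectors are invented for illustration, not taken from any real model.
TRAIN = [
    ("cat", [0.9, 0.1, 0.0], 1),
    ("dog", [0.8, 0.2, 0.1], 1),
    ("house", [0.7, 0.3, 0.0], 1),
    ("analyze", [0.1, 0.9, 0.2], 2),
    ("estimate", [0.2, 0.8, 0.1], 2),
    ("evaluate", [0.1, 0.8, 0.3], 2),
    ("ontology", [0.0, 0.2, 0.9], 3),
    ("heuristic", [0.1, 0.1, 0.8], 3),
    ("epistemic", [0.0, 0.3, 0.9], 3),
]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def predict_level(vector, train=TRAIN, k=3):
    """Predict one complexity level by majority vote among the k most similar words."""
    ranked = sorted(train, key=lambda item: cosine(vector, item[1]), reverse=True)
    votes = Counter(level for _, _, level in ranked[:k])
    # On a tie, most_common returns the level that was seen first,
    # which here is the level of the nearer neighbor.
    return votes.most_common(1)[0][0]


def accuracy(samples, train=TRAIN, k=3):
    """Fraction of (word, vector, level) samples whose level is predicted correctly."""
    correct = sum(predict_level(vec, train, k) == level for _, vec, level in samples)
    return correct / len(samples)
```

Calling `predict_level` on a new word's vector returns exactly one level, which is what makes the task single-label, and `accuracy` corresponds to the metric the abstract reports. Swapping in a different classifier, such as an SVM, would leave the rest of the pipeline unchanged, since the embedding vector is the only feature input.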