Impact of Feature Selection Algorithms on Readability Model

戴采寧; 曾厚強; 宋曜廷

Title	Impact of Feature Selection Algorithms on Readability Model (original Chinese title: 特徵選取演算法對可讀性模型的影響)
Authors	戴采寧, 曾厚強, 宋曜廷 (Yao-Ting Sung)
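Chinese abstract (translated)	Reading is one of the important ways of acquiring knowledge. Scholars point out that providing materials of suitable difficulty is essential to effective reading. If the material is too easy, readers usually gain no new knowledge while reading; if it is too difficult, it imposes an excessive cognitive load on readers and harms their learning. Giving readers reading materials matched to their level is therefore an important issue. To address it, many scholars have studied readability models and found that feature selection is an important way to improve their accuracy. However, the interaction between different feature selection algorithms and classifiers has not been widely explored in past research. This study therefore applies three feature selection algorithms (Chi-squared test, ANOVA, and Mutual Information) together with 25 classifiers and compares the accuracy of readability models for grade 1-12 Chinese-language (國文) textbook texts. The experimental results identify the feature selection algorithm and classifier of the most accurate model. The study finds that when ANOVA is used to select linguistic features and LGBM is used as the classifier, only 13 cumulatively added features are needed to reach 48% accuracy and 76% adjacent accuracy when predicting the grade level of grade 1-12 Chinese-language texts.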
English abstract	Reading is one of the most important ways of acquiring knowledge. Researchers have pointed out that, to promote reading effectiveness, it is very important to provide materials at the right level of difficulty. If the reading materials are too easy, readers usually cannot acquire new knowledge while reading; if they are too difficult, they impose an excessive cognitive burden on readers and reduce their learning effectiveness. Therefore, giving readers appropriate reading materials is an important issue. To address it, many scholars have developed readability models and found that feature selection improves their accuracy. However, the interaction between feature selection algorithms and classifiers has not been widely explored in past studies. In this study, three feature selection algorithms (Chi-squared test, ANOVA, and Mutual Information) were combined with 25 classifiers to compare the accuracy of readability models for grade 1-12 Chinese-language textbooks. The experimental results identify the pairing of feature selection algorithm and classifier with the highest accuracy. Using ANOVA for feature selection and LGBM as the classifier achieved 48% accuracy and 73% adjacent accuracy while reducing the number of features by 85%.
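Pages	106-115
Keywords	Chinese Readability, Feature Selection, Machine Learning, Classifier
Publication	ROCLING Proceedings (ROCLING論文集)
Issue	202310 (2023 issue)
Publisher	The Association for Computational Linguistics and Chinese Language Processing (中華民國計算語言學學會)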
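The selection-and-evaluation procedure summarized in the abstracts can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' code: the ANOVA score is taken to be the standard one-way ANOVA F statistic of each feature across grade labels (the quantity scikit-learn's `f_classif` computes), "adjacent accuracy" is assumed to mean a predicted grade within one level of the true grade, and "cumulatively adding features" is assumed to mean scoring the top-1, top-2, ... prefixes of the ranking and keeping the best. All function names are hypothetical, the Chi-squared and Mutual Information rankers the study also compares would slot in where `anova_f_scores` sits, and the classifier (LGBM in the study) is left to a caller-supplied `evaluate` function.

```python
from collections import defaultdict
from typing import Callable, List, Sequence, Tuple

# Hypothetical helper names; a sketch under stated assumptions, not the authors' implementation.


def anova_f_scores(X: Sequence[Sequence[float]], y: Sequence[int]) -> List[float]:
    """One-way ANOVA F statistic of each feature column across the grade labels y."""
    n = len(X)
    k = len(set(y))
    df_between, df_within = k - 1, n - k  # needs at least 2 grades and n > k
    scores = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        grand_mean = sum(column) / n
        groups = defaultdict(list)
        for value, label in zip(column, y):
            groups[label].append(value)
        ss_between = ss_within = 0.0
        for values in groups.values():
            mean = sum(values) / len(values)
            ss_between += len(values) * (mean - grand_mean) ** 2
            ss_within += sum((v - mean) ** 2 for v in values)
        if ss_within == 0:
            # No within-grade variance: infinite F if the grade means differ, else 0.
            scores.append(float("inf") if ss_between > 0 else 0.0)
        else:
            scores.append((ss_between / df_between) / (ss_within / df_within))
    return scores


def rank_features(scores: Sequence[float]) -> List[int]:
    """Feature indices from highest to lowest score (ties keep their original order)."""
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Share of texts whose predicted grade equals the true grade."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def adjacent_accuracy(y_true: Sequence[int], y_pred: Sequence[int], tolerance: int = 1) -> float:
    """Share of predictions within `tolerance` grades of the truth (assumed definition)."""
    return sum(abs(t - p) <= tolerance for t, p in zip(y_true, y_pred)) / len(y_true)


def cumulative_selection(
    ranking: Sequence[int], evaluate: Callable[[Sequence[int]], float]
) -> Tuple[int, float]:
    """Score the top-1, top-2, ... prefixes of `ranking`; return (best k, best score).

    Assumed reading of "cumulatively adding features": `evaluate` (hypothetical,
    e.g. train a classifier on those columns and return validation accuracy)
    is called once per prefix, and the smallest k with the best score wins.
    """
    best_k, best_score = 0, float("-inf")
    for k in range(1, len(ranking) + 1):
        score = evaluate(ranking[:k])
        if score > best_score:
            best_k, best_score = k, score
    return best_k, best_score


if __name__ == "__main__":
    # Toy data: 6 texts, 3 features, grades 1-3 (invented for illustration only).
    X = [[1.0, 5.0, 0.3], [1.2, 4.0, 0.1], [3.0, 5.5, 0.2],
         [3.1, 4.5, 0.4], [5.0, 5.0, 0.3], [5.2, 4.2, 0.2]]
    y = [1, 1, 2, 2, 3, 3]
    ranking = rank_features(anova_f_scores(X, y))
    print(ranking)  # → [0, 2, 1]
    print(accuracy([1, 2, 3, 4, 5], [1, 3, 3, 6, 4]))           # → 0.4
    print(adjacent_accuracy([1, 2, 3, 4, 5], [1, 3, 3, 6, 4]))  # → 0.8
    toy_scores = {1: 0.5, 2: 0.7, 3: 0.7}  # stand-in for classifier accuracy
    print(cumulative_selection(ranking, lambda feats: toy_scores[len(feats)]))  # → (2, 0.7)
```

In the toy data, feature 0 separates the three grades almost perfectly and ranks first. In a real setting, `evaluate` would fit a classifier such as LightGBM on the selected columns and return held-out accuracy; the stand-in dictionary here only shows the control flow of picking the smallest feature count with the best score.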