English Abstract |
We investigate the problem of classifying short essays by their linguistic features, for English at the high school level. Selecting appropriate essays is crucial both for language learners and for reading comprehension tests, an important type of test in language competence examinations. Although the text alone does not allow us to judge the difficulty of a reading comprehension test, the ability to identify the level of high school students for whom a text was used in reading comprehension can be an important step toward computer-assisted selection of reading comprehension test items. We employed word-level statistics, sentence-level statistics, and syntactic-level information from the text, and applied several machine learning techniques to this text classification problem. Experimental results show that the best-performing combination of features and learning method achieved an accuracy of 53.6%. |