Abstract
We explore the possibilities of automatically evaluating surface text coherence (cohesion) in texts written by learners of Czech during certified exams for non-native speakers. Based on a corpus analysis, we focus on finding and describing distinctive features relevant to the automatic detection of the A1–C1 levels (as established by the CEFR, the Common European Framework of Reference for Languages) with respect to surface text coherence. The CEFR levels are assigned by human assessors, and we attempt to reproduce this assessment automatically using several discourse features, such as the frequency and diversity of discourse connectives and the density of discourse relations. We present experiments with various feature sets using two machine learning algorithms. Compared with human assessment, our automatic evaluation of CEFR coherence/cohesion marks achieved a success rate of 73.2% for detecting the A1–C1 levels and 74.9% for detecting the A2–B2 levels.
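To make the connective-based features more concrete, here is a minimal Python sketch of how a connective frequency and a connective diversity score might be computed for one learner text. It is an illustration under stated assumptions, not the authors' feature extractor. The connective list `CONNECTIVES` is a tiny hypothetical sample rather than the inventory used in the study. Frequency is defined here as connective tokens per 100 word tokens, and diversity as a type-token ratio over the connectives found. Only single-word connectives are matched.

```python
import re

# Hypothetical, heavily abridged list of single-word Czech discourse connectives.
# This is an assumption for illustration, not the inventory used in the paper.
CONNECTIVES = {"a", "ale", "protože", "proto", "však", "když", "pak", "také"}


def connective_features(text: str) -> dict:
    """Return two simple surface-cohesion features for one learner text.

    frequency: connective tokens per 100 word tokens (assumed definition)
    diversity: distinct connective types / connective tokens (assumed definition)
    """
    # In Python 3, \w matches Unicode letters, so Czech diacritics are kept.
    tokens = re.findall(r"\w+", text.lower())
    found = [t for t in tokens if t in CONNECTIVES]
    n_tokens = len(tokens)
    n_conn = len(found)
    return {
        "frequency": 100 * n_conn / n_tokens if n_tokens else 0.0,
        "diversity": len(set(found)) / n_conn if n_conn else 0.0,
    }


if __name__ == "__main__":
    sample = "Byl jsem unavený, ale šel jsem ven, protože svítilo slunce. Pak jsem šel domů a spal."
    print(connective_features(sample))
```

For the sample sentence, four of its sixteen tokens are connectives, all distinct, giving a frequency of 25.0 and a diversity of 1.0. Per-text values of this kind could then serve as inputs to a classifier that predicts the CEFR level; a realistic extractor would also need multiword connectives and the discourse-relation annotation that the density feature relies on.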