Purpose
The study aims to develop a performance scale and performance level descriptors (PLDs) for Mandarin reading in the fourth learning stage, and to provide sample items to illustrate the PLDs.
Main Theories or Conceptual Frameworks
Each standard-setting method has its own strengths. Most studies adopt a content-based standard-setting method, but such procedures rely on panelists' judgments and can therefore be influenced by subjective factors. This study instead adopts the scale anchoring method, which is commonly employed in international large-scale educational assessments such as TIMSS and PISA, to help data users interpret what the scale scores represent.
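To make the anchoring logic concrete, a minimal sketch is given below; the 65% mastery threshold, the ±12.5-point score band around each anchor point, and the anchoring rule itself are illustrative assumptions, not the criteria actually used in the study.

```python
import numpy as np

def anchor_items(scores, responses, anchor_points, band=12.5, threshold=0.65):
    """Illustrative scale-anchoring sketch (assumed criteria, not the study's).

    scores:        (n_students,) array of scale scores
    responses:     (n_students, n_items) array of 0/1 scored responses
    anchor_points: e.g. [400, 475, 550, 625]
    Returns {anchor_point: [indices of items that 'anchor' at that point]}.
    """
    anchored = {}
    prev_p = None
    for point in anchor_points:
        # Students whose scale scores fall near this anchor point
        near = np.abs(scores - point) <= band
        p_correct = responses[near].mean(axis=0)  # per-item proportion correct
        # An item anchors here if students at this point mostly succeed on it
        # and students near the next lower point did not (illustrative rule).
        if prev_p is None:
            hits = p_correct >= threshold
        else:
            hits = (p_correct >= threshold) & (prev_p < threshold)
        anchored[point] = np.where(hits)[0].tolist()
        prev_p = p_correct
    return anchored
```

Operational anchoring procedures in programs such as TIMSS apply stricter, multi-part criteria; the sketch only conveys the general idea of tying item content to regions of the score scale.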
Research Design/Methods/Participants
A survey research method was employed to collect the empirical data required for standard setting. A longitudinal design was used to measure students' performance in Mandarin reading, and the scale anchoring method was adopted to set performance standards and generate performance level descriptors. The study population consisted of 7th-grade students in the 107 and 108 academic years; the 107 cohort served as Panel 1 and the 108 cohort as Panel 2, and each panel was followed up from 7th to 8th grade. A two-stage stratified cluster sampling design was implemented: in the first stage, schools were selected with probability proportional to size (PPS); in the second stage, classes within the selected schools were randomly sampled, and all students in the sampled classes were included in the study. In Panel 1, 2,803 students were assigned to the Mandarin reading assessment in 7th grade and 2,807 in 8th grade; in Panel 2, 2,565 students were assigned in 7th grade and 2,780 in 8th grade. The assessment instrument was a competency-based computerized online test of Mandarin reading consisting of 344 items, with test booklets assembled under a partially balanced incomplete block design. The assessment showed strong evidence of reliability and validity. Item difficulty was estimated with the partial credit model (PCM) of item response theory.
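For reference, and without reproducing the study's specific parameterization, the partial credit model in its standard formulation gives the probability that a student with ability \(\theta\) obtains score category \(k\) on item \(i\) as

\[
P_{ik}(\theta) = \frac{\exp\!\left[\sum_{j=0}^{k}\left(\theta - \delta_{ij}\right)\right]}{\sum_{h=0}^{m_i}\exp\!\left[\sum_{j=0}^{h}\left(\theta - \delta_{ij}\right)\right]}, \qquad k = 0, 1, \ldots, m_i,
\]

where \(\delta_{ij}\) is the \(j\)-th step difficulty of item \(i\), \(m_i\) is the item's maximum score, and \(\sum_{j=0}^{0}(\theta - \delta_{ij}) \equiv 0\) by convention. These symbols are generic notation, not the study's reported parameters.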
Research Findings or Conclusions
The four cut scores for the performance levels were set at 400, 475, 550, and 625, corresponding to levels M1, M2, M3, and M4. Descriptions were developed for each performance level to portray the progression of students' abilities. Reading tasks were categorized into two literacy types, general reading and digital reading, and the reading cognitive processes were classified into three dimensions: locating information, understanding, and evaluating and reflecting. The performance level descriptors (PLDs) detail the literacy skills demonstrated at each level for both types of reading. In general, item difficulty increased with cognitive complexity; however, the empirical data showed that some tasks of lower cognitive complexity were still difficult for students, whereas some tasks of higher cognitive complexity were comparatively easy.
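As a simple illustration of how these cut scores partition the scale, a score-to-level lookup might be written as follows; treating scores below 400 as "below M1" is an assumption made here for completeness, not a claim about the study's reporting.

```python
# Cut scores reported in the study, paired with their performance levels.
CUT_SCORES = [(625, "M4"), (550, "M3"), (475, "M2"), (400, "M1")]

def performance_level(scale_score: float) -> str:
    """Map a Mandarin reading scale score to its performance level.

    Cut scores 400/475/550/625 are taken from the study; labeling
    scores under 400 as "below M1" is an illustrative assumption.
    """
    for cut, level in CUT_SCORES:
        if scale_score >= cut:
            return level
    return "below M1"

# Example: a scale score of 560 falls at level M3.
print(performance_level(560))  # -> "M3"
```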
Theoretical or Practical Insights/Contributions/Recommendations
This study validates the effectiveness of the standard setting with large-scale empirical data and selects appropriate sample items on the basis of empirical evidence. The resulting performance level descriptors (PLDs) reveal how reading processes differ across performance levels, enabling teachers to accurately identify students' difficulties in reading comprehension. Future research is recommended to develop PLDs for Mandarin reading literacy in the third and fifth learning stages, thereby providing a longitudinal framework of PLDs across learning stages. In addition, developing digital reading items that span a range of difficulty levels is suggested in order to enrich the PLDs for the evaluating and reflecting processes at levels M1 through M3. Furthermore, the PLDs can serve as a "learning map" for designing instructional plans tailored to students at different performance levels, enhancing their practical value in teaching and learning contexts.