English Abstract |
Test validity is a property of the interpretation assigned to test scores. Providing objective validity evidence is especially important for a standard-referenced assessment. In this study, we gather validity evidence to support the interpretation of results from the 2014 Comprehensive Assessment Program for Junior High School Students in Taiwan. We used two cluster-analysis methods, hierarchical clustering (HC) and the expectation-maximization (EM) algorithm, to examine the validity of one expert-judgment technique, the Yes/No Angoff standard-setting method. HC based on a minimum-variance algorithm was first applied to divide the examinees into three ability groups: below basic, basic, and proficient. Assuming that each ability cluster follows a Gaussian distribution and that the overall score distribution for each test subject is a mixture of Gaussians (MoG), we initialized the unknown MoG parameters (the mean, variance, and proportion of each cluster) from the HC results. The EM algorithm was then used to refine these parameter estimates, yielding three ability clusters. The results of the traditional Yes/No Angoff standard-setting procedure were compared with those of the HC-EM cluster analysis. To compare the groupings produced by the two methods, we used the examinees' school grades as an external criterion, providing additional evidence on the convergent validity of both methods. The results suggest that cluster analysis can serve as a support tool that provides validity information during standard setting for high-stakes achievement tests. |
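The HC-EM procedure summarized in the abstract can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' implementation: it assumes the "minimum variance algorithm" is Ward's criterion, it runs on synthetic one-dimensional scores rather than the assessment data, and the function names (`ward_clusters`, `init_from_clusters`, `em_mog`) are hypothetical.

```python
import math
import random


def ward_clusters(scores, k=3):
    """Naive agglomerative clustering with Ward's criterion (assumed here)."""
    clusters = [[s] for s in scores]  # start with one cluster per examinee
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                a, b = clusters[i], clusters[j]
                ma, mb = sum(a) / len(a), sum(b) / len(b)
                # Increase in within-cluster sum of squares if a and b merge.
                cost = len(a) * len(b) / (len(a) + len(b)) * (ma - mb) ** 2
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    # Order clusters from lowest to highest mean score.
    return sorted(clusters, key=lambda c: sum(c) / len(c))


def init_from_clusters(clusters):
    """Turn HC clusters into initial (proportion, mean, variance) triples."""
    n = sum(len(c) for c in clusters)
    params = []
    for c in clusters:
        mean = sum(c) / len(c)
        var = sum((x - mean) ** 2 for x in c) / len(c)
        params.append((len(c) / n, mean, max(var, 1e-6)))
    return params


def normal_pdf(x, mean, var):
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def em_mog(scores, params, n_iter=100):
    """Refine mixture-of-Gaussians parameters with a fixed number of EM steps."""
    for _ in range(n_iter):
        # E-step: posterior probability of each cluster for each score.
        resp = []
        for x in scores:
            p = [w * normal_pdf(x, m, v) for w, m, v in params]
            total = sum(p) or 1e-300
            resp.append([pk / total for pk in p])
        # M-step: re-estimate proportion, mean, and variance of each cluster.
        new_params = []
        for k in range(len(params)):
            nk = max(sum(r[k] for r in resp), 1e-12)
            mean = sum(r[k] * x for r, x in zip(resp, scores)) / nk
            var = sum(r[k] * (x - mean) ** 2 for r, x in zip(resp, scores)) / nk
            new_params.append((nk / len(scores), mean, max(var, 1e-6)))
        params = new_params
    return params


# Synthetic scores for three hypothetical ability groups (not real data).
random.seed(0)
scores = [random.gauss(mu, 4.0) for mu in (30, 55, 80) for _ in range(20)]
fitted = em_mog(scores, init_from_clusters(ward_clusters(scores)))
for weight, mean, var in fitted:
    print(f"weight={weight:.2f} mean={mean:.1f} sd={math.sqrt(var):.1f}")
```

Initializing EM from the hierarchical clusters, rather than from random starting values, gives the algorithm a starting point already close to a three-group solution, which is the role HC plays in the procedure described above.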