Abstract |
A simulation study compared the performance of three procedures for detecting differential item functioning (DIF) under the graded response model: the logistic regression procedure (LogR), the likelihood ratio test (LRT), and the differential functioning of items and tests procedure (DFIT). The manipulated factors were sample size, the difference in ability distributions between the focal and reference groups, and the percentage of DIF items in a test (four levels). For each of the sixteen combinations, 100 replications of DIF detection were simulated. All three procedures adhered to the nominal Type I error rates under most conditions. LRT was the most powerful of the three under all conditions. DFIT was less powerful than LRT but was still useful for DIF detection, especially when the groups differed in ability distribution and the percentage of DIF items was relatively large. LogR, with mean power below 0.4 in all conditions, appeared to be sensitive only to items with a large DIF size. |