Objectives
Special education researchers often deepen their understanding of specific groups through intergroup comparisons. From a psychometric perspective, whether the psychological testing results of different groups are comparable depends on the measurement invariance (MI) properties of the measured construct. Methodological research has shown that inattention to a measure’s MI properties can produce spurious significant findings or reduce the probability that a statistical analysis accurately detects intergroup differences. However, our review of 192 empirical studies published over the past ten years in two special education journals indexed in the Taiwan Social Sciences Citation Index (TSSCI) revealed that, of the 64 studies conducting cross-group comparisons based on construct measure results, only six mentioned or examined MI. We believe this lack of attention to MI stems from the absence of empirical examples in special education demonstrating its impact on the interpretability and replicability of study results. The present research therefore aimed to fill this gap in the literature and enhance special education researchers’ understanding of the importance of MI and its testing procedures.
Methods
We conducted a secondary analysis of data from a published article to show how MI analyses can improve the alignment between significant statistical results and theory-based hypotheses about differences between children with disabilities from immigrant and non-immigrant families. The original article reported that, in the 14-item, three-factor (family status, living skills, and communication skills) construct measure it used, the composite scores of one subscale (living skills) and the observed scores of four items did not show the expected intergroup differences in t-tests. The authors did not offer a consistent explanation for these unexpected results. Since MI was not addressed in the article, we first examined the measure’s MI properties by identifying the non-invariant parameters and developing a partial invariance model. Next, we compared the latent mean difference and non-invariant parameter estimates in the partial invariance model with the t-test results from the original paper to see whether MI testing improved the interpretability of the unexpected results. In addition, using the final partial invariance model as the population model, we performed simulations to quantify the impact of MI by manipulating the latent mean differences of each factor across groups. Specifically, the simulation had two conditions: in condition 1, the latent mean differences were those from the final partial invariance model (0.210, 0.282, and 0.410); in condition 2, there was no latent mean difference for any subscale. One thousand replications were generated per condition. For each replication, we conducted an intergroup mean comparison with the partial invariance model, which adjusted for the detected non-invariant parameters, and traditional independent t-tests based on the composite scores of the three subscales. The power and Type I error rates of the two methods were calculated to quantify the impact of MI on replicability and interpretability.
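The replication logic of condition 1 can be sketched in simplified form. The sketch below is illustrative only: it uses a single factor with four items, and the loadings, residual variances, and intercept-shift value are assumed numbers, not the published model’s estimates; only the latent mean difference of 0.41 comes from condition 1 above. As a crude stand-in for the invariance-aware latent comparison (a full partial-invariance SEM is not fit here), it also runs a t-test on a composite of the invariant items only.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2024)

# Illustrative population model: one factor, four items. Loadings, residual
# SD, and the intercept shift are assumed values, not published estimates.
LOADINGS = np.array([0.7, 0.7, 0.7, 0.7])
RESID_SD = 0.5
D_LATENT = 0.41            # latent mean difference (condition 1, largest factor)
TAU_SHIFT = 0.4            # assumed higher intercepts on items 3-4 in group B
N_PER_GROUP = 150
N_REP = 500

def simulate_group(n, latent_mean, intercept_shift):
    """Generate item scores for one group from the factor model."""
    eta = rng.normal(latent_mean, 1.0, size=n)
    tau = np.array([0.0, 0.0, intercept_shift, intercept_shift])
    eps = rng.normal(0.0, RESID_SD, size=(n, 4))
    return eta[:, None] * LOADINGS + tau + eps

hits_all, hits_inv = 0, 0
for _ in range(N_REP):
    a = simulate_group(N_PER_GROUP, D_LATENT, 0.0)   # higher-latent-mean group
    b = simulate_group(N_PER_GROUP, 0.0, TAU_SHIFT)  # lower latent mean, higher intercepts
    # Traditional t-test on the full composite (ignores non-invariance):
    # the intercept shift partially cancels the latent difference.
    _, p_all = stats.ttest_ind(a.sum(axis=1), b.sum(axis=1))
    # Stand-in for the invariance-aware comparison: invariant items only.
    _, p_inv = stats.ttest_ind(a[:, :2].sum(axis=1), b[:, :2].sum(axis=1))
    hits_all += p_all < 0.05
    hits_inv += p_inv < 0.05

print(f"power, full composite t-test: {hits_all / N_REP:.3f}")
print(f"power, invariant items only:  {hits_inv / N_REP:.3f}")
```

Even in this toy setup, the composite that mixes non-invariant items loses much of its power to detect the true latent difference, which is the qualitative pattern the full simulation quantifies.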
Results
Using Chi-square difference tests and modification indices within the framework of multiple-group confirmatory factor analysis (MG-CFA), we found that five items in the construct measure had non-invariant intercepts across the immigrant and non-immigrant groups. We released the corresponding equality constraints to build a partial scalar invariance model. In this model, the latent mean differences of all factors, including living skills, became significant (p < 0.05). Furthermore, by cross-referencing the items in the original paper where the unexpected results occurred with the non-invariant items identified in the partial invariance model, we found substantial overlap: of the four items that did not show the expected difference in the original paper, three were also identified as non-invariant. In addition, the intercepts of all of these non-invariant items were higher in the immigrant group. This suggests that even though non-immigrant families scored higher than immigrant families on the latent factors, the latent-level differences may not manifest in the observed scores of these items because the higher non-invariant intercepts in the immigrant group cancel them out. The follow-up simulations further showed that the discrepancy between the significant latent mean differences detected in the partial invariance model and the non-significant traditional t-tests based on observed composite scores was not merely a coincidence. Specifically, in the condition with latent mean differences, the partial invariance model that accounts for non-invariance more than doubled the power to detect intergroup differences compared with the traditional t-tests (e.g., 0.712 vs. 0.326 for the living skills factor). On the other hand, in the condition with no latent mean difference, the partial invariance model maintained the Type I error rate near the nominal level of α = 0.05 (empirical rate = 0.053), whereas the Type I error rate of the raw independent t-test was highly inflated (empirical rate = 0.514).
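The Type I error inflation in the no-difference condition is the cancel-out logic run in reverse: with equal latent means, non-invariant intercepts alone shift the observed composites apart. A minimal sketch of this mechanism, again with assumed loadings, residual variances, and intercept-shift values rather than the published estimates:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

# Assumed illustrative values, not published estimates: equal latent means
# in both groups, but two of four items have higher intercepts in group B
# (mirroring the higher non-invariant intercepts found for the immigrant group).
loadings = np.array([0.7, 0.7, 0.7, 0.7])
tau_b = np.array([0.0, 0.0, 0.4, 0.4])   # intercept shift in group B only
n, n_rep = 150, 500

rejections, inv_rejections = 0, 0
for _ in range(n_rep):
    eta_a = rng.normal(0.0, 1.0, n)      # identical latent distributions:
    eta_b = rng.normal(0.0, 1.0, n)      # no true intergroup difference
    items_a = eta_a[:, None] * loadings + rng.normal(0.0, 0.5, (n, 4))
    items_b = eta_b[:, None] * loadings + tau_b + rng.normal(0.0, 0.5, (n, 4))
    # Composite t-test over all items picks up the intercept shift
    # as a spurious "group difference".
    _, p_all = stats.ttest_ind(items_a.sum(axis=1), items_b.sum(axis=1))
    rejections += p_all < 0.05
    # Restricting the composite to the invariant items removes the artifact.
    _, p_inv = stats.ttest_ind(items_a[:, :2].sum(axis=1),
                               items_b[:, :2].sum(axis=1))
    inv_rejections += p_inv < 0.05

print(f"Type I error, full composite:  {rejections / n_rep:.3f}")
print(f"Type I error, invariant items: {inv_rejections / n_rep:.3f}")
```

The full-composite rejection rate lands far above the nominal 5% while the invariant-items comparison stays near it, reproducing in miniature the 0.514 vs. 0.053 contrast reported above.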
Conclusion and Suggestion
In the current study, we demonstrated that MI testing procedures can help researchers obtain statistically significant results that better align with theory-based hypotheses. Specifically, after MI was taken into account, most of the unexpectedly non-significant intergroup comparisons in the original article became significant or could be explained by the pattern of non-invariant parameters across groups. The follow-up simulations further showed that accounting for MI more than doubled the replication rate (power). Similarly, in the condition with no intergroup difference, our simulations revealed that accounting for MI maintained the Type I error rate at the nominal level, whereas ignoring MI and directly conducting intergroup comparisons based on observed scores had a greater than 50% chance of yielding spurious differences. Both statistical properties (i.e., high power and a well-controlled Type I error rate) enhance the interpretability of results. We therefore suggest that researchers in the area of special education check the MI properties of their construct measures before conducting intergroup comparisons based on the measures’ results. We also provide detailed examples and explanations in the online appendix to guide researchers through MI testing step by step with their own data (https://osf.io/amswb/?view_only=fe30f285ca9a47108f2c7170381f792a). Interested readers can refer to these online resources and conduct MI testing themselves. We hope this article will be helpful to researchers in the area of special education.