Type I error and statistical power of the Mantel-Haenszel procedure for detecting DIF: A meta-analysis.
This article presents a meta-analysis of studies investigating the effectiveness of the Mantel-Haenszel (MH) procedure when used to detect differential item functioning (DIF). Studies were located electronically in the main databases, representing the codification of 3,774 different simulation conditions, 1,865 related to Type I error and 1,909 to statistical power. The homogeneity of effect-size distributions was assessed by the Q statistic. The extremely high heterogeneity in both error rates (I² = 94.70) and power (I² = 99.29), due to the fact that numerous studies test the procedure in extreme conditions, means that the main interest of the results lies in explaining the variability in detection rates. One-way analysis of variance was used to determine the effects of each variable on detection rates, showing that the MH test was more effective when purification procedures were used, when the data fitted the Rasch model, when test contamination was below 20%, and with sample sizes above 500. The results imply a series of recommendations for practitioners who wish to study DIF with the MH test. A limitation, one inherent to all meta-analyses, is that not all the possible moderator variables, or the levels of variables, have been explored. This serves to remind us of certain gaps in the scientific literature (i.e., regarding the direction of DIF or variances in ability distribution) and is an aspect that methodologists should consider in future simulation studies. (PsycINFO Database Record (c) 2014 APA, all rights reserved)