Wednesday, April 23, 2025

On #factoranalysis of #IQ tests—impact of software choice—plus comments about art+science of factor analysis in #intelligence test research—#schoolpsychology



Contributing to the reproducibility crisis in Psychology: The role of statistical software choice on factor analysis.  Journal of School Psychology.  Stefan C. Dombrowski.  Click here to view article source and abstract.

This is an important article for those who conduct factor analyses of intelligence or cognitive ability tests (and also for those who consume the results).

Abstract (note: bold font in the abstract has been added by me)

A potentially overlooked contributor to the reproducibility crisis in psychology is the choice of statistical application software used for factor analysis. Although the open science movement promotes transparency by advocating for open access to data and statistical methods, this approach alone is insufficient to address the reproducibility crisis. It is commonly assumed that different statistical software applications produce equivalent results when conducting the same statistical analysis. However, this is not necessarily the case. Statistical programs often yield disparate outcomes, even when using identical data and factor analytic procedures, which can lead to inconsistent interpretation of results. This study examines this phenomenon by conducting exploratory factor analyses on two tests of cognitive ability—the WISC-V and the MEZURE—using four different statistical programs/applications. Factor analysis plays a critical role in determining the underlying theory of cognitive ability instruments, and guides how those instruments should be scored and interpreted. However, psychology is grappling with a reproducibility crisis in this area, as independent researchers and test publishers frequently report divergent factor analytic results. The outcome of this study revealed significant variations in structural outcomes among the statistical software programs/applications. These findings highlight the importance of using multiple statistical programs, ensuring transparency with analysis code, and recognizing the potential for varied outcomes when interpreting results from factor analytic procedures. Addressing these issues is important for advancing scientific integrity and mitigating the reproducibility crisis in psychology particularly in relation to cognitive ability structural validity.

My additional comments

The recommendation that multiple factor analysis software programs be used when analyzing the structural validity of cognitive ability tests makes sense.  Kudos to Dr. Dombrowski for demonstrating this need.
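To make the software-dependence issue concrete, here is a minimal sketch (my own illustration, not Dombrowski's analysis, and not the WISC-V or MEZURE data) of how two factor analysis implementations in Python (scikit-learn's FactorAnalysis and the third-party factor_analyzer package) can return different loading matrices from the same data, simply because their extraction methods and other defaults differ. The simulated two-factor structure and all variable names below are assumptions made for illustration only.

import numpy as np
from sklearn.decomposition import FactorAnalysis
from factor_analyzer import FactorAnalyzer  # third-party package (pip install factor_analyzer)

rng = np.random.default_rng(42)

# Simulate 1,000 "examinees" on 8 subtests generated from a known two-factor model.
n_examinees, n_subtests = 1000, 8
true_loadings = np.array([
    [0.80, 0.10], [0.70, 0.20], [0.75, 0.00], [0.60, 0.30],  # mostly factor 1
    [0.10, 0.80], [0.20, 0.70], [0.00, 0.75], [0.30, 0.60],  # mostly factor 2
])
latent = rng.normal(size=(n_examinees, 2))
unique = rng.normal(scale=0.5, size=(n_examinees, n_subtests))
scores = latent @ true_loadings.T + unique

# "Program" 1: scikit-learn, which extracts factors by maximum likelihood (EM/SVD).
ml_fit = FactorAnalysis(n_components=2, rotation="varimax", random_state=0).fit(scores)
ml_loadings = ml_fit.components_.T  # rows = subtests, columns = factors

# "Program" 2: factor_analyzer, set here to minimum residual (minres) extraction.
minres_fit = FactorAnalyzer(n_factors=2, rotation="varimax", method="minres")
minres_fit.fit(scores)
minres_loadings = minres_fit.loadings_

# Both runs asked for a two-factor varimax solution on identical data, yet the
# loading matrices generally do not match exactly (and factor order/sign may differ).
print(np.round(ml_loadings, 2))
print(np.round(minres_loadings, 2))

In a well-behaved toy case like this the discrepancies are typically small, but the general point stands: extraction method, rotation algorithm, convergence criteria, and other defaults vary across programs, which is exactly why reporting the analysis code, and ideally results from more than one program, matters for reproducibility.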

Along these lines, it is also important to recognize that the use and interpretation of any factor analysis software is highly dependent on the statistical and substantive expertise and skills of the researcher.  I made these points (based on the writings of, and personal conversations with, Jack Carroll) in a recent article in the Journal of Intelligence (McGrew, 2023; open access, so you can download and read it).  The salient material is reproduced below.  The article can be accessed either at the journal website or via the Research and Reports section of my MindHub web page.


(Note - Bold font in text below, extracted from McGrew (2023), is not in the original published article)

“I was fortunate to learn important tacit EFA and CFA knowledge during my 17 years of interactions with Carroll, and particularly my private one-to-one tutelage with Carroll in May 2003. Anyone who reads Chapter 3 (Survey and Analysis of Correlational and Factor-Analytic Research on Cognitive Abilities: Methodology) of Carroll's 1993 book, as well as his self-critique of his seminal work (Carroll 1998) and other select method-focused post-1993 publications (Carroll 1995, 1997), should conclude what is obvious—to Carroll, factor analyses were a blend of art and science. As articulated by some of his peers (see footnote #2), his research reflected the work of an expert with broad and deep substantive knowledge of research and theories in intelligence, cognitive psychology, and factor analysis methods. 

In 2003, after Carroll had been using CFA to augment his initial EFA analyses for at least a decade, Carroll expressed (to me during our May 2003 work week) that he was often concerned with the quality of some reported factor analyses (both EFA and CFA) of popular clinical IQ tests or other collections of cognitive ability measures (Carroll 1978, 1991, 1995, 2003). Carroll's characteristic positive skepticism regarding certain reported factor analyses was first articulated (as far as I know) in the late 1970's, when he stated “despite its many virtues, factor analysis is a very tricky technique; in some ways it depends more on art than science, that is, more on intuition and judgment than on formal rules of procedure. People who do factor analysis by uncritical use of programs in computer packages run the risk of making fools of themselves” (Carroll 1978, p. 91; emphasis added). It is my opinion that Carroll would still be dismayed by some of the EFA and CFA studies of intelligence tests published during the past two decades that often used narrow or restricted forms of factor analysis methods and rigid formal statistical rules for decision-making, with little attempt to integrate contemporary substantive research or theory to guide the analysis and interpretation of the results (e.g., see Decker 2021; Decker et al. 2021; McGrew et al. 2023). 

Carroll's unease was prescient of recently articulated concerns regarding two aspects of the theory crises in structural psychological research—the conflation of statistical (primarily factor analysis) models with theoretical models and the use of narrow forms of factor analysis methods (Fried 2020; McGrew et al. 2023). First, many intelligence test batteries only report CFA studies in their technical manuals. EFA results, which often produce findings that vary from CFA findings, are frequently omitted. This often leads to debates between independent researchers and test authors (or test publishers) regarding the validity of the interpretation of composite or cluster scores, leaving test users confused regarding the psychometric integrity of composite score interpretations. McGrew et al. (2023) recently recommended that intelligence test manuals, as well as research reports by independent researchers, include both EFA and CFA (viz., bifactor g, hierarchical g, and Horn no-g models), as well as psychometric network analysis (PNA) and possibly multidimensional scaling analyses (MDSs; McGrew et al. 2014; Meyer and Reynolds 2022). As stated by McGrew et al. (2023), “such an ecumenical approach would require researchers to present results from the major classes of IQ test structural research methods (including PNA) and clearly articulate the theoretical basis for the model(s) the author's support. Such an approach would also gently nudge IQ test structural researchers to minimize the frequent conflation of theoretical and psychometric g constructs. Such multiple-methods research in test manuals and journal publications can better inform users of the strengths and limitations of IQ test interpretations based on whatever conceptualization of psychometric general intelligence (including models with no such construct) underlies each type of dimensional analysis” (p. 24).”
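Since psychometric network analysis (PNA) may be less familiar to readers than EFA or CFA, the following is a minimal sketch (my own illustration, not taken from McGrew et al. 2023) of the core computation behind many psychometric networks: the partial correlations among subtests, obtained from the inverse of the correlation matrix. Applied PNA work typically adds regularization (e.g., a graphical lasso) and model selection on top of this; the toy correlation values and subtest names below are purely hypothetical.

import numpy as np

def partial_correlations(corr: np.ndarray) -> np.ndarray:
    """Partial correlations from a correlation matrix via its inverse (the
    precision matrix P): pcorr[i, j] = -P[i, j] / sqrt(P[i, i] * P[j, j])."""
    precision = np.linalg.inv(corr)
    d = np.sqrt(np.diag(precision))
    pcorr = -precision / np.outer(d, d)
    np.fill_diagonal(pcorr, 1.0)
    return pcorr

# Toy correlation matrix for four hypothetical subtests (illustrative values only).
subtests = ["Vocabulary", "Similarities", "BlockDesign", "MatrixReasoning"]
R = np.array([
    [1.00, 0.70, 0.40, 0.45],
    [0.70, 1.00, 0.42, 0.48],
    [0.40, 0.42, 1.00, 0.65],
    [0.45, 0.48, 0.65, 1.00],
])

# In a basic psychometric network, the edges are these pairwise partial
# correlations: the association between two subtests after conditioning on all others.
print(subtests)
print(np.round(partial_correlations(R), 2))

Contrasting such a network representation with bifactor g, hierarchical g, and no-g factor models, as McGrew et al. (2023) recommend, gives test users several complementary views of the same correlation matrix rather than a single, software- and method-dependent answer.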