Showing posts with label applied psychometrics. Show all posts

Wednesday, December 11, 2024

Applied #psychometrics 101: Strong programs of #constructvalidity - the #theory-#measurement framework, with emphasis on #substantive & #structural validity - #WJIV #WJV #schoolpsychology #psychology

The validity of psychological tests “is an overall judgment of the degree to which empirical evidence and theoretical rationales support the adequacy and appropriateness of inferences and actions based on test scores or other modes of assessment” (Messick, 1995, p. 741).

The ability to draw valid inferences regarding theoretical constructs from observable or manifest measures (e.g., test or composite scores) is a function of the extent to which the underlying program of validity research attends to both the theoretical and measurement domains of the focal constructs (Benson, 1998; Benson & Hagtvet, 1996; Cronbach, 1971; Cronbach & Meehl, 1955; Loevinger, 1957; Messick, 1995; Nunnally, 1978).


The theoretical-measurement domain framework that has driven the revisions of the WJ test batteries, particularly from the WJ-R to the forthcoming WJ V cognitive and achievement test batteries (Q1 2025; COI disclosure: I am a coauthor of the current WJ IV and the forthcoming WJ V), is represented in the figures below.


The goal of this post is to provide visual-graphic (Gv) images that, if properly studied by the reader (and if I did a decent job), convey the basic concepts of what constitutes the substantive component (and, to a more limited extent, the structural component) of a strong program of construct validity - in particular, the theoretical-measurement domain mapping framework used from the WJ-R to the forthcoming WJ V. The external stage of construct validity is not highlighted in this post. The goal is conceptual understanding…thus the absence of empirical data, etc.


For those who want written background information, the most succinct conceptual overview of a “strong program of construct validation” is Benson (1998; click to download and read).


Otherwise…sit back and enjoy the Gv presentation…where five images equal at least a chapter or more in a technical manual :).


Be sure to click on each image to enlarge it (and make it readable).


The figure below was first published in a book on CHC theory-based (then known as Gf-Gc) interpretation of the Wechsler intelligence test batteries (Flanagan, McGrew, & Ortiz, 2000).









Yea…I know. The following figure uses Gv as the sample cognitive ability construct domain rather than Gf, as in the prior figure. I first crafted the figure below in 2005 and don’t have the time (nor attentional control bandwidth) to make a new version. Consider the switch from the Gf domain (above) to Gv (below) a test of your understanding of the material…and of your ability to generalize what you have learned. And yes, I do see the spelling error (“on” for “one”)…but it is an old image file and, as noted above, I don’t have time to clean it up. The primary new feature is the concept of developing developmentally (difficulty) ordered sets of test items for the underlying ability trait scales of manifest indicator tests C and D, which fall under the CHC narrow ability domain of spatial scanning within the broad ability domain of Gv. This is where IRT (Rasch model) item scaling is involved, as shown in the sketch below.
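To make the Rasch scaling idea concrete, here is a minimal Python sketch of how the Rasch model places persons and items on a common logit scale so that item sets can be ordered by difficulty. The item difficulties and abilities below are invented for illustration; this is not the WJ calibration procedure itself.

```python
import numpy as np

def rasch_p_correct(theta, b):
    """Rasch model: probability that a person with ability theta (logits)
    answers an item with difficulty b (logits) correctly."""
    return 1.0 / (1.0 + np.exp(-(theta - b)))

# Hypothetical difficulty calibrations (logits) for five spatial-scanning
# items, ordered from easiest to hardest (invented values).
item_difficulties = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])

# Hypothetical examinee abilities (logits).
for theta in (-1.5, 0.0, 1.5):
    probs = rasch_p_correct(theta, item_difficulties)
    print(f"theta = {theta:+.1f}: P(correct) = {np.round(probs, 2)}")
```

Because ability and difficulty share the same scale, the probability of success is .50 whenever theta equals an item’s difficulty; that property is what lets a developmentally (difficulty) ordered item set double as an interpretable ability scale.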




The following figure is drawn from the WJ IV technical manual (McGrew, LaForte, & Schrank, 2014) and illustrates the three-stage structural validity process used in the WJ IV. The same process, with slightly different age groups and the addition of exploratory hierarchical psychometric network analysis (see the exciting and groundbreaking work of Dr. Hudson Golino and colleagues) during stage 2A, will be presented in the WJ V technical manual (LaForte, Dailey, & McGrew, Q1 2025).




Thursday, November 14, 2024

Stay tuned!!!! #WJV g and non-g multiple #CHC theoretical models to be presented in the forthcoming (2025) technical manual: Senior author’s (McGrew) position re the #psychometric #g factor and #bifactorg models.

(c) Copyright, Dr. Kevin S. McGrew, Institute for Applied Psychometrics (11-14-24)

Warning: may be TL;DR for many :). Also, I will be rereading this multiple times and may tweak minor (not substantive) errors and post updates…hey…blogging has an earthy quality to it :)

        In a recent publication, Scott Decker, Joel Schneider, Okan Bulut and I (McGrew, 2023; click here to download and read) presented structural analyses of the WJ IV norm data using contemporary psychometric network analysis (PNA) methods.  As noted in a clip from the article below, we recommended that intelligence test researchers, and particularly the authors and publishers of technical manuals for cognitive test batteries, broaden the psychometric structural analysis of a test battery beyond the traditional (and almost exclusive) reliance on “common cause” factor analysis (EFA and CFA) methods to include PNA…to complement, not supplant, factor-based analyses.

(Click on image to enlarge for easier reading)
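For readers new to PNA, here is a minimal Python sketch of the core computation, using the graphical lasso as one common estimator of a regularized partial-correlation network (the data are simulated; this is not the specific analysis pipeline of McGrew et al., 2023). In a psychometric network, the nodes are tests and the edges are partial correlations; no latent g factor is specified.

```python
import numpy as np
from sklearn.covariance import GraphicalLassoCV

# Simulate scores for 200 examinees on 6 positively correlated tests
# (invented data for illustration only).
rng = np.random.default_rng(0)
shared = rng.normal(size=(200, 1))                       # shared variance
scores = shared + rng.normal(scale=1.0, size=(200, 6))   # 6 tests

# Estimate a sparse inverse covariance (precision) matrix.
model = GraphicalLassoCV().fit(scores)
P = model.precision_

# Convert precision entries to partial correlations -- the edge weights
# of the psychometric network.
d = np.sqrt(np.diag(P))
partial_corr = -P / np.outer(d, d)
np.fill_diagonal(partial_corr, 1.0)
print(np.round(partial_corr, 2))
```

Tests that remain connected after conditioning on all other tests define the network’s structure, which methods such as exploratory graph analysis then cluster into dimensions.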


         Our (McGrew et al., 2023) recommendation is consistent with some critics of intelligence test structural research (e.g., see Dombrowski et al., 2018, 2019; Farmer et al., 2020) who have cogently argued that most intelligence test technical manuals typically present only one of the major classes of possible structural models of cognitive ability test batteries.  Interestingly, many school psychology scholars who conduct and report independent structural analyses of a test battery do something similar…they often present only one form of structural analysis, namely bifactor g analyses.
        In McGrew et al. (2023) we recommended that future cognitive ability test technical manuals embrace a more ecumenical, multiple-methods approach and include, when possible, all major classes of factor analysis models, as well as PNA. A multiple-methods research approach in test manuals (and in journal publications by independent researchers) can better inform users of the strengths and limitations of IQ test interpretations based on whatever conceptualization of psychometric general intelligence (including models with no such construct) underlies each type of dimensional analysis. Leaving PNA methods aside for now, the figure below presents the four major families of traditional CHC theoretical structural models.  These figures are conceptual and are not intended to represent all nuances of factor models.



(Click on image to view a larger version)


         Briefly, the four major families of traditional “common cause” CHC CFA structural models (Carroll, 2003; McGrew et al., 2023) vary primarily in the specification (or lack thereof) of a psychometric g factor. The different families of CHC models are conceptually represented in the figure above. In these conceptual representations, the rectangles represent individual (sub)tests; the circles, latent ability factors at different levels of breadth or generality (stratum levels per Carroll, 1993); the path arrows, the direction of influence (the effect) of the latent CHC ability factors on the tests or lower-order factors; and the single double-headed arrow, all possible correlations among the broad CHC factors (in the Horn no-g model in panel D).
        The classic hierarchical g model “places a psychometric g stratum III ability at the apex over multiple broad stratum II CHC abilities” (McGrew et al., 2023, p. 2). This model is most often associated with Carroll (1993, 2003) and is called (in panel A in the above figure) the Carroll hierarchical g broad CHC model. In this model the shared variance of subsets of moderately to highly correlated tests is first specified as 10 CHC broad ability factors (i.e., the measurement model; Gf, Gc, Gv, etc.). Next, the covariances (latent factor correlations) among the broad CHC factors are specified as being the direct result of a higher-order psychometric g factor (i.e., the structural model).
        A sub-model under the Carroll hierarchical g broad CHC model includes three levels of factors: several first-order narrow (stratum I) factors, 10 second-order broad (stratum II) CHC factors, and the psychometric g factor (stratum III). This is called the Carroll hierarchical g broad+narrow CHC model (panel B in the figure above). In the above example, two first-order narrow CHC factors are specified: auditory short-term storage (Wa) and auditory working memory capacity (Wc). The latter, in simple terms, is a factor defined by auditory short-term memory tasks that also require heavy attentional control-based (AC, per Schneider & McGrew, 2018) active manipulation of stimuli…the essence of Gwm, or working memory.  For illustrative purposes, a narrow naming facility (NA) first-order factor, which has higher-order effects or influences from broad Gs and Gr, is specified for evaluation.  Wouldn’t you like to see the results of this hierarchical broad+narrow CHC model?  Well……..stay tuned for the forthcoming WJ V technical manual (Q1 2025; LaForte, Dailey, & McGrew, 2025, in preparation) and your dream will come true.
        The third model is the Horn no-g model (McGrew et al., 2023).  John Horn long argued that psychometric g was nothing more than a statistical abstraction or artifact (Horn, 1998; Horn & Noll, 1997; McArdle, 2007; McArdle & Hofer, 2014; Ortiz, 2015) and did not represent a brain- or biologically based real cognitive ability. This is represented by the Horn no-g broad CHC model in panel D. The Horn no-g broad CHC model is like the Carroll hierarchical g broad CHC model, but the 10 broad CHC factor intercorrelations are retained instead of specifying a higher- or second-order psychometric g factor. In other words, the measurement models are the same but the structural models are different. In some respects the Horn no-g broad CHC model is like contemporary no-g psychometric network analysis models (see McGrew, 2023) that eschew the notion of a higher-order latent psychometric g factor to explain the positive manifold of correlations among individual tests (or first-order latent factors, in the case of the Horn no-g model) in an intelligence battery (Burgoyne et al., 2022; Conway & Kovacs, 2015; Euler et al., 2023; Fried, 2020; Kan et al., 2019; Kievit et al., 2016; Kovacs & Conway, 2016, 2019; McGrew, 2023; McGrew et al., 2023; Protzko & Colom, 2021a, 2021b; van der Maas et al., 2006, 2014, 2019).  Over the past decade I’ve become more aligned with no-g psychometric network CHC models (e.g., process overlap theory, or POT) or Horn’s no-g CHC model, and have, tongue-in-cheek, referred to the elusive psychometric g ability (not the psychometric g factor) as the “Loch Ness Monster of Psychology” (McGrew, 2021, 2022).
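To make the difference between the hierarchical g and Horn no-g specifications concrete, here is a minimal numeric sketch (invented loadings, three broad factors instead of 10) of the broad-factor correlation structure each model implies:

```python
import numpy as np

# Hypothetical second-order g loadings for three broad CHC factors
# (invented values; the WJ models use 10 broad factors).
g_loadings = np.array([0.9, 0.8, 0.7])   # e.g., Gf, Gc, Gv on g

# Carroll hierarchical g model: with standardized factors, the implied
# correlation between any two broad factors is the product of their
# g loadings (no residual factor correlations).
implied_r = np.outer(g_loadings, g_loadings)
np.fill_diagonal(implied_r, 1.0)
print("Hierarchical g implied correlations:\n", np.round(implied_r, 2))

# Horn no-g model: the same broad-factor correlations are simply
# estimated freely (hypothetical free estimates shown here).
horn_r = np.array([[1.00, 0.75, 0.55],
                   [0.75, 1.00, 0.48],
                   [0.55, 0.48, 1.00]])
print("Horn no-g freely estimated correlations:\n", horn_r)
```

The two models share the same measurement model; they differ only in whether the factor correlations are constrained to a g-loading product structure or left free. When the free estimates depart from that product structure, the higher-order g model misfits; when they do not, the models are empirically hard to distinguish, which is one reason to report multiple model families.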



        Three of these common cause CHC structural models (viz., the Carroll hierarchical g broad CHC model, the Carroll hierarchical g broad+narrow CHC model, and the Horn no-g broad CHC model), as well as Dr. Hudson Golino and colleagues’ hierarchical exploratory graph analysis psychometric network models (a topic saved for another day), are to be presented in the structural analysis section of the forthcoming WJ V technical manual validity chapter.  Stay tuned for some interesting analyses and interpretations in the “must read” WJ V technical manual. Yes…assessment professionals, a well-written and thorough technical manual can be your BFF!
        Finally, the fourth family of models, which McGrew et al. (2023) called g-centric models, are commonly known as bifactor g models. In the bifactor g broad CHC model (panel C in the figure), the variance associated with a dominant psychometric g factor is first extracted from all individual tests. The residual (remaining) variance is then modeled as 10 uncorrelated (orthogonal) CHC broad factors. The bifactor model was excluded from the WJ V structural analysis. Why…..after I (McGrew et al., 2023) recommended that all four classes of traditional CHC structural analysis models be presented in a test battery’s technical manual????
        Because…specifying and evaluating bifactor g models with 60 cognitive and achievement tests proved extremely complex and fraught with statistical convergence issues.  Trust me…I tried hard and long to run bifactor g models on the WJ V norm data.  It was possible to run bifactor g models separately on the cognitive and achievement sets of WJ V tests, but that does not allow a direct comparison to the other three structural models, which utilized all 60 cognitive and achievement tests in single CFA models.  Instead, as of the time the WJ V technical manual analyses were being completed and summarized, the Riverside Insights (RI) internal psychometric research team was tackling the complex issues involved in completing WJ V bifactor g models, first in the separate sets of cognitive and achievement tests.  Stay tuned for future professional conference paper presentations, white papers, or journal article submissions by the RI research team.
        Furthermore, the decision not to include bifactor g models does not suggest that the evaluation of WJ V bifactor g-centric CHC models is unimportant. As noted by Reynolds and Keith (2017), “bifactor models may serve as a useful mathematical convenience for partitioning variance in test scores” (p. 45; emphasis added). The bifactor g model pre-ordains “that the statistically significant lion’s share of IQ battery test variance must be of the form of a dominant psychometric g factor (Decker et al., 2021)” (McGrew et al., 2023, p. 3). Of the four families of CHC structural models, the bifactor g model is the conceptual and statistical model that most strongly supports the importance of general intelligence (psychometric g) and the preeminence of the full-scale or global IQ score over broad CHC test scores (e.g., see Dombrowski et al., 2021; Farmer et al., 2021a, 2021b; McGrew et al., 2023), a theoretical position inconsistent with the position of the WJ V senior author (yours truly) and with Dr. Richard Woodcock’s legacy (see additional footnote comments at the end). It is important to note that a growing body of research has questioned the preference for bifactor g cognitive models based only on statistical fit indices, as structural model fit statistics frequently are biased in favor of bifactor solutions. Per Bonifay et al. (2017), “the superior performance of the bifactor model may be a symptom of ‘overfitting’—that is, modeling not only the important trends in data but also capturing unwanted noise” (pp. 184–185). For more on this, see Decker (2021), Dueber and Toland (2021), Eid et al. (2018), Greene et al. (2022), and Murray and Johnson (2013). See Dombrowski et al. (2020) for a defense of bifactor g models against some of these criticisms.
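Because the bifactor debate ultimately turns on how test variance is partitioned, a minimal sketch of that arithmetic may help (invented loadings for six tests and two group factors; not WJ V estimates). With standardized tests, each test’s variance splits into a g portion, a group-factor portion, and uniqueness, and coefficient omega-hierarchical summarizes how much composite variance the general factor claims:

```python
import numpy as np

# Hypothetical standardized bifactor loadings for 6 tests
# (invented for illustration only).
g_load = np.array([0.70, 0.65, 0.60, 0.55, 0.50, 0.45])    # general factor
grp_load = np.array([0.40, 0.45, 0.35, 0.50, 0.55, 0.40])  # group factors
groups = {"A": [0, 1, 2], "B": [3, 4, 5]}                  # orthogonal groups

# Per-test variance decomposition: g^2 + group^2 + uniqueness = 1.
uniqueness = 1.0 - g_load**2 - grp_load**2

# Omega-hierarchical: proportion of composite (sum score) variance due to g.
group_var = sum(grp_load[idx].sum()**2 for idx in groups.values())
total_var = g_load.sum()**2 + group_var + uniqueness.sum()
omega_h = g_load.sum()**2 / total_var
print(f"omega-hierarchical = {omega_h:.2f}")
```

Note the asymmetry: the general factor pools loadings across every test while each group factor pools only its own subset, which is one way to see why bifactor solutions tend to hand the lion’s share of common variance to psychometric g, as argued above.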
        Recognizing the wisdom of Box’s (1976) well-known axiom that “all models are wrong, but some are useful,” the WJ V technical manual authors (LaForte, Dailey, & McGrew, 2025, in preparation) encourage independent researchers to use the WJ V norm data to evaluate and compare bifactor g CHC models with the models presented in the forthcoming WJ V technical manual, as well as alternative models (e.g., PASS, process overlap theory, Cattell’s triadic Gf-Gc theory, etc.) suggested in the technical manual.


Footnote:  Woodcock’s original (and enduring) position (Woodcock, 1978, 1997, 2002) regarding the validity and purpose of a composite IQ-type g score is at odds with the bifactor g CHC model. With the publication of the original WJ battery, Woodcock (1978) acknowledged the pragmatic predictive value of statistically partitioning cognitive ability test score variance into a single psychometric g factor, with the manifest total IQ score serving as a proxy for psychometric g. Woodcock stated, “it is frequently convenient to use some single index of cognitive ability that will predict the quality of cognitive behavior, on the average, across a wide variety of real-life situations. This is the [pragmatic] rationale for using a single score from a broad-based test of intelligence” (p. 126). However, Woodcock further stated that “one of the most common misconceptions about the nature of cognitive ability (particularly in discussions characterized by such labels as ‘IQ’ and ‘intelligence’) is that it is a single quality or trait held in varying degrees by individuals, something like [mental] height” (p. 126). In several publications Woodcock’s position regarding the importance of an overall general intelligence or IQ score was clear: “The primary purpose for cognitive testing should be to find out more about the problem, not to obtain an IQ” (Woodcock, 2002, p. 6; also see Woodcock, 1997, p. 235). Two of the primary WJ III, WJ IV, and WJ V authors have conducted research or published articles (see Mather & Schneider, 2023; McGrew, 2023; McGrew et al., 2023) consistent with Woodcock’s position and have advocated for a Horn no-g or emergent property no-g CHC network model. Additionally, based on the failure to identify a brain-based biological g (i.e., neuro-g; Haier et al., 2024) in well over a century of research since Spearman first proposed g in the early 1900s, McGrew (2020, 2021) has suggested that g may be the “Loch Ness Monster of psychology.” This does not imply that psychometric g is unrelated to combinations of different neurocognitive mechanisms, such as brain-wide neural efficiency and the ability of the whole-brain network (composed of various brain subnetworks connected via white matter tracts) to efficiently and adaptively reconfigure the global network in response to changing cognitive demands (see Ng et al., 2024, for recent compelling research linking psychometric g to multiple brain network mechanisms and various contemporary neurocognitive theories of intelligence; NOTE…click the link to download a PDF of the article and read it sufficiently to impress your psychologist friends!!!!).



Monday, July 16, 2018

What is an applied psychometrician?

I wear a number of hats within the broad field of educational psychology.  One is that of an applied psychometrician.  Whenever anyone asks what I do, I receive strange looks when that title rolls out of my mouth.  I then always need to provide a general explanation.

I've decided to take a little time and generate a brief explanation.  I hope this helps.

The online American Psychological Association (APA) Dictionary of Psychology defines psychometrics as: n. the branch of psychology concerned with the quantification and measurement of mental attributes, behavior, performance, and the like, as well as with the design, analysis, and improvement of the tests, questionnaires, and other instruments used in such measurement. Also called psychometric psychology; psychometry.

The definition can be understood from the two components of the word. Psycho refers to “psyche,” or the human mind. Metrics refers to “measurement.” Thus, in simple terms, psychometrics means psychological measurement--it is the math and science behind psychological testing.  Applied psychometrics is concerned with the application of psychological theory, techniques, statistical methods, and measurement principles to the development, evaluation, and interpretation of psychological tests. This contrasts with pure or theoretical psychometrics, which focuses on developing new measurement theories, methods, statistical procedures, etc. An applied psychometrician uses the various theories, tools, and techniques developed by more theoretical psychometricians in the actual development, evaluation, and interpretation of psychological tests. By way of analogy, applied psychometrics is to theoretical psychometrics as applied research is to pure research.

The principles of psychometric testing are very broad in their potential application, and have been applied to areas such as intelligence, personality, interests, attitudes, neuropsychological functioning, and diagnostic measures (Irwing & Hughes, 2018). As Irwing and Hughes (2018) recently noted, psychometrics is broad as “It applies to many more fields than psychology, indeed biomedical science, education, economics, communications theory, marketing, sociology, politics, business, and epidemiology amongst other disciplines, not only employ psychometric testing, but have also made important contributions to the subject” (p. 3).

Although there are many publications of relevance to the topic of test development and psychometrics, the most useful and important single source is the Standards for Educational and Psychological Testing (a.k.a. the Joint Test Standards; American Educational Research Association [AERA], American Psychological Association [APA], & National Council on Measurement in Education [NCME], 2014). The Joint Test Standards outline standards and guidelines for test developers, publishers, and users (psychologists) of tests. Given that the principles and theories of psychometrics are generic (they cut across all subdisciplines of psychology that use psychological tests), and given that there is a professionally accepted set of standards (the Joint Test Standards), an expert in applied psychometrics has the skills and expertise to evaluate the fundamental, universal, or core measurement integrity (i.e., quality of norms, reliability, validity, etc.) of various psychological tests and measures (e.g., surveys, IQ tests, neuropsychological tests, personality tests), although sub-disciplinary expertise and training would be required to engage in expert interpretation within a given subdiscipline. For example, expertise in brain development, brain functioning, and brain-behavior relations would be necessary to use neuropsychological tests to make clinical judgments regarding brain dysfunction, types of brain disorders, etc. However, the basic psychometric characteristics of most psychological and educational tests (e.g., neuropsychological, IQ, achievement, personality, interest) can be evaluated by professionals with expertise in applied psychometrics.
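As one concrete illustration of a core measurement-integrity check that cuts across subdisciplines, here is a minimal Python sketch of an internal-consistency reliability estimate (Cronbach’s alpha) computed on simulated item scores; the scale and data are invented.

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for an (examinees x items) score matrix."""
    k = items.shape[1]
    item_var_sum = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1.0 - item_var_sum / total_var)

# Simulate 100 examinees on a hypothetical 5-item scale: a common
# true score plus item-specific noise (invented data).
rng = np.random.default_rng(42)
true_score = rng.normal(size=(100, 1))
scores = true_score + rng.normal(scale=0.8, size=(100, 5))
print(f"alpha = {cronbach_alpha(scores):.2f}")
```

The same computation applies whether the items come from an IQ subtest, a personality inventory, or a neuropsychological measure, which is the sense in which such checks are generic.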

American Educational Research Association, American Psychological Association, & National Council on Measurement in Education (2014). Standards for Educational and Psychological Testing. Washington, DC: Author.

Irwing, P., & Hughes, D. J. (2018). Test development. In P. Irwing, T. Booth, & D. J. Hughes (Eds.), The Wiley Handbook of Psychometric Testing: A Multidisciplinary Reference on Survey, Scale and Test Development (pp. 3-49). Hoboken, NJ: John Wiley & Sons.

Thursday, July 12, 2018

Great psychometric resource: The Wiley Handbook of Psychometric Testing.

I just received my two-volume set of this excellent resource on psychometric testing.  There are not many good books that cover such a broad array of psychometric measurement issues.  This is not what I would call "easy reading."  It is more of a "must have" resource book to keep "at the ready" when seeking to understand contemporary psychometric test development issues.

Monday, February 02, 2015

WJ IV Technical Manual Abstract Assessment Service Bulletin #2 now available


 

A new WJ IV Assessment Service Bulletin (ASB) is now available at the WJ-IV Riverside website.  It is a free download.  The description at the site is below.  Click here to visit the page.


This bulletin provides a summary of the procedures followed in developing and validating the WJ IV.

Throughout the development and design of the WJ IV, the test standards outlined in the Standards for Educational and Psychological Testing (American Educational Research Association [AERA], American Psychological Association [APA], & National Council on Measurement in Education [NCME], 2014) were followed carefully.

Information in this bulletin is abstracted from the Woodcock-Johnson IV Technical Manual (McGrew, LaForte, & Schrank, 2014) and is intended as an overview to highlight important aspects of the WJ IV test design, reliability, and validity. Readers who are interested in more detailed information should consult the WJ IV Technical Manual.

 ASB 1 (WJ IV Tests of Achievement Alternate-Forms Equivalence) is also available for download on the same web page.