IQ's Corner: IAP101 Brief #12: Use of IQ component part scores as indicators of general intelligence in SLD and MR/ID diagnosis

Thursday, March 01, 2012


Historically, the concept of general intelligence (g), as operationalized by the global “full scale” IQ scores of intelligence test batteries, has been central to the definition and classification of individuals with a specific learning disability (SLD) as well as individuals with an intellectual disability (ID). Contemporary definitions and operational criteria have elevated intelligence test battery composite or part scores to a more prominent role in the diagnosis and classification of SLD and, more recently, ID.

In the case of SLD, third-method “consistency” definitions prominently feature component or part scores in (a) the identification of consistency between low achievement and relevant cognitive abilities or processing disorders and (b) the requirement that an individual demonstrate relative cognitive and achievement strengths (see Flanagan, Fiorello & Ortiz, 2010). The global IQ score is de-emphasized in these third-method approaches.

In contrast, the 11th edition of the AAIDD Intellectual Disability: Definition, Classification, and Systems of Supports manual (AAIDD, 2010) placed general intelligence, and thus global composite IQ scores, as central to the definition of intellectual functioning. This has not been without challenge. For example, the AAIDD ID definition has been criticized for an over-reliance on the construct of general intelligence and for ignoring contemporary psychometric theoretical and empirical research that has converged on a multidimensional hierarchical model of intelligence (viz., Cattell-Horn-Carroll or CHC theory).

The potential constraints of the “ID-as-a-general-intelligence-disability” definition were anticipated by the Committee on Disability Determination for Mental Retardation in its National Research Council report “Mental Retardation: Determining Eligibility for Social Security Benefits” (Reschly, Meyers & Hartel, 2002). This national committee of experts concluded that “during the next decade, even greater alignment of intelligence tests and the IQ scores derived from them and the Horn-Cattell and Carroll models is likely. As a result, the future will almost certainly see greater reliance on part scores, such as IQ scores for Gc and Gf, in addition to the traditional composite IQ. That is, the traditional composite IQ may not be dropped, but greater emphasis will be placed on part scores than has been the case in the past” (Reschly et al., 2002, p. 94). The committee stated that “whenever the validity of one or more part scores (subtests, scales) is questioned, examiners must also question whether the test’s total score is appropriate for guiding diagnostic decision making. The total test score is usually considered the best estimate of a client’s overall intellectual functioning. However, there are instances in which, and individuals for whom, the total test score may not be the best representation of overall cognitive functioning” (pp. 106-107).

The increased emphasis on intelligence test battery composite part scores in SLD and ID diagnosis and classification raises a number of measurement and conceptual issues (Reschly et al., 2002). For example, what are statistically significant differences? What is a meaningful difference? What appropriate cognitive abilities should serve as proxies of general intelligence when the global IQ is questioned? What should be the magnitude of the total test score?

The question of appropriate cognitive abilities is the only issue discussed here. This issue addresses which component or part scores are more highly correlated with general intelligence (g); that is, which component part scores are high g-loaders? The traditional consensus has been that measures of Gc (crystallized intelligence; comprehension-knowledge) and Gf (fluid intelligence or reasoning) are the highest g-loading measures and constructs and are the most likely candidates for elevated status when diagnosing ID (Reschly et al., 2002). Although not always stated explicitly, the third-method consistency SLD definitions specify that an individual must demonstrate “at least an average level of general cognitive ability or intelligence” (Flanagan et al., 2010, p. 745), a statement that implicitly points to cognitive abilities and component scores with high g-ness.

Table 1 is intended to provide guidance when using component part scores in the diagnosis and classification of SLD and ID. It summarizes the comprehensive, nationally normed, individually administered intelligence batteries that possess satisfactory psychometric characteristics (i.e., national norm samples, adequate reliability and validity for the composite g-score) for use in the diagnosis of ID and SLD.

The “Composite g-score” column lists the global general intelligence score provided by each intelligence battery. This score is the best estimate of a person’s general intellectual ability, which currently is most relevant to the diagnosis of ID as per AAIDD. All composite g-scores listed in Table 1 meet Jensen’s (1998) psychometric sampling error criteria as valid estimates of general intelligence. As per Jensen’s number of tests criterion, every intelligence battery’s g-composite is based on a minimum of nine tests that sample at least three primary cognitive ability domains. As per Jensen’s variety of tests criterion (i.e., information content, skills and demands for a variety of mental operations), the batteries, when viewed from the perspective of CHC theory, vary in ability domain coverage: four (CAS, SB5), five (KABC-II, WISC-IV, WAIS-IV), six (DAS-II) and seven (WJ III) (Flanagan, Ortiz & Alfonso, 2007; Keith & Reynolds, 2010). As recommended by Jensen (1998), “the particular collection of tests used to estimate g should come as close as possible, with some limited number of tests, to being a representative sample of all types of mental tests, and the various kinds of test should be represented as equally as possible” (p. 85). Users should consult sources such as Flanagan et al. (2007) and Keith and Reynolds (2010) to determine how closely each intelligence battery approximates Jensen’s optimal design criterion, the specific CHC domains measured, and the proportional representation of the CHC domains in each battery’s composite g-score.

Also included in Table 1 are the component part scales provided by each battery (e.g., WAIS-IV Verbal Comprehension Index, Perceptual Reasoning Index, Working Memory Index, and Processing Speed Index), followed by their respective within-battery g-loadings.[1] Examination of the g-ness of composite scores from existing batteries (see last three columns in Table 1) suggests the traditional assumption that measures of Gf and Gc are the best proxies of general intelligence may not hold across all intelligence batteries.[2]

In the case of the SB5, all five composite part scores are very similar in g-loadings (h² = .72 to .79). No single SB5 composite part score appears better than the other SB5 scores for suggesting average general intelligence (when the global IQ score is not used for this purpose). At the other extreme is the WJ III, where the Fluid Reasoning, Comprehension-Knowledge, and Long-term Storage and Retrieval cluster scores are the best g-proxies for part-score based interpretation within the WJ III. The WJ III Visual Processing and Processing Speed clusters are not composite part scores that should be emphasized as indicators of general intelligence. Across all batteries that include a processing speed component part score (DAS-II, WAIS-IV, WISC-IV, WJ III), the respective processing speed scale is always the weakest proxy for general intelligence and thus would not be viewed as a good estimate of general intelligence.

It is also clear that one cannot assume that composites with similar-sounding names should have similar relative g-ness status within different batteries. For example, the Gv (visual-spatial or visual processing) clusters in the DAS-II (Spatial Ability) and SB5 (Visual-Spatial Processing) are relatively strong g-measures within their respective batteries, but the same cannot be said for the WJ III Visual Processing cluster. Even more interesting are the differences in the WAIS-IV and WISC-IV relative g-loadings for similarly named index scores.

For example, the Working Memory Index is the highest g-loading component part score (tied with the Perceptual Reasoning Index) in the WAIS-IV but is only third (out of four) in the WISC-IV. The Working Memory Index is composed of the Digit Span and Arithmetic subtests in the WAIS-IV and the Digit Span and Letter-Number Sequencing subtests in the WISC-IV. The Arithmetic subtest has been reported to be a factorially complex test that may tap fluid intelligence (Gf-RQ: quantitative reasoning), quantitative knowledge (Gq), working memory (Gsm), and possibly processing speed (Gs; Keith & Reynolds, 2010; Phelps, McGrew, Knopik & Ford, 2005). The factorially complex characteristics of the Arithmetic subtest (which, in essence, make it function like a mini-g proxy) would explain why the Working Memory Index is a good proxy for g in the WAIS-IV but not in the WISC-IV. The WAIS-IV and WISC-IV Working Memory Index scales, although named the same, are not measuring identical constructs.

A critical caveat is that the g-loadings cannot be compared across different batteries. g-loadings may change when the mixture of measures included in the analyses changes. Different "flavors" of g can result (Carroll, 1993; Jensen, 1998). The only way to compare g-ness across batteries is with appropriately designed cross- or joint-battery analysis (e.g., WAIS-IV, SB5 and WJ III analyzed in a common sample).
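The dependence of g-loadings on battery composition can be illustrated with a small sketch. The Python code below is only an illustration under stated assumptions: the correlation matrix, the measure names, and the helper functions are hypothetical and are not taken from any battery's technical manual. It extracts the first principal component from two different "batteries" that share one measure, and from the joint set of measures, to show that the shared measure's loading depends on the company it keeps.

```python
import numpy as np

# Hypothetical measures and correlations (illustrative values only).
names = ["Memory", "Reason1", "Reason2", "Speed1", "Speed2"]
R = np.array([
    [1.0, 0.6, 0.6, 0.3, 0.3],
    [0.6, 1.0, 0.7, 0.3, 0.3],
    [0.6, 0.7, 1.0, 0.3, 0.3],
    [0.3, 0.3, 0.3, 1.0, 0.7],
    [0.3, 0.3, 0.3, 0.7, 1.0],
])

def first_pc_loadings(corr):
    """Return each measure's loading on the first principal component."""
    eigvals, eigvecs = np.linalg.eigh(corr)   # eigenvalues in ascending order
    v = eigvecs[:, -1]                        # eigenvector of the largest eigenvalue
    if v.sum() < 0:                           # the sign is arbitrary; make it positive
        v = -v
    return v * np.sqrt(eigvals[-1])

def battery_loadings(indices):
    """Loadings for the sub-battery made up of the given measure indices."""
    sub = R[np.ix_(indices, indices)]
    return dict(zip([names[i] for i in indices], first_pc_loadings(sub)))

battery_x = battery_loadings([0, 1, 2])        # Memory with two reasoning measures
battery_y = battery_loadings([0, 3, 4])        # Memory with two speed measures
joint = battery_loadings([0, 1, 2, 3, 4])      # all measures analyzed together

for label, result in [("Battery X", battery_x), ("Battery Y", battery_y), ("Joint", joint)]:
    print(label, {k: round(float(v), 2) for k, v in result.items()})
```

With these made-up correlations, the Memory measure loads roughly .84 in Battery X but only about .59 in Battery Y, even though it is the same measure. Only the joint analysis places all measures on a common footing.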

The above within- and across-battery examples illustrate that those who use component part scores as an estimate of a person’s general intelligence must be aware of the composition and psychometric g-ness of the component scores within each intelligence battery. Not all component part scores in different intelligence batteries are created equal (with regard to g-ness). Also, similarly named factor-based composite scores may not measure the same construct and may vary in degree of within-battery g-ness. This is not a new problem in the context of naming factors in factor analysis and, by extension, factor-based intelligence test composite scores. Cliff (1983) described this nominalistic fallacy in simple language: “if we name something, this does not mean we understand it” (p. 120).

[1] As noted in the footnotes in Table 1, all composite score g-loadings were computed by Kevin McGrew by analyzing the smallest number of published correlation matrices (those covering the largest age ranges) in each intelligence battery's technical manual (note the exception for the WJ III) in order to obtain an average g-loading estimate. It would have been possible to calculate and report these values for each age-differentiated correlation matrix for each intelligence battery. However, the purpose of this table is to provide the best possible average value across the entire age range of each intelligence battery. Floyd and colleagues have published age-differentiated g-loadings for the DAS-II and WJ III. Those values were not used because they are based on principal common factor analysis, a method that analyzes only the reliable shared variance among tests. Although principal factor and principal component loadings typically will order measures in the same relative position, the principal factor loadings typically will be lower. Given that the imperfect manifest composite scale scores are those that are utilized in practice, and to allow uniformity in the calculation of the g-loadings reported in Table 1, principal component analysis was used in this work. The same rationale was used for not using the latent factor loadings on a higher-order g-factor in SEM/CFA analysis of each test battery. Loadings from CFA analyses represent the relations between the underlying theoretical ability constructs and g purged of measurement error. Also, the final CFA solutions reported in a battery's technical manual (or in independent journal articles) frequently allow tests to be factorially complex (i.e., to load on more than one latent factor), a measurement model that does not resemble the manifest/observed composite scores used in practice. Latent factor loadings on a higher-order g-factor will often differ significantly from principal component loadings based on the manifest measures, both in absolute magnitude and relative size (e.g., see the high Ga loading on g in the WJ III technical manual, which is at variance with the manifest variable based Ga loading reported in Table 1).

[2] The h² values are the values that should be used to compare the relative amount of g-variance present in the component part scores within each intelligence battery.
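
To make the computations described in footnotes [1] and [2] more concrete, here is a minimal sketch in Python. It is not the analysis that produced Table 1: the correlation matrix is hypothetical (generated from a one-factor model with loadings of .8, .7, .6 and .5), and the function names are invented for illustration. It contrasts first principal component loadings with one-factor principal axis (principal factor) loadings, and it treats the squared loading as the proportion of a score's variance shared with the general component, which is an assumption about how the h² values are defined.

```python
import numpy as np

# Hypothetical correlations implied by a one-factor model with
# loadings .8, .7, .6, .5 (illustrative values, not from any manual).
true_loadings = np.array([0.8, 0.7, 0.6, 0.5])
R = np.outer(true_loadings, true_loadings)
np.fill_diagonal(R, 1.0)

def largest_component(matrix):
    """Loadings on the component with the largest eigenvalue."""
    eigvals, eigvecs = np.linalg.eigh(matrix)   # eigenvalues in ascending order
    v = eigvecs[:, -1]
    if v.sum() < 0:                             # the sign is arbitrary; make it positive
        v = -v
    return v * np.sqrt(max(eigvals[-1], 0.0))

def principal_component_loadings(corr):
    """Analyze the full matrix (ones on the diagonal), i.e., total variance."""
    return largest_component(corr)

def principal_axis_loadings(corr, n_iter=100):
    """Analyze shared variance only by iterating communalities on the diagonal."""
    # Start from squared multiple correlations as initial communality estimates.
    h2 = 1.0 - 1.0 / np.diag(np.linalg.inv(corr))
    for _ in range(n_iter):
        reduced = corr.copy()
        np.fill_diagonal(reduced, h2)
        loadings = largest_component(reduced)
        h2 = loadings ** 2
    return loadings

pc = principal_component_loadings(R)
pa = principal_axis_loadings(R)
print("principal component loadings:", np.round(pc, 2))
print("principal axis loadings:     ", np.round(pa, 2))
print("squared PC loadings:         ", np.round(pc ** 2, 2))
```

With these made-up values, the principal axis loadings recover the generating loadings, the principal component loadings are higher for every measure, and both methods rank the measures in the same order, which is the pattern described in footnote [1].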