Showing posts with label KABC-II.

Tuesday, November 22, 2016

Research Bytes: A Systematic Examination of the Linguistic Demand of Cognitive Test Directions Administered to School-Age Populations

A Systematic Examination of the Linguistic Demand of Cognitive Test Directions Administered to School-Age Populations

Damien C. Cormier (1), Okan Bulut (1), Deepak Singh (1), Kathleen E. Kennedy (1), Kun Wang (1), Alethea Heudes (1), Adam J. Lekwa (2)
(1) University of Alberta, Edmonton, Canada
(2) Rutgers University, New Brunswick, NJ, USA
Corresponding author: Damien C. Cormier, Department of Educational Psychology, University of Alberta, 6-107E Education North, Edmonton, Alberta, Canada T6G 2G5. Email: dcormier@ualberta.ca

Abstract

The selection and interpretation of individually administered norm-referenced cognitive tests that are administered to culturally and linguistically diverse (CLD) students continue to be an important consideration within the psychoeducational assessment process. Understanding test directions during the assessment of cognitive abilities is important, considering the high-stakes nature of these assessments. Therefore, the linguistic demand of spoken test directions from the following commonly used cognitive test batteries was examined and compared: Wechsler Intelligence Scale for Children, Fifth Edition (WISC-V), Woodcock-Johnson IV Tests of Cognitive Abilities (WJ IV COG), Cognitive Assessment System, Second Edition (CAS2), and Kaufman Assessment Battery for Children, Second Edition (KABC-II). On average, the linguistic demand of the standard test directions was greater than the linguistic demand of the supplementary test directions. When examining individual test characteristics, very few individual tests were identified as outliers with respect to the linguistic demand of their test directions. This finding differs from previous research and suggests that the linguistic demand of the required directions for most tests included in commonly used cognitive batteries is similar. Implications for future research and test development are discussed.

Thursday, March 01, 2012

IAP101 Brief #12: Use of IQ component part scores as indicators of general intelligence in SLD and MR/ID diagnosis

   
Historically, the concept of general intelligence (g), as operationalized by intelligence test battery global full scale IQ scores, has been central to the definition and classification of individuals with a specific learning disability (SLD) as well as individuals with an intellectual disability (ID). More recently, contemporary definitions and operational criteria have elevated intelligence test battery composite or part scores to a more prominent role in the diagnosis and classification of SLD and, increasingly, ID.
In the case of SLD, third-method consistency definitions prominently feature component or part scores in (a) the identification of consistency between low achievement and relevant cognitive abilities or processing disorders and (b) the requirement that an individual demonstrate relative cognitive and achievement strengths (see Flanagan, Fiorello & Ortiz, 2010). The global IQ score is de-emphasized in these third-method approaches.
In contrast, the 11th edition of the AAIDD Intellectual Disability: Definition, Classification, and Systems of Supports manual (AAIDD, 2010) placed general intelligence, and thus global composite IQ scores, at the center of the definition of intellectual functioning. This has not been without challenge. For example, the AAIDD ID definition has been criticized for an over-reliance on the construct of general intelligence and for ignoring contemporary psychometric theoretical and empirical research that has converged on a multidimensional hierarchical model of intelligence (viz., Cattell-Horn-Carroll or CHC theory).
The potential constraints of the "ID-as-a-general-intelligence-disability" definition were anticipated by the Committee on Disability Determination for Mental Retardation in its National Research Council report, "Mental Retardation: Determining Eligibility for Social Security Benefits" (Reschly, Meyers & Hartel, 2002). This national committee of experts concluded that "during the next decade, even greater alignment of intelligence tests and the IQ scores derived from them and the Horn-Cattell and Carroll models is likely. As a result, the future will almost certainly see greater reliance on part scores, such as IQ scores for Gc and Gf, in addition to the traditional composite IQ. That is, the traditional composite IQ may not be dropped, but greater emphasis will be placed on part scores than has been the case in the past" (Reschly et al., 2002, p. 94). The committee also stated that "whenever the validity of one or more part scores (subtests, scales) is questioned, examiners must also question whether the test's total score is appropriate for guiding diagnostic decision making. The total test score is usually considered the best estimate of a client's overall intellectual functioning. However, there are instances in which, and individuals for whom, the total test score may not be the best representation of overall cognitive functioning" (pp. 106-107).
The increased emphasis on intelligence test battery composite part scores in SLD and ID diagnosis and classification raises a number of measurement and conceptual issues (Reschly et al., 2002). For example, what constitutes a statistically significant difference? What is a meaningful difference? Which cognitive abilities should serve as proxies of general intelligence when the global IQ is questioned? What should be the magnitude of the total test score?
Only the issue of appropriate cognitive abilities will be discussed here. This issue addresses which component or part scores are most correlated with general intelligence (g)—that is, which component part scores are high g-loaders? The traditional consensus has been that measures of Gc (crystallized intelligence; comprehension-knowledge) and Gf (fluid intelligence or reasoning) are the highest g-loading measures and constructs and are the most likely candidates for elevated status when diagnosing ID (Reschly et al., 2002). Although not always stated explicitly, the third-method consistency SLD definitions specify that an individual must demonstrate "at least an average level of general cognitive ability or intelligence" (Flanagan et al., 2010, p. 745), a statement that implicitly suggests cognitive abilities and component scores with high g-ness.
Table 1 is intended to provide guidance when using component part scores in the diagnosis and classification of SLD and ID (click on the images to enlarge and use the browser zoom feature to view; it is recommended you click here to access a PDF copy of the table and zoom in on it). Table 1 presents a summary of the comprehensive, nationally normed, individually administered intelligence batteries that possess satisfactory psychometric characteristics (i.e., national norm samples, adequate reliability and validity for the composite g-score) for use in the diagnosis of ID and SLD.



The Composite g-score column lists the global general intelligence score provided by each intelligence battery. This score is the best estimate of a person's general intellectual ability, which currently is most relevant to the diagnosis of ID as per AAIDD. All composite g-scores listed in Table 1 meet Jensen's (1998) psychometric sampling error criteria as valid estimates of general intelligence. As per Jensen's number of tests criterion, all intelligence batteries' g-composites are based on a minimum of nine tests that sample at least three primary cognitive ability domains. As per Jensen's variety of tests criterion (i.e., information content, skills, and demands for a variety of mental operations), the batteries, when viewed from the perspective of CHC theory, vary in ability domain coverage: four (CAS, SB5), five (KABC-II, WISC-IV, WAIS-IV), six (DAS-II), and seven (WJ III) (Flanagan, Ortiz & Alfonso, 2007; Keith & Reynolds, 2010). As recommended by Jensen (1998), the particular collection of tests used to estimate g "should come as close as possible, with some limited number of tests, to being a representative sample of all types of mental tests, and the various kinds of test should be represented as equally as possible" (p. 85). Users should consult sources such as Flanagan et al. (2007) and Keith and Reynolds (2010) to determine how each intelligence battery approximates Jensen's optimal design criterion, the specific CHC domains measured, and the proportional representation of the CHC domains in each battery's composite g-score.
Also included in Table 1 are the component part scales provided by each battery (e.g., WAIS-IV Verbal Comprehension Index, Perceptual Reasoning Index, Working Memory Index, and Processing Speed Index), followed by their respective within-battery g-loadings.[1]  Examination of the g-ness of composite scores from existing batteries (see last three columns in Table 1) suggests the traditional assumption that measures of Gf and Gc are the best proxies of general intelligence may not hold across all intelligence batteries.[2] 
In the case of the SB5, all five composite part scores are very similar in g-loadings (h2 = .72 to .79). No single SB5 composite part score appears better than the other SB5 scores for suggesting average general intelligence (when the global IQ score is not used for this purpose). At the other extreme is the WJ III, where the Fluid Reasoning, Comprehension-Knowledge, and Long-term Storage and Retrieval cluster scores are the best g-proxies for part-score based interpretation within the WJ III. The WJ III Visual Processing and Processing Speed clusters are not composite part scores that should be emphasized as indicators of general intelligence. Across all batteries that include a processing speed component part score (DAS-II, WAIS-IV, WISC-IV, WJ III), the respective processing speed scale is always the weakest proxy for general intelligence and thus would not be viewed as a good estimate of general intelligence.
It is also clear that one cannot assume that composites with similar-sounding names should have similar relative g-ness status within different batteries. For example, the Gv (visual-spatial or visual processing) clusters in the DAS-II (Spatial Ability) and SB5 (Visual-Spatial Processing) are relatively strong g-measures within their respective batteries, but the same cannot be said for the WJ III Visual Processing cluster. Even more interesting are the differences in the WAIS-IV and WISC-IV relative g-loadings for similarly named index scores.
For example, the Working Memory Index is the highest g-loading component part score (tied with the Perceptual Reasoning Index) in the WAIS-IV but is only third (out of four) in the WISC-IV. The Working Memory Index comprises the Digit Span and Arithmetic subtests in the WAIS-IV and the Digit Span and Letter-Number Sequencing subtests in the WISC-IV. The Arithmetic subtest has been reported to be a factorially complex test that may tap fluid intelligence (Gf-RQ—quantitative reasoning), quantitative knowledge (Gq), working memory (Gsm), and possibly processing speed (Gs; Keith & Reynolds, 2010; Phelps, McGrew, Knopik & Ford, 2005). The factorially complex characteristics of the Arithmetic subtest (which, in essence, make it function like a mini-g proxy) would explain why the Working Memory Index is a good proxy for g in the WAIS-IV but not in the WISC-IV. The WAIS-IV and WISC-IV Working Memory Index scales, although named the same, are not measuring identical constructs.

A critical caveat is that g-loadings cannot be compared across different batteries. g-loadings may change when the mixture of measures included in the analyses changes, and different "flavors" of g can result (Carroll, 1993; Jensen, 1998). The only way to compare g-ness across batteries is with an appropriately designed cross- or joint-battery analysis (e.g., the WAIS-IV, SB5, and WJ III analyzed in a common sample).
The above within- and across-battery examples illustrate that those who use component part scores as an estimate of a person's general intelligence must be aware of the composition and psychometric g-ness of the component scores within each intelligence battery. Not all component part scores in different intelligence batteries are created equal (with regard to g-ness). Also, similarly named factor-based composite scores may not measure identical constructs and may vary in their degree of within-battery g-ness. This is not a new problem in the context of naming factors in factor analysis and, by extension, factor-based intelligence test composite scores. Cliff (1983) described this nominalistic fallacy in simple language—"if we name something, this does not mean we understand it" (p. 120).




[1] As noted in the footnotes in Table 1, all composite score g-loadings were computed by Kevin McGrew by entering the smallest number (covering the largest age ranges) of the published correlation matrices from each intelligence battery's technical manual (note the exception for the WJ III) in order to obtain an average g-loading estimate. It would have been possible to calculate and report these values for each age-differentiated correlation matrix for each intelligence battery. However, the purpose of this table is to provide the best possible average value across the entire age range of each intelligence battery. Floyd and colleagues have published age-differentiated g-loadings for the DAS-II and WJ III. Those values were not used because they are based on the principal common factor analysis method, a method that analyzes only the reliable shared variance among tests. Although principal factor and principal component loadings typically will order measures in the same relative position, the principal factor loadings typically will be lower. Given that the imperfect manifest composite scale scores are those that are utilized in practice, and to allow uniformity in the calculation of the g-loadings reported in Table 1, principal component analysis was used in this work. The same rationale was used for not using the latent factor loadings on a higher-order g-factor from SEM/CFA analyses of each test battery. Loadings from CFA analyses represent the relations between the underlying theoretical ability constructs and g purged of measurement error. Also, the final CFA solutions reported in a battery's technical manual (or in independent journal articles) frequently allow tests to be factorially complex (load on more than one latent factor), a measurement model that does not resemble the real-world reality of the manifest/observed composite scores used in practice. Latent factor loadings on a higher-order g-factor will often differ significantly from principal component loadings based on the manifest measures, both in absolute magnitude and relative size (e.g., see the high Ga loading on g in the WJ III technical manual, which is at variance with the manifest-variable-based Ga loading reported in Table 1). A minimal computational sketch of this principal component procedure appears after these footnotes.
[2] The h2 values should be used to compare the relative amount of g-variance present in the component part scores within each intelligence battery.
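For readers who want a concrete sense of the calculation described in footnote 1, below is a minimal Python sketch of extracting first principal component (g) loadings from a correlation matrix of composite part scores and squaring them to obtain the h2 values referred to in footnote 2. The correlation matrix and composite names are hypothetical, invented for illustration only; they are not values from any battery's technical manual or from Table 1.

import numpy as np

# Hypothetical correlation matrix among four composite part scores
# (illustration only; NOT values from any published technical manual).
composites = ["Verbal", "Reasoning", "Memory", "Speed"]
R = np.array([
    [1.00, 0.62, 0.55, 0.38],
    [0.62, 1.00, 0.58, 0.42],
    [0.55, 0.58, 1.00, 0.40],
    [0.38, 0.42, 0.40, 1.00],
])

# First principal component of the correlation matrix.
eigvals, eigvecs = np.linalg.eigh(R)   # eigenvalues in ascending order
pc1 = eigvecs[:, -1]                   # eigenvector for the largest eigenvalue
if pc1.sum() < 0:                      # orient the loadings positively
    pc1 = -pc1

g_loadings = pc1 * np.sqrt(eigvals[-1])  # component loadings on the first PC
h2 = g_loadings ** 2                     # proportion of each composite's variance accounted for by g

for name, loading, var in zip(composites, g_loadings, h2):
    print(f"{name:10s} g-loading = {loading:.2f}   h2 = {var:.2f}")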

Wednesday, February 08, 2012

Research byte: Relation between cognitive and achievement g across the lifespan

Double click on images to enlarge.

I was honored to be included in this research project by a group of esteemed intelligence scholars. [Conflict of interest - I am a coauthor of the WJ III that was used in this study]















- Posted using BlogPress from Kevin McGrew's iPad

Thursday, December 02, 2010

IQ test battery publication timeline: Atkins MR/ID Flynn Effect cheat sheet

As I've become involved in consulting on Atkins MR/ID death penalty cases, a frequent topic raised is that of norm obsolescence (aka, the Flynn Effect). When talking with others I often have trouble spitting out the exact date of publication of the various revisions of tests, as I keep track of more than just the Wechsler batteries (which are the primary IQ tests in Atkins reports). I often wonder if others question my expertise...but most don't realize that there are more IQ batteries out there than just the Wechsler adult battery...and, in particular, a large number of child-normed batteries and other batteries spanning childhood and adulthood. Thus, I decided to put together a cheat sheet for myself...one that I could print and have in my files. I put it together in the form of a simple IQ battery publication timeline. Below is an image of the figure. Double click on it to enlarge.

An important point to understand is that when serious discussions start focusing on the Flynn effect in trials, most often the test publication date is NOT used in the calculation of how obsolete a set of test norms are. Instead, the best estimate of the year the test was normed/standardized is used, which is not included in this figure (you will need to locate this information). For example, the WAIS-R was published in 1981...but the manual states that the norming occurred from May 1976 to May 1980. Thus, in most Flynn effect discussions in court cases, the date of 1978 (the middle of the norming period) is typically used. This makes recall of this information difficult for experts who track all the major individually administered IQ batteries.
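For readers unfamiliar with how that norming date enters the arithmetic, here is a minimal Python sketch of the kind of adjustment that typically gets discussed. It assumes the commonly cited Flynn effect rate of roughly 0.3 IQ points per year of norm obsolescence; the obtained score and testing year in the example are hypothetical, and the sketch is illustrative only, not a clinical or forensic recommendation.

# Illustrative norm-obsolescence (Flynn effect) adjustment.
# Assumes the commonly cited rate of ~0.3 IQ points of score inflation per year
# between the mid-point of the norming period and the date of testing.

FLYNN_RATE = 0.3  # IQ points per year (assumed convention)

def flynn_adjusted_iq(obtained_iq, mid_norming_year, testing_year, rate=FLYNN_RATE):
    """Subtract the estimated norm-obsolescence inflation from an obtained IQ."""
    years_obsolete = testing_year - mid_norming_year
    return obtained_iq - rate * years_obsolete

# WAIS-R example from the text: normed May 1976 to May 1980, so 1978 is used,
# not the 1981 publication date. The obtained score and testing year are hypothetical.
print(flynn_adjusted_iq(obtained_iq=74, mid_norming_year=1978, testing_year=1995))  # 74 - 0.3*17 = 68.9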

Hope this is helpful...if nothing else, you must admit that it is pretty :)  Click on the image to view.





- Posted using BlogPress from Kevin McGrew's iPad

Tuesday, January 05, 2010

The Wechsler-like IQ subtest scaled score metric: The potential for misuse, misinterpretation and impact on critical life decisions---draft report in search of feedback




The following are the first three paragraphs (and a critical figure) of a draft of an IAP Applied Psychometrics 101 Brief Report (#5). The complete report can be downloaded in PDF format by clicking here. A web-page version of the complete report can be found by clicking here (note - the web page version may NOT display two embedded figures...viewing the PDF copy may be necessary).

I'm providing this initial draft report with the expressed intent of soliciting feedback and comments regarding the accuracy and soundness of my analyses and logic.  I'm looking for critical feedback to improve the report.  This is a draft report that will be revised if comments suggest important changes.  Please read it in the spirit of "tossing out some critical ideas" for reflective analysis and feedback.  Feedback can be sent directly to me (iap@earthlink.net) or could be provided in the form of listserv thread discussions at the NASP and/or CHC listservs.


I've recently been skimming James Flynn's new book (What Is Intelligence? Beyond the Flynn Effect) to better understand the methodology and interpretation of the Flynn effect. Of particular interest to me (as an applied measurement person) is his analysis of the individual subtest scores from the various Wechsler scales across time. As most psychologists know, Wechsler subtest scaled scores (ss) are on a scale with a mean (M) = 10 and a standard deviation (SD) = 3. The subtest ss range from 1 to 19. In Appendix 1 of his book, Flynn states "it is customary to score subtests on a scale in which the SD is 3, as opposed to IQ scores which are scaled with SD set at 15. To convert to IQ, just multiply subtest gains by five, as was done to get the IQ gains in the last column." At first glance, this statement makes it sound as if the transformation of subtest ss to IQ SS is an easy ("just multiply..."; emphasis added by me) and mathematically acceptable procedure without problems. However, on close inspection this transformation has the potential to introduce unknown sources of error into the precision of the transformed SS scores. It is the goal of this brief technical post to explain the issues involved when making this ss-to-IQ SS conversion.
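As a quick illustration of the "multiply by five" arithmetic Flynn describes, the following Python sketch (written for this post summary; it is not from Flynn's book or from the full report) converts a subtest scaled score (M = 10, SD = 3) to the IQ metric (M = 100, SD = 15) through a common z-score. Because the ratio of the two standard deviations is 5, each one-point step on the subtest scale maps to a five-point jump on the IQ scale, which previews the precision issue the report takes up.

# Converting a Wechsler-style subtest scaled score (M = 10, SD = 3) to the
# IQ standard score metric (M = 100, SD = 15). A sketch of the arithmetic only.

def ss_to_iq(ss, ss_mean=10, ss_sd=3, iq_mean=100, iq_sd=15):
    z = (ss - ss_mean) / ss_sd   # z-score on the subtest metric
    return iq_mean + z * iq_sd   # the same z expressed on the IQ metric

# Because iq_sd / ss_sd = 5, a one-point subtest difference becomes a
# five-point jump on the IQ scale:
for ss in range(7, 14):
    print(ss, round(ss_to_iq(ss)))   # 7 -> 85, 8 -> 90, ..., 13 -> 115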

The ss 1-19 scale has a long history in the Wechsler batteries. For example, in Appendix 1 of Measurement of Adult Intelligence (Wechsler, 1944), Wechsler described the steps used to translate subtest raw scores to the new ss metric. The Wechsler batteries have continued this tradition in each new revision, although the methodology and procedures to calculate the ss 1-19 values have become more sophisticated over time. Although the methods used to develop the Wechsler ss 1-19 scale may have become more sophisticated, the resultant underlying scale for each subtest has not...scores still range from 1-19 (M=10; SD=3). Also, the most recent Stanford-Binet—5th Edition (SB5; Roid, 2003) and Kaufman Assessment Battery for Children—2nd Edition (KABC-II) have both adopted the same ss 1-19 scale for their respective individual subtests.

Why is this relatively crude (to be defined below) scale metric still used in some intelligence batteries when other contemporary intelligence batteries provide subtest scale metrics with finer measurement resolution? For example, the DAS-II (Elliott, 2007) places individual test scores on the T-scale (M=50; SD=10), with scores that range from 10-90. The WJ III (McGrew & Woodcock, 2001) places all test and composite scores on the standard score (SS) metric associated with full scale and composite scores (M=100; SD=15). The critical question to be asked is "are there advantages or disadvantages to retaining the historical ss 1-19 scale, or are there real advantages to having individual test scales with finer measurement resolution (DAS-II; WJ III)?" A rough numerical comparison of these metrics is sketched below.
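One crude but concrete way to see the resolution difference is to count the whole-number score points each metric allows within the same number of standard deviations of the mean, as in the Python sketch below (the +/- 2 SD band is an arbitrary choice made only for illustration).

# How many distinct whole-number score points fall within +/- 2 SD of the mean
# on each of the subtest metrics mentioned above? (A rough resolution comparison.)

metrics = {
    "Wechsler/SB5/KABC-II ss (M=10, SD=3)": (10, 3),
    "DAS-II T-score (M=50, SD=10)":         (50, 10),
    "WJ III SS (M=100, SD=15)":             (100, 15),
}

for name, (mean, sd) in metrics.items():
    low, high = mean - 2 * sd, mean + 2 * sd
    n_points = high - low + 1   # integer score points available in the band
    print(f"{name}: {n_points} score points between {low} and {high}")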

......continued............
(complete report available at links in first paragraph of this post)

[Double click on image to enlarge]







Monday, December 28, 2009

Dissertation Dish: Woodcock-Johnson and KABC-II profile research


Validation of neuropsychological subtypes of learning disabilities by Hiller, Todd R., Ph.D., Ball State University, 2009 , 99 pages; AAT 3379243

Abstract
The present study used archival data of individuals given the Woodcock-Johnson Tests of Cognitive Abilities, 3rd Edition and the Woodcock-Johnson Tests of Achievement, 3rd Edition in an effort to define subtypes of LD. The sample included 526 subjects aged 6 years to 18 years old who had a diagnosis of some type of LD. Of these, 22.7% had an additional diagnosis other than LD. It was expected that subtypes similar to Rourke's classification of his nonverbal learning disorder and his basic phonological processing disorder would be found.

Portions of the battery were used in a latent class cluster analysis in order to determine group patterns of strengths and weaknesses. Using the Lo-Mendell-Rubin test, a three-class model was selected. These three groups showed no evidence of patterns of strengths and weaknesses. The groups were best described as a high, middle, and low group, in that the high group had scores that were universally higher than scores from the middle group, which in turn had scores that were universally higher than the low group. The rates of individuals with comorbid disorders varied greatly between the clusters. The high group had the lowest comorbidity rate in the study, at only 6.8%, compared to 26.4% of the middle group and 44.8% of the low group.

These results suggest that the clusters found differ more in severity than in type of LD. Individuals with LD and comorbid disorders are more likely to have more severe deficits.


Profile analysis of the Kaufman Assessment Battery for Children, Second Edition with African American and Caucasian preschool children by Dale, Brittany Ann, Ph.D., Ball State University, 2009 , 130 pages; AAT 3379238

Abstract
The purpose of the present study was to determine if African American and Caucasian preschool children displayed similar patterns of performance among the Cattell-Horn-Carroll (CHC) factors measured by the Kaufman Assessment Battery for Children, Second Edition (KABC-II). Specifically, a profile analysis was conducted to determine if African Americans and Caucasians displayed the same patterns of highs and lows and scored at the same level on the KABC-II composites and subtests. Forty-nine African American (mean age = 59.14 months) and 49 Caucasian (mean age = 59.39 months) preschool children from a Midwestern city were included in the study and were matched on age, sex, and level of parental education. Results of a profile analysis found African American and Caucasian preschool children had a similar pattern of highs and lows and performed at the same level on the CHC broad abilities as measured by the KABC-II. Comparison of the overall mean IQ indicated no significant differences between the two groups. The overall mean difference between groups was 1.47 points, the smallest gap seen in the literature. This finding was inconsistent with previous research indicating a one standard deviation difference in IQ between African Americans and Caucasians. A profile analysis of the KABC-II subtests found the African American and Caucasian groups performed at an overall similar level, but did not show the same pattern of highs and lows. Specifically, Caucasians scored significantly higher than African Americans on the Expressive Vocabulary subtest, which measures the CHC narrow ability of Lexical Knowledge.

Results of this study supported the KABC-II's authors' recommendation to make interpretations at the composite level. When developing hypotheses of an individual's strengths and weaknesses in narrow abilities, clinicians should be cautious when interpreting the Expressive Vocabulary subtest with African Americans. Overall, results of this study supported the use of the KABC-II with African American preschool children. When making assessment decisions, clinicians can be more confident in an unbiased assessment with the KABC-II.

Future research could further explore the CHC narrow abilities in ethnically diverse populations. Additionally, more research should be conducted with other measures of cognitive ability designed to adhere to the CHC theory, and the appropriateness of those tests with an African American population. Furthermore, future research with the KABC-II could determine if the results of the present study were replicated in other age groups.



Monday, November 02, 2009

Dissertation dish: WJ III/KABC-II Gv and ach AND Gs, RAN, Ga, work mem and reading


Visual-spatial thinking and academic achievement: A concurrent and predictive validity study by Yazzie, Anslem, Ph.D., Northern Arizona University, 2009 , 92 pages; AAT 3370650

Abstract
Forty-eight students were administered the Spatial Relations and Picture Recognition subtests of the Woodcock-Johnson III Visual-Spatial Thinking cluster, the Rover and Triangles subtests of the Kaufman Assessment Battery for Children-Second Edition Visual Processing cluster, and the Word Reading, Math Computation, and Spelling subtests of the Wide Range Achievement Test-Fourth Edition. According to previous research, several assumptions regarding visual-spatial thinking's correlation with achievement, its concurrent validity with other measures, and differences across gender and ethnicity had been found to vary. Mean differences were compared for significance of performance on the overall clusters. Examination of cluster performances indicated that visual-spatial thinking (Gv) was measured equally well on both cognitive measures. There was a notable relationship between cognitive Gv performances and achievement. When mean differences were examined in terms of ethnicity, no statistical difference was found. In contrast, a significant difference was found when gender was examined. Results are discussed in terms of implications for school psychologists, researchers, and teachers.

The relationships among cognitive ability measures and irregular word, non-word, and word reading by Abu-Hamour, Bashir, Ph.D., The University of Arizona, 2009 , 160 pages; AAT 3369670

Abstract

This study examined the relationships of: (a) the Processing Speed (PS) Cluster and Rapid Automatized Naming (RAN) Total to reading ability; (b) measures of RAN and PS to irregular word, non-word, and word reading; and (c) irregular word, non-word, and word reading to one another. The word reading measures were predicted using multiple cognitive abilities, including Phonological Awareness (PA), RAN, PS, and Working Memory (WM). Sixty participants were recruited: 39 students who were average readers and 21 students with reading difficulties in Grades 1, 2, 3, and 4.

Correlational designs testing predictive relationships were used to conduct this study. The results indicated that the PS Cluster had the strongest correlation with irregular word reading, whereas the RAN Total had the strongest correlation with both word reading and non-word reading ability. Reading performance was best predicted by RAN-Letters. In addition, the Woodcock-Johnson III Visual Matching test had the strongest predictive power of reading ability among all of the PS measures.

High correlations were found among the reading variables within normally distributed data, whereas there was no significant correlation between irregular and nonword reading within the group of students with Reading Difficulties. These findings provide support for the dual-route theory. Among the 21 students with RD, 10 students presented problems in both non-word reading and irregular word reading; 9 students presented problems just in non-word reading; and 2 students presented problems just in irregular word reading.

A model consisting of RAN, PA, and PS, as included in the study measures, provided the most powerful prediction of all reading skills. These findings also lend more support to the double-deficit model and indicate that PA and naming speed problems contribute independently to variance in reading.

This study provides direction for the assessment of specific reading disability and the cognitive underpinnings of this disorder. These findings support the need to assess PA, RAN, and PS, as well as various types of word reading skills, when making a reading disability diagnosis. Further research may cross-validate the results of this study, or add other aspects of reading (e.g., reading fluency or comprehension) to this line of research.



Friday, May 15, 2009

CHC theory: Emergence, test instruments and school-related research brief

Contemporary Cattell-Horn-Carroll (CHC) intelligence test development, interpretation, and applied research can be traced to a fortuitous meeting of Richard Woodcock, John Horn, and John "Jack" Carroll in the fall of 1985, a meeting also attended by the first author of this web resource (McGrew, 2005). This meeting resulted in the 1989 publication of the first individually administered, nationally standardized CHC-based intelligence battery, the Woodcock-Johnson-Revised (Woodcock, McGrew, & Mather, 1989). This landmark event, which occurred 20 years ago, provided the impetus for the major CHC-driven evolution of school-based intelligence testing practice.
Subsequent important CHC events followed during this 20-year period, including: (a) the first set of CHC-organized joint test battery factor analysis studies (Woodcock, 1990), which planted the seeds for the concept of CHC cross-battery (CB) assessment; (b) the first attempt to use the WJ-R, via a Kaufman-like supplemental testing strategy (Kaufman, 1979), to implement the yet to be named and operationalized CHC CB approach to testing (McGrew, 1993); (c) the articulation of the first integrated Cattell-Horn-Carroll model and classification of the major intelligence batteries as per the CHC framework (McGrew, 1997); (d) the first description of the assumptions, foundations, and operational principles for CHC CB assessment and interpretation (Flanagan & McGrew, 1997; McGrew & Flanagan, 1998); (e) the publication of the first intelligence theory and assessment book to prominently feature CHC theory and assessment methods (Contemporary Intellectual Assessment: Theories, Tests, and Issues; Flanagan, Genshaft & Harrison, 1997; click here for link to 2nd edition); (f) the publication of the CHC CB assessment series (Flanagan, McGrew & Ortiz, 2000; Flanagan, Ortiz, Alfonso & Mascolo, 2006; Flanagan, Ortiz & Mascolo, 2001, 2007; McGrew & Flanagan, 1998); (g) the completion of a series of CHC-organized studies that investigated the relations between CHC cognitive abilities and reading, math, and writing achievement (what you are reading now); (h) the articulation of CHC-grounded SLD assessment and eligibility frameworks (see Flanagan & Fiorello, manuscript in preparation); and (i) the subsequent CHC-grounded revisions or interpretations of a number of comprehensive individually administered intelligence test batteries (Differential Ability Scales, 2nd Edition, DAS-II; Stanford-Binet, 5th Edition, SB5; Kaufman Assessment Battery for Children, 2nd Edition, KABC-II). Although not overtly stated, the impact of CHC theory can be seen in the recent revisions of the venerable Wechsler trilogy (WPPSI-III; WISC-IV; WAIS-IV) as well as the presentation of CHC CB procedures for interpreting the three Wechsler batteries (Flanagan et al., 2000).

Click here for other posts in this series.

Tuesday, March 10, 2009

Dissertation dish: KABC-II and the role of English language proficiency


The performance evaluation of low achieving Mexican American students on the Kaufman Assessment Battery for Children II (KABC-II): The role of English language proficiency by Gomez, Micaela T., Ph.D., Capella University, 2008, 121 pages; AAT 3339017

Abstract: This study investigated the relationship between English language proficiency and IQ scores of low achieving Mexican American students between the ages of 7 and 12 whose native language was not English. The research was designed to determine if IQ differences would be found between males and females and if a correlation exists between language proficiency and IQ scores and academic scores, respectively. It was also designed to determine which variables were statistically significant in a model utilizing gender, age, English language proficiency level, and IQ scores to predict academic achievement. Predictive models differed in significant variables found by gender. Criterion sampling was utilized to determine participation (N = 137). The students had previously been administered the Kaufman Assessment Battery for Children, Second Edition (KABC-II) for IQ assessment, the California English Language Development Test (CELDT) for language competency assessment, and the Woodcock-Johnson Tests of Achievement, Third Edition (WJ-III ACH) for academic achievement assessment. The results indicated a significant difference in IQ scores between males and females within this age range (7-12). Correlational analysis indicated that English language proficiency did have a significant relationship with IQ scores. ANOVA found significant differences in IQ among the five levels of the CELDT, and age did not seem to have an impact on this relationship. Furthermore, there was a significant relationship between IQ and academic achievement, with the strongest relationship in Mathematics and Writing skills. Finally, the regression equations that emerged from the analyses differed by gender with variable subcategories of the CELDT. The results provided an understanding of the relationships among IQ, English language proficiency, and academic achievement in this special population. The study also recommends English language assessment strategies for low achieving Mexican American elementary students.


Wednesday, February 27, 2008

Dissertation Dish: WJ III/KABC-II preschool CHC cross-battery factor study

Yet another WJ III CHC-organized dissertation has found its way to IQ's Corner (see the Dissertation Dish index for others). This dissertation is a cross-battery confirmatory factor analysis of the KABC-II and WJ III in a preschool sample. The abstract is below.

A joint confirmatory factor analysis of the Kaufman Assessment Battery for Children, Second Edition, and the Woodcock-Johnson Tests of Cognitive Abilities, Third Edition, with preschool children by Hunt, Madeline S., Ph.D., Ball State University, 2007, 238 pages; AAT 3288307

Abstract
  • The purpose of this study was to explore the construct validity of the Kaufman Assessment Battery for Children, Second Edition (KABC-II; Kaufman & Kaufman, 2004a) and the Woodcock-Johnson Tests of Cognitive Abilities, Third Edition (WJ-III COG; Woodcock, McGrew, & Mather, 2001) with a sample of 200 preschool children, ranging in age from 4 years, 0 months to 5 years, 11 months, and attending preschool and daycare programs in and around a Midwestern city. This study attempted to determine if the Cattell-Horn-Carroll (CHC) factor structure represented on these tests can be identified with young children. Individual confirmatory factor analyses were conducted separately with the KABC-II and WJ-III COG. Moreover, a joint confirmatory factor analysis was conducted using both the KABC-II and WJ-III COG. The results of the individual KABC-II factor analyses indicated a two-tiered Gf-Gc model provided the best fit to the data, although the three-tiered CHC model also fit the data well. This suggests the underlying factor structure of the KABC-II is well represented by the CHC theory. The WJ-III COG was best represented by an alternative CHC model, in which the Gf factor and subtests had been removed, indicating not all CHC constructs represented on the WJ-III COG can be reliably identified among young children. The joint confirmatory factor analysis indicated the strongest measures of the shared CHC factors on the KABC-II and WJ-III COG, which can help to guide cross-battery assessment with preschool children. Overall, the results confirmed multiple CHC abilities can be assessed with young children, implying clinicians should be using preschool tests that provide scores for several cognitive abilities. This study also revealed the constructs of the CHC theory may be represented somewhat differently on preschool tests due to developmental influences. Strong correlations were evident between unrelated tasks, primarily because the verbal and linguistic demands of many subtests caused them to load unexpectedly on the Gc factor. Suggestions for future research include conducting the same study using preschool children with suspected disabilities, as well as with older children, examining other instruments that include a Gf factor, and conducting exploratory factor analysis with subtests from the KABC-II and WJ-III COG that contain significant components of more than one ability.

Friday, February 15, 2008

CHC interpretation of the KABC-II: Guest post by John Garruto


The following is a guest post by John Garruto, school psychologist with the Oswego School District and member of the IQs Corner Virtual Community of Scholars. John reviewed the following article and has provided his comments below. [Blog dictator note - John's review is presented "as is" with only a few minor copy edits by the blog dictator and the insertion of some URL links]


Reynolds, M.R., Keith, T.Z., Goldenring-Fine, J., Fisher, M.E. & Low, J.A. (2007). Confirmatory Factor Structure of the Kaufman Assessment Battery for Children—Second Edition: Consistency With Cattell-Horn-Carroll Theory. School Psychology Quarterly, 22(4), 511-539. [click here to view article]


Abstract:
  • The Kaufman Assessment Battery for Children-Second Edition (KABC-II) is a departure from the original KABC in that it allows for interpretation via two theoretical models of intelligence. This study had two purposes: to determine whether the KABC-II measures the same constructs across ages and to investigate whether those constructs are consistent with Cattell-Horn-Carroll (CHC) theory. Multiple-sample analyses were used to test for equality of the variance-covariance matrices across the 3- to 18-year-old sample. Higher-order confirmatory factor analyses were used to compare the KABC-II model with rival CHC models for children ages 6 to 18. Results show that the KABC-II measures the same constructs across all ages. The KABC-II factor structure for school-age children is aligned closely with five broad abilities from CHC theory, although some inconsistencies were found. Models without time bonuses fit better than those with time bonuses. The results provide support for the construct validity of the KABC-II. Additional research is needed to more completely understand the measurement of fluid reasoning and the role of time bonuses on some tasks.
Okay, I have to tie in the Super Bowl somewhere because the New York Giants won and I waited seventeen years for this to happen again. Like the Super Bowl, there are many practitioners who are interested in the end results (the score at the end of the game), not necessarily how one got there (the whole study) or the analysis of each play (the statistics). For the ease of readers, I’m going to jump to the score at the end of the game.

The Results: The K-ABC II is emerging as a serious contender among cognitive assessment batteries. I also want to say that from reading his posts on the CHC listserv, and now this article, I’m expecting to see some more good stuff from Matthew Reynolds (I’ve always been a fan of Tim Keith’s research and really like his stuff on the WJ-III).

This study seeks to analyze the K-ABC II from a CHC perspective, which is one of two theories the test is built upon (the other is the Luria-Das perspective, where the original test has its origins). It's worth mentioning that Kaufman is not new to CHC theory. The Kaufman Adolescent and Adult Intelligence Test (KAIT) also used Gf-Gc as its basis. The current study is an internal validity study using factor analysis methods. Several analyses were performed to determine the best model fit with certain manipulations of the analysis.

First, it bears mentioning that the test g (general intelligence) loadings are consistent with prior research (Gf and Gc being high g loaders; Gv as well, interestingly!). Reynolds et al. sought to answer some interesting hypotheses regarding model fit as well as cross-factor test loadings. Here are some questions posed and answers provided:

  • Does Gestalt Closure measure Gc? The test requires subjects to look at “inkblots” (Gv) that resemble familiar objects (Gc). It was concluded that there was a Gc load on this subtest. My own thoughts…it might be neat to show the child a list of objects that represented the stimuli after the assessment is complete. From there, one could rule in or rule out Gc contamination. Nevertheless, the Gc load is important because Gv is often thought to be (or supposed to be) an area where less “acquired cultural knowledge” should impact performance.
  • Do Hand Movements measure Gf? This subtest requires a pantomime of repeated hand movements and is purported to load on Gsm. The authors note a relationship to Gf. The hypothesis was generated relating to strategy for success and working memory. My own thoughts…why isn't anyone talking Gv? Sure this test requires motor planning (frontal activation?), but I argue that success can result from remembering a visual sequence. Furthermore, Gsm has often been related to verbal prompt/auditory modality. Although the intertwining of working memory and fluid reasoning has been discussed...I'm not sure I see a huge component of either. The task appears very sequential to me. The visualization component is too hard to ignore. Given the lower load on Gsm I would be interested in looking at a Gv link.
  • Does Pattern Reasoning measure Gv? The analysis suggested loadings on Gv as well as Gf. I have lately found this link to be of interest. It seems pure measures of Gf are hard to find. Sometimes the comparison on the WISC-IV of Picture Concepts (Gf-I) and Matrix Reasoning (Gf-I) is interesting, given that I see a major discrepancy in scores. Indeed the former requires more Gc and the latter more Gv, especially if transformation of the stimuli is required in order to logically complete the puzzles. This becomes even more important if g=Gf...as some scholars have suggested. Under what presentation conditions then is Gf (g?) more successful? Could the learning modality trend be returning? (Just kidding-not touching that one!)
  • Does Story Completion measure Gc or Gv? The analysis suggested that the answer is "no." Story Completion appears to be a measure of Gf. It's interesting because I remember reading that a similar test on the WISC-III (Picture Arrangement) had similar loadings on VCI as POI. I might have thought that there would be more of a Gc load on Story Completion than on Gestalt Closure...but then Gc also requires verbal recall of names whereas this requires logical sequencing ability. I imagine there's probably some Gc necessary, but not enough that having a lot of it will predict success (or too little will predict failure).
  • Do Rover and Block Counting measure Gf as well as Gv? The analysis suggested…definitely. Gv for Block Counting (which I would intuitively agree with), and the jury is still out for Rover. The deductive reasoning element with Rover is certainly apparent...but I think it's important not to forget that Rover has some executive function elements to it (it's not very much unlike Planning on the WJ-III). Right now, though, it seems Gv is present for both Block Counting and Rover.
  • The issue of time bonuses: This research question was very important to me. I recall giving this battery to someone and finding out that the difference on one of the tests (whether timed bonuses were provided or not) resulted in a scaled score difference of almost two standard deviations! I followed the manual, which endorses using timed points unless there's a reason not to. However, Reynolds et al. found a better model fit with no time bonus. This is not bad news. Sometimes we learn different practices after a test has been normed and published. I remember Kaufman's book on the WISC-III where he indicated Symbol Search to be a higher 'g' loader than Coding, and that the informed practitioner may wish to have Symbol Search substitute for Coding as long as the decision was made in an a priori fashion. I guess for me, I do not want Gs contaminating a different factor I'm attempting to measure. I prefer to measure it far away from 'g' via a cross-battery technique, given that Gs has shown weaker relationships to 'g' but significant relationships to learning disabilities. Sometimes we learn new ways to practice as a result of follow-up research; this certainly fits that mold.
Overall Conclusion: I think the K-ABC II is going somewhere. It is receiving some interesting recognition by scholars and is even purported to have some utility with nonverbal, non-English speaking, and/or autistic spectrum populations. Given the potential of this instrument in cognitive assessment, the research opportunities are certainly plentiful. I still see the Wechsler as my test of choice for Gc, and the WJ-III as my favorite test to fill in the holes left by many cognitive batteries...but there certainly seem to be significant practical implications for the K-ABC II. Certainly the relationships to CHC theory are again very much substantiated. Certainly there are plenty of Patriots who want to view assessment from a traditional framework. Also, like the Patriots, there are those who are jumping on the loudest flavor of the month (RTI as being the only way to diagnose learning disabilities). However, the Reynolds et al. study continues to show that CHC theory has stood solid time and again as one of the "Giant" individual differences frameworks for use by school psychologists.


Powered by ScribeFire.

Wednesday, November 28, 2007

Beyond the CHC theory tipping point: Back to the future

I just posted a copy of the PPT slides that served as the first half of two presentations I recently made in Canada re: the CHC (Cattell-Horn-Carroll) theory of intelligence. The latest version is called "Beyond the CHC theory tipping point: Back to the future."

The slide show can be viewed by scrolling down the left-side of this blog page until you reach the "On-line PPT slide" section header. Click on the presentation title and enjoy.


Below is a brief description of the slide show:
  • An overview of the CHC (Cattell-Horn-Carroll) theory of intelligence within a historical and "waves of interpretation" context. Presents the idea that CHC has reached the "tipping point" in school psychology...and that this is allowing assessment practitioners to realize past attempts to engage in individual strength and weakness interpretation of CHC-based test profiles.

Powered by ScribeFire.

Friday, October 27, 2006

RTI and cognitive assessment--Guest post by John Garruto

The following is a guest post by John Garruto, school psychologist with the Oswego School District and member of the IQs Corner Virtual Community of Scholars. John reviewed the following article and has provided his comments below. [Blog dictator note - John's review is presented "as is" with only a few minor copy edits and the insertion of some URL links]

Hale, J.B., Kaufman, A., Naglieri, J.A. & Kavale, K.A. (2006). Implementation Of IDEA: Integrating Response To Intervention And Cognitive Assessment Methods. Psychology in the Schools, 43(7), 753-770. (click here to view)

This article (and the entire journal series in this special issue) has articulated much of what I have been saying and thinking for a long time. Hale and colleagues open up by discussing the RTI (response-to-intervention) and cognitive assessment "factions". Although I had nothing to do with this article, I chuckled at the similarity to a PowerPoint I did for graduate study in July of 2005 (click here). I joked about these factions as having a paradigm that was analogous to "Star Wars". I likened school psychologists who espoused both RTI and cognitive assessment as necessary requirements for the identification of SLD (Specific Learning Disability) to "a rebel alliance"…primarily because it seemed we were advocating such a balanced approach. Clearly this Psychology in the Schools special issue suggests there is an increasing number of professionals who advocate this approach.

Before beginning with a general summary and sharing my overall impressions, it is important to acknowledge the obvious conflict of interest of most dissenters (in the special issue); both Kaufman (KABC-II) and Naglieri (CAS) are intelligence test authors. That said, it is important to note that two of the other authors are not test authors. In fact, Kavale (a.k.a., the intervention effect size guru) is frequently cited by many RTI-only proponents. Therefore, it is very unlikely that this article can be dismissed as nothing more than a conflict of interest.

  • The Hale et al. article begins with the acknowledgment that there seem to be two factions in school psychology assessment circles--those who believe in response-to-intervention as the way to determine eligibility for SLD, and those who espouse the need for cognitive assessment. The Hale et al. article does not diminish the importance of RTI or the problem-solving model. In fact, it supports many of the changes noted in the regulations (e.g., the importance of looking at RTI as a part of the process for determining eligibility for learning disabilities.) It places emphasis on the use of empirically-based instruction and interventions. It also highlights the significance of formative assessment and ongoing progress monitoring. Such practices will illustrate the effect of interventions.
  • After supporting the importance of RTI, the authors contend that at Tier III a responsible individualized assessment (including cognitive assessment) needs to occur. Clearly, jumping to conclusions about a neurologically-based deficit based only on failure to RTI would lead to a significant number of false positives (Type I errors). The authors do an exemplary job of identifying the importance of cognitive processing deficits related to SLD in the problem-solving literature. This approach does not embrace the much maligned ability-achievement discrepancy LD identification procedure, but instead endorses examining which processes (if any) are leading to the negative outcomes. The authors conclude with a case study that describes a child who seemed to have one problem on the surface, but via cognitive assessment was discovered to have an underlying latent problem (i.e., one that was not observably manifest). The authors contended that this discovery, vis-à-vis appropriately designed cognitive assessment methods, facilitated the problem-solving model by allowing the team to implement new interventions. The beauty of this example is that the focus was not on eligibility as the end result, but instead on using individualized assessment to help piece the puzzle together.
  • I've spoken quite a bit about the authors and a possible conflict of interest. One thing I do want to mention is that I continue to be a school-based practitioner. This framework is one I have been endorsing (as a practitioner) for a long time (my presentation noted above had been online for many months before this article went to press). I've had many spirited debates with teachers, arguing that the spirit of formative assessment and research-based interventions has a very positive research history and we are remiss not to use these methods first. However, for those kids who are not responding, I can often complete a solid individualized assessment that provides logical reasons as to why they are not responding, and continue to provide interventions that are related to dynamics and skills that are not readily manifest. There is absolutely no doubt in my mind that combining both approaches will allow us to look beyond "eligibility" to determining what a child needs.
  • Another of my thoughts is that much of the criticism of cognitive assessment not leading to intervention has been the lack of research establishing ATIs (aptitude-treatment interactions). However, establishing individualized interventions based on the needs of the child (that might not have a huge history of published research) does not mean we throw it out. Many RTI-only proponents argue that we might as well go right to special education and simply intensify the research-based interventions that could be done within a special education paradigm. I argue that doing flash cards to aid sight-reading might have an empirical support base, but doing flash cards all day long (one-on-one) with a blind student isn't going to do a thing. However, designing an intervention around the varied needs and interests of the child could (and has) led to positive results.
  • Finally, my other concern with the RTI-only paradigm is that it seems "stuck" on reading…and only on three out of the big five components of the National Reading Panel (Phonemic Awareness, Phonics, and Fluency). There is little research on using CBM for math reasoning or written expression (beyond spelling and perhaps writing fluency). One might have expected the most recent edition of School Psychology Review, 35(3), which focused on CBM for reading, writing, and math, to provide practice-based school psychologists with the research we need. Quite the contrary: most of the articles dealt with math calculation and fluency, as well as with spelling, mechanics, and writing fluency. Clearly, CBM/RTI research on higher-level reasoning processes, vocabulary, induction, deduction, inferential reasoning, and writing organization was lacking from this issue. Until RTI-only advocates start providing research and guidance in these areas, we would be remiss to discard relevant assessment techniques that provide insights into these important skills and abilities.

powered by performancing firefox