Showing posts with label WJ III NU.

Saturday, June 07, 2014

More research on the validity of the C-LIM framework in intelligence testing

Another article adding to the Cultural-Linguistic Interpretive Matrix (C-LIM) research literature. Click on images to enlarge. A copy of the article can be found here.

"Conclusion

The primary conclusion drawn from this study and previous research is that linguistic demand is an important consideration when selecting and interpreting tests of cognitive abilities. The implications of this study go beyond a re-classification of the C-LIM to emphasizing one of the underlying motivations of the C-LIM's initial inception—the importance of considering a student's linguistic background and abilities prior to selecting, administering, and interpreting tests of cognitive abilities. A comprehensive evaluation that takes a student's linguistic ability into consideration should consider that a student's language ability (i.e., conversational proficiency) might not be an accurate representation of a student's academic language abilities (Cummins, 2008). Thus, it would be beneficial to gather information on a student's academic language ability, due to the relationship between education and IQ (Matarazzo & Herman, 1984). A student's receptive and expressive language abilities may be a worthwhile pursuit in future research, as student's level of conversational proficiency in the classroom may mislead educators and psychologists to assume that the student has been exposed to English with the same frequency and depth as his or her peers (Cummins, 2008). Moreover, as suggested by the results of this study, considering the influence of linguistic ability when assessing cognitive abilities should continue to be supported by empirical evidence, instead of school psychologists continuing to rely on informal measures of linguistic ability through language samples and student interviews to gain information on language ability (Ochoa, Galarza, & Gonzalez, 1996).

A second conclusion is that it is unclear how cultural loading can be represented quantitatively in a way that is meaningful both theoretically and practically. An important, albeit unanswered question is, "What variables do practitioners take into account when making decisions about the cultural influences that may affect the selection and interpretation of tests from cognitive batteries?" Flanagan and Ortiz (2001) define cultural loading as "the degree to which a given test requires specific knowledge of or experience with mainstream culture" (p. 243). However, this broad definition does not identify specific variables that practitioners may consider in practice to make these decisions about whether a student's experiences are significantly different from mainstream culture. Given these unanswered questions, it is possible that the underlying reasoning that led to the creation of the C-LIM and its categorization system needs to be re-thought (as also suggested by Styck & Watkins, 2013), particularly with respect to cultural loading. Specifically, it would be important to consider what is occurring and possible in practice, as this is the intended use of the C-LIM."

 

 

Wednesday, February 26, 2014

Woodcock-Johnson IV (WJ IV) NASP 2014 introduction and overview workshop slide shows

(Click on images to enlarge)

Last week, together with Dr. Fred Schrank and Dr. Nancy Mather, I unveiled the new Woodcock-Johnson IV battery at the National Association of School Psychologists (NASP) 2014 annual convention in Washington, DC. We presented a three-hour introductory and overview workshop. NASP members can download the handouts we provided at the NASP website. It is my understanding that NASP will eventually provide access to a video of the workshop that will allow NASP members to view it and earn CEU credits (I am not 100% sure of this; check with NASP--don't email me).

Since the information we presented is now public, we three coauthors wish to provide access to our presentation to others. The three presentation title slides are below, each followed by a link to my SlideShare account (click this link if you want to see all three listed, as well as all my other PPT modules) where the slide shows can be viewed. You will note that not all of the slides presented at the workshop session are included, due to test security issues and the pre-publication nature of various technical information from the forthcoming technical manual.

Enjoy.  Also, as coauthors of the WJ IV, we all have a financial interest in the instrument.  A disclosure statement is present in Part 1 of the slides.  My individual conflict of interest disclosure statement can be found at the MindHub web portal.

Additional information can be found at the official WJ IV Riverside Publishing web page. 


 (Click here for Part 1)


 (Click here for Part 2)


 (Click here for Part 3)

Saturday, December 21, 2013

Gv Gallery Hall of Fame: Bootstrap resampling

Here is my simplified Gv explanation of the statistical procedure called bootstrap resampling. You can read the text about it in a special ASB for the WJ III NU. Click on image to enlarge.
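For readers who prefer code to pictures, here is a minimal Python/NumPy sketch of the general bootstrap idea (my illustration only, not the procedure documented in the ASB); the sample values are made up, and the bootstrapped statistic is the median.

import numpy as np

rng = np.random.default_rng(seed=42)

# Hypothetical observed sample (e.g., 25 scores); values are placeholders.
sample = rng.normal(loc=100, scale=15, size=25)

n_boot = 5000
boot_medians = np.empty(n_boot)

for i in range(n_boot):
    # Resample with replacement, same size as the original sample.
    resample = rng.choice(sample, size=sample.size, replace=True)
    boot_medians[i] = np.median(resample)

# The spread of the bootstrap distribution estimates the sampling error of
# the statistic; the 2.5th/97.5th percentiles give a simple 95% interval.
ci_low, ci_high = np.percentile(boot_medians, [2.5, 97.5])
print(f"median = {np.median(sample):.1f}, bootstrap 95% CI = [{ci_low:.1f}, {ci_high:.1f}]")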

 

 

Monday, May 20, 2013

Cluster analysis of the WJ III/WISC-III intelligence tests: OBG post


This is an OBG (oldie but goodie) post with new, updated links.

In a prior shameless plug, I briefly summarized the results of a recently published CHC-based confirmatory factor analysis study of a WJ-III/WISC-III cross-battery data set (Phelps, McGrew, Knopik & Ford, 2005). Following a favorite quantoid mantra ("there is more than one way to explore a data set"), I couldn't resist conducting a more loosey-goosey exploratory analysis of the data.

One of my favorite exploratory tools, given the Gv presentation of the multivariate structure of the data, is hierarchical cluster analysis (sometimes referred to as the "poor man's" factor analysis). Without going into detail, I subjected the data set previously described to Ward's clustering algorithm. As a word of caution, it is important to note that cluster analysis will produce neat-looking dendrograms even for random data, so one must be careful not to over-interpret the results. Yet I find the looser constraints of cluster analysis and, in particular, the continued collapsing of clusters of tests (and lower-order clusters) into ever broader higher-order clusters very thought-provoking: the results often suggest different broad (stratum II) or intermediate-level strata (as per Carroll's three-stratum model).
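If you want to try this yourself, here is a minimal SciPy sketch of Ward's hierarchical clustering applied to a test correlation matrix. The matrix and test names below are invented placeholders, not the WJ III/WISC-III data; correlations are converted to distances before clustering (a common, if technically loose, exploratory practice, since Ward's method formally assumes Euclidean distances).

import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram
from scipy.spatial.distance import squareform

# Hypothetical correlation matrix among five tests (placeholder values).
tests = ["Vocabulary", "Similarities", "Block Design", "Matrices", "Digit Span"]
R = np.array([
    [1.00, 0.70, 0.45, 0.50, 0.40],
    [0.70, 1.00, 0.48, 0.55, 0.42],
    [0.45, 0.48, 1.00, 0.60, 0.35],
    [0.50, 0.55, 0.60, 1.00, 0.38],
    [0.40, 0.42, 0.35, 0.38, 1.00],
])

# Convert correlations to distances (1 - r) and condense for linkage().
D = squareform(1.0 - R, checks=False)

# Ward's minimum-variance agglomerative clustering.
Z = linkage(D, method="ward")

# Inspect the leaf ordering (or plot the dendrogram with matplotlib).
dn = dendrogram(Z, labels=tests, no_plot=True)
print(dn["ivl"])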

I present the current results "as is" (click here to view or download). Blogsters will need to consult prior posts to glean the necessary pieces of information to interpret the CHC factor codes and names, the abilities measured by the WJ III tests, etc.

To say the least, some interesting hypotheses are suggested. In particular, I continue to be intrigued by the possibility of a higher-order dual cognitive processing model structure (within the CHC taxonomy)--that is, a distinction between automatic vs. controlled/deliberate processing.

Tuesday, December 25, 2012

What we've learned from 20 years of CHC COG-ACH relations research: Back to the future and Beyond CHC

A draft of the paper I presented at the 1st Richard Woodcock Institute on Advances in Cognitive Assessment (this past spring at Tufts) can now be read by clicking here. Three of the 12 figures are included below......as a tease :). The final paper will be published by WMF Press.

 

Tuesday, July 24, 2012

New WIIIP 2.0 announced: Woodcock Interpretation and Instructional Interventions Software


WIIIP 2.0

Copyright 2012 Business Wire, Inc.

Business Wire

July 23, 2012 Monday 2:00 PM GMT

Houghton Mifflin Harcourt Launches Next-Generation Assessment System

Solution is ideal for targeting specific skill deficits; expands to ELL students

BOSTON

Global education leader Houghton Mifflin Harcourt (HMH) today announced that its Riverside division has released the next generation of the Woodcock Johnson suite of assessment products - the Woodcock Interpretation and Instructional Interventions Program(TM) (WIIIP) 2.0. The new system provides users with the tools necessary to make sound clinical and instructional decisions across many solutions in the Woodcock Johnson suite of assessments. Whereas some clinical assessments on the market merely provide the scoring mechanism, the WIIIP 2.0 goes two steps further, giving users interpretation guidance and instructional interventions for each student. WIIIP 2.0 has also expanded interventions to English Language Learners (ELL), empowering educators to overcome language barriers and correctly assess each student.

Built to broaden the benefits of earlier versions, WIIIP 2.0 provides an updated database of more than 500 research-based interventions. The system assists the educator in analyzing results to correctly identify disparities between achievement and predicted achievement based on cognitive ability levels. If a learning gap is identified, WIIIP 2.0 provides unique interventions to ensure that each student's needs are met.

WIIIP 2.0 builds on existing features with the addition of:

    * New interventions that are included for the Cattell-Horn-Carroll (CHC) cognitive factors, representing dozens of new interventions or accommodations.
    * New interventions that are specifically intended for English Language Learners (ELL).
    * Item-level mathematics procedures for two mathematics tests, which help identify gaps in mathematical knowledge and provide formative interventions to address any underlying undeveloped mathematics skills.
    * Three reports that can be printed in Spanish, which include the summary, score report, and proficiency profile report.

"This new release of WIIIP is exciting not only because it offers new interventions, but because non-native speakers can benefit," said Jim Nicholson, President of Riverside, the testing and assessment division of HMH. "Version 2.0 reflects the needs of today's school psychologists and practitioners."

The Woodcock Johnson family of products includes more than a dozen assessments, which are all steeped in years of research and evidence-based inquiry.

"Not only does the WIIIP 2.0 meets today's needs, it provides assessment professionals with a bridge to the future - the assessment-intervention link," says WIIIP 2.0 author Dr. Fred Schrank. "The Woodcock-Johnson suite of products remains at the forefront of the assessment-intervention link, and the updated and enhanced version of the WIIIP was created to meet the needs of assessment professionals functioning in, or navigating toward, modern service delivery models."

WIIIP 2.0 is available as a kit or a downloadable upgrade for current 1.1 users. For more information on this product or the rest of the Woodcock Johnson suite of assessments, please visit http://www.riversidepublishing.com .

About Houghton Mifflin Harcourt

Houghton Mifflin Harcourt is a global learning company with the mission of changing lives by fostering passionate, curious learners. Among the world's largest providers of pre-K-12 education solutions and one of its longest-established publishing houses, HMH combines cutting-edge research, editorial excellence and technological innovation to improve teaching and learning environments and solve complex literacy and education challenges. HMH's interactive, results-driven education solutions are utilized by 60 million students in 120 countries, and its renowned and awarded novels, non-fiction, children's books and reference works are enjoyed by readers throughout the world. For more information, visit www.hmhco.com .

CONTACT: Houghton Mifflin Harcourt
Bianca Olson, 617-351-3841
Director, Corporate Communications
Bianca.olson@hmhpub.com   

A PDF copy of this announcement can be found here.

Conflict of interest disclosure:  I, Kevin McGrew, am a coauthor of the WJ III.

Thursday, July 05, 2012

AP101 Brief # 13: CHC-consistent scholastic aptitude clusters: Back to the Future


This is a continuation of a set of analyses previously posted under the title  Visual-graphic tools for implementing intelligent intelligence testing in SLD contexts:  Formative concepts and tools.  It is recommended that you read the prior post to obtain the necessary background and context, which will not be repeated here.

The third-method approach to SLD identification (POSW; pattern of strengths and weaknesses) has been advanced primarily by Flanagan and colleagues, as well as Hale and colleagues and Naglieri (see Flanagan & Fiorello, 2010, for an overview and discussion). A central concept in these POSW third-method SLD models is that an individual with a possible SLD must show cognitive deficits that have been empirically or theoretically demonstrated to be the most relevant cognitive abilities for the achievement domain in which the person is deficient. That is, the individual's cognitive deficits are consistent or concordant with the person's academic deficits, in the context of other cognitive/achievement strengths that suggest strengths in non-SLD areas. I have often referred to this as a domain-specific constellation or complex of abilities and achievements.

Inherent in these models is the operationalization of the notion of aptitude-achievement consistency or concordance. It is important to note that aptitude is not the same as general intelligence or IQ. Aptitude in this context draws on the historical/traditional notion of aptitude that has been around for decades. Richard Snow and colleagues have (IMHO) written the best material regarding this particular definition of aptitude. Aptitude includes both cognitive and conative characteristics of a person (see the Beyond IQ Project). But for this specific post, I am focusing only on the cognitive portion of aptitude--which would, in simple terms, represent the best combination of particular CHC narrow or broad cognitive abilities that are most highly correlated with success within a particular narrow or broad achievement domain.

What are the CHC narrow or broad abilities most relevant to different achievement domains? This information has been provided in narrative research synthesis form by Flanagan and colleagues (in their various cross-battery books and chapters) and more recently in a structured empirical research synthesis by McGrew and Wendling (2010). These CHC-based COG-ACH relations summaries provide assessment professionals with information on the specific broad or narrow CHC abilities most associated with subdomains in reading and math, and to a lesser extent writing. Additionally, the McGrew and Wendling (2010) synthesis provides information on developmental considerations--that is, the relative importance of CHC abilities for different achievement domains varies as a function of age. McGrew and Wendling (2010) presented their results for three broad age groups (6-8, 9-13, and 14-18 years of age).

Given this context, I presented a series of analyses (see the first post mentioned above as recommended background reading) that took the findings of McGrew and Wendling (2010) as an initial starting point and used logical, empirical, and theoretical considerations to identify the best set of WJ III cognitive test predictors in the same three age groups for two illustrative achievement domains. I have since winnowed down the best set of cognitive predictors in the two achievement domains (basic reading skills-BRS; math reasoning-MR). I then took each set of carefully selected predictor tests and ran multiple regression models for each year of age from 6 through 18 in the WJ III NU norm data. I saved the standardized regression coefficients for each predictor and plotted them by age. The plotted raw standardized coefficients demonstrated clear systematic developmental trends, but with noticeable "bounce" due to sampling error. I thus generated smoothed curves using a non-linear smoothing function, with the smoothed curve representing the best estimate of the population parameters. This technique has been used previously in a variety of studies that explored the relations between WJ-R/WJ III clusters and achievement (see McGrew, 1993, and McGrew and Wrightston, 1997, for examples and a description of the methodology). Below is a plot of the raw standardized coefficients and the smoothed curves for two of the significant predictors (Verbal Comprehension; Visual-Auditory Learning) in the prediction of the WJ III Basic Reading Skills cluster. [Click on images to enlarge.] It is clear that the relative importance of Verbal Comprehension and Visual-Auditory Learning increases and decreases (respectively) systematically with age.
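For readers who like to see the mechanics, below is a rough Python sketch of this general workflow (age-by-age standardized regressions followed by smoothing of the coefficient curves). The data file and column names are hypothetical placeholders, and LOWESS stands in for the unspecified smoothing function actually used, so treat this as an illustration rather than the analysis itself.

import pandas as pd
import statsmodels.api as sm

# Hypothetical norm data: one row per examinee, with age, predictor test
# scores, and an achievement cluster score. Column names are placeholders.
df = pd.read_csv("norm_data.csv")
predictors = ["verbal_comp", "vis_aud_learning", "sound_blending", "proc_speed"]

rows = []
for age in range(6, 19):
    sub = df[df["age"] == age]
    # Standardize within age so the coefficients are comparable across ages.
    X = (sub[predictors] - sub[predictors].mean()) / sub[predictors].std()
    y = (sub["basic_reading"] - sub["basic_reading"].mean()) / sub["basic_reading"].std()
    fit = sm.OLS(y, sm.add_constant(X)).fit()
    rows.append({"age": age, **fit.params[predictors].to_dict()})

coefs = pd.DataFrame(rows)

# Smooth each coefficient-by-age curve to reduce the sampling "bounce".
lowess = sm.nonparametric.lowess
smoothed = {p: lowess(coefs[p], coefs["age"], frac=0.6, return_sorted=False)
            for p in predictors}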

The next two figures present the final smoothed results for the CHC-based aptitude clusters for the prediction of the WJ III Basic Reading Skills and Math Reasoning clusters.

There is much that could be discussed after looking at the two figures.  Below are a few comments and thoughts.
  • The composition of what I am calling CHC-consistent scholastic aptitude clusters makes theoretical and empirical (CHC-->ACH research synthesis) sense. For example, in both BRS and MR, Gc-LD/VL ability (Verbal Comprehension) is salient at all ages and systematically increases in importance with age. In BRS, visual-auditory associative memory (Glr-MA; Vis-Aud. Learning) is very important during the early school years (ages 6-9), but then drops out of the prediction model. This ability (test) is not found in the MR model. Gf abilities (quantitative reasoning-RQ, Number Matrices; general sequential reasoning-RG, Analysis-Synthesis) are important at all ages for predicting math reasoning achievement. In fact, both increase in relative importance with age, particularly the measure of Gf-RQ (Number Matrices). These two Gf tests are nowhere to be found in the BRS plot. Instead, measures of Ga abilities (Sound Blending; Sound Awareness) are important in the BRS model. Gs and Gsm-WM (domain-general cognitive efficiency variables) are present in both the BRS and MR models.
  • The amount of explained variance (multiple R squared; tables in figures) is higher for the CHC-consistent scholastic aptitude clusters than for the WJ III General Intellectual Ability (GIA-Std) cluster. This is particularly true at the oldest ages for MR. Of course, these values capitalize on chance due to the nature of multiple regression and would likely shrink somewhat in independent-sample cross-validation (yes...I could have split the sample in half to develop and then cross-validate the models...but I didn't; a simple sketch of such a check appears after this list).
  • These age-by-age plots provide a much more precise picture of the developmental nature of the relations between narrow CHC abilities and achievement than the McGrew & Wendling (2010) and Flanagan and colleagues reviews. These findings suggest that when selecting tests for referral-focused selective assessment (see McGrew & Wendling, 2010), it is critical that examiners know the developmental nature of the CHC-ACH relations research. The fact that some specific narrow CHC tests show such dramatic changes across ages suggests that those who implement a CHC-based aptitude-achievement consistency SLD model must be cautious and not use a "one size fits all" approach when determining which CHC abilities should be examined for the aptitude portion of the consistency model. An ability that may be very important at certain age levels may not be important at other age levels (e.g., Vis-Aud. Learning in the WJ III BRS aptitude cluster).
  • The above results further reinforce the conclusion of McGrew & Wendling (2010) that the development of more "intelligent" referral-focused selective assessment strategies requires an understanding of the three-way interaction of CHC abilities X Ach domains X Age (developmental status).
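As a concrete illustration of the split-sample check mentioned in the second bullet above (not something done in the original analyses), here is a minimal Python/scikit-learn sketch; the data file and column names are hypothetical placeholders.

import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

df = pd.read_csv("norm_data.csv")  # hypothetical file
predictors = ["verbal_comp", "vis_aud_learning", "sound_blending", "proc_speed"]

# Develop the model on one random half, evaluate it on the other half.
train, test = train_test_split(df, test_size=0.5, random_state=0)
model = LinearRegression().fit(train[predictors], train["basic_reading"])

r2_develop = model.score(train[predictors], train["basic_reading"])
r2_crossval = r2_score(test["basic_reading"], model.predict(test[predictors]))

# The drop from r2_develop to r2_crossval estimates the shrinkage due to
# capitalization on chance in the development half.
print(f"R^2 develop = {r2_develop:.3f}, R^2 cross-validation = {r2_crossval:.3f}")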
These results suggest that the field of intellectual assessment, particularly in the context of educational-related assessments, should go "Back to the Future."  The 1977 WJ and 1989 WJ-R batteries both included scholastic aptitude clusters (SAPTs; click here to read relevant select text from McGrew's two WJTCA books) as part of the WJ/WJ-R pragmatic decision-making discrepancy model.  In particular, see the Type I aptitude-achievement discrepancy feature in the second figure.  





The WJ and WJ-R SAPTs were differentially weighted combinations of the four best predictive tests across the norm sample. See the two figures below, which show the weighting schemes used. Because the computerized norm tables and scoring now possible did not exist at the time, a single set of average test weights was used for all ages.

[WJ SAPT weights]




As I wrote in 1986, "because of their differential weighting system, the WJTCA Scholastic Aptitude clusters should provide some of the best curriculum-specific expectancy information available in the field of psychoeducational assessment" (p. 217). Woodcock (1984), in a defense of the SAPTs in School Psychology Review, made it clear that the composition of these clusters was intended to make the best possible aptitude-achievement comparison. He stated that "the mix of cognitive skills included in each of the four scholastic aptitude clusters represents the best match with those achievement skills that could be obtained from the WJ cognitive subtests" (p. 359). However, the value of the WJ SAPTs was not fully appreciated at the time, largely because the IQ-ACH discrepancy model constrained assessment professionals from using these measures as intended (McGrew, 1994). This, unfortunately, led to their elimination in the WJ III and their replacement with the Predicted Achievement (PA) option, which provided achievement domain-specific predictions of achievement based on age-based optimal weighting of the seven individual tests that comprise the WJ III GIA-Std cluster. Although effective, and a stronger predictor of achievement than the GIA-Std, the PA option never captured the attention of many assessment professionals...for a number of reasons (not covered here).

As I reiterated in 1994, when discussing the WJ-R SAPTs (same link as before), "The purpose of the WJTCA-R differential aptitude clusters is to provide predictions of current levels of achievement.  If a person obtains low scores on individual tests that measure cognitive abilities related to a specific achievement area and these tests are included in the aptitude cluster, then the person's current achievement expectancies should also be lowered.   This expectancy information will be more accurately communicated by the narrower WJTCA-R different aptitude clusters than by any broad-based score from the WJTCA-R or other tests" (p. 223).

The original WJ and WJ-R SAPTs were not presented as part of an explicitly defined comprehensive SLD identification model based on the concepts of consistency/concordance, as was eventually advanced by Flanagan et al., Hale et al., and Naglieri. They were presented as part of a more general psychoeducational pragmatic decision-making model. However, it is clear that the WJ and WJ-R SAPTs were ahead of their time, as they are philosophically in line with the aptitude portion of the aptitude-achievement consistency/concordance component of contemporary third-method SLD models. In a sense, the field has now caught up with the WJ/WJ-R operationalization of aptitude clusters, and such clusters could now serve an important role in aptitude-consistency SLD models. It is my opinion that they represented the best available measurement approach to operationalizing domain-specific aptitudes for different achievement domains, which is at the heart of the new SLD models.

It is time to bring the SAPTs back...Back to the Future...as the logic of their design is a nice fit with the aptitude component of the aptitude-achievement consistency/concordance SLD models. The field is now ready for a measure conceptualized and developed in this way.


However, the original concept can now be improved upon via the methods and analyses presented in this (and the prior) post, in two ways:

1.  CHC-consistent aptitude clusters (aka CHC designer aptitudes).  Creating 4-5 test clusters that are the best predictors of achievement subdomains should utilize the extant CHC COG-->ACH relations literature when selecting the initial pool of tests to include in the prediction models. This extant research literature should also guide the selection of variables in the final models; the models should not be allowed to be driven by the raw empiricism of prediction. This differs from the WJ and WJ-R SAPTs, which were designed primarily on empirical criteria (which combination predicted the most achievement variance), although their composition often made considerable theoretical sense when viewed through a post-hoc CHC lens.

2.  Age-based developmental weighting of the tests in the different CHC SAPTs.  The authors of the WJ III provided the necessary innovation to make this possible when they implemented an approach to constructing age-based, differentially weighted GIA g-scores via the WJ III computer scoring software. The same technology can readily be applied to the development of CHC-designed SAPTs with developmentally shifting weights (as per the smoothed curves in the models above). The technology is available.
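To make the idea of developmentally shifting weights concrete, here is a small toy Python example (my own illustration, not the WJ III scoring algorithm): a composite is formed from standardized test scores using weights looked up, per examinee, from smoothed weight-by-age curves. All weight values and test names are placeholders.

import numpy as np

# Hypothetical smoothed weight curves for ages 6-18; in practice these would
# come from the smoothed standardized regression coefficients described above.
ages = np.arange(6, 19)
weights_by_age = {
    "verbal_comp":      np.linspace(0.25, 0.45, ages.size),  # grows with age
    "vis_aud_learning": np.linspace(0.35, 0.05, ages.size),  # fades with age
    "sound_blending":   np.linspace(0.25, 0.20, ages.size),
    "proc_speed":       np.linspace(0.15, 0.30, ages.size),
}

def aptitude_composite(age, z_scores):
    """Age-weighted aptitude composite for one examinee (z_scores: test -> z)."""
    idx = int(np.clip(age, 6, 18)) - 6
    raw = sum(weights_by_age[t][idx] * z_scores[t] for t in weights_by_age)
    total_w = sum(weights_by_age[t][idx] for t in weights_by_age)
    return raw / total_w  # rescale so the weights sum to 1 at each age

print(aptitude_composite(7, {"verbal_comp": -0.5, "vis_aud_learning": -1.2,
                             "sound_blending": -0.8, "proc_speed": 0.1}))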

Finally, I fully recognize that there are significant limitations in using an incremental variance-partitioning multiple regression approach to develop CHC-based SAPTs. In other papers (g+specific abilities research using SEM causal models) I have been critical of this method. The method was used here in an "intelligent" manner: the selection of the initial pool of predictors was guided by the extant CHC COG-ACH literature, and variables were not allowed to enter blindly into the final models. The purpose of this (and the prior) post is to demonstrate the feasibility of designing CHC-consistent scholastic aptitude clusters. I am pursuing other analyses with different methods to expand and improve upon this set of formative analyses and results.

Build it and they shall come.



Monday, May 28, 2012

Visual-graphic tools for implementing intelligent intelligence testing in SLD contexts: Formative concepts and tools

The slides in this post are preliminary findings and formative ideas that are going through a period of incubation and will hopefully eventuate in a presentation and manuscript. The material is being developed to help flesh out intelligent selective referral-focused assessment for RTI treatment resisters and to provide information to help implement the various third-method consistency/concordance models for SLD. I believe the figures largely speak for themselves if one studies them enough. Familiarity with the complete set of WJ III cognitive tests (Diagnostic Supplement included) is helpful, as it will allow a person to decipher the test name abbreviations. One also needs to be familiar with the general concepts of third-method consistency/concordance SLD models (see Flanagan & Fiorello, 2010) and with the CHC nomenclature codes in order to understand what each test is classified as measuring.

A visual representation of the third-method SLD models, as presented by Flanagan and Fiorello (2010), is below. [Click on image to enlarge]



These results are derived from my art+science exploratory data analysis of the WJ III norm data, guided by the findings of McGrew and Wendling (2010). Analyses included (a) multiple regression prediction of two reading and two math achievement subdomains (only two are featured here) with selected subsets of WJ III cognitive tests (using backward elimination of variables one at a time; many eliminated tests were re-entered at the end to ensure they were not, in fact, significant predictors that should have been retained in the final model), (b) calculation of all cognitive test g-loadings via the first unrotated principal component across ages 6-18 years, and (c) multidimensional scaling (MDS; Guttman radex model) of the complete set of WJ III cognitive predictors.
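For readers unfamiliar with backward elimination, the following bare-bones Python/statsmodels sketch shows the general procedure (my illustration only; the column names are hypothetical and the actual analyses are not reproduced here). It drops the least significant predictor one at a time until every remaining predictor meets a p-value threshold.

import pandas as pd
import statsmodels.api as sm

def backward_eliminate(df, outcome, candidates, alpha=0.05):
    """Iteratively remove the predictor with the largest p-value above alpha."""
    keep = list(candidates)
    while keep:
        fit = sm.OLS(df[outcome], sm.add_constant(df[keep])).fit()
        pvals = fit.pvalues.drop("const")
        worst = pvals.idxmax()
        if pvals[worst] <= alpha:
            return keep, fit  # all remaining predictors are significant
        keep.remove(worst)
    return keep, None

# Hypothetical usage with placeholder column names:
# df = pd.read_csv("norm_data.csv")
# final_predictors, final_fit = backward_eliminate(
#     df, "basic_reading",
#     ["verbal_comp", "vis_aud_learning", "sound_blending", "proc_speed"])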

Visual graphic summaries are presented in hopes of stimulating thoughts about how these presentation methods might be used to allow examiners to engage in intelligent selective referral focused assessment decision-making grounded in data (but the numbers are minimized on purpose), theory, and logic.

Conflict of interest warning. I am a coauthor of the WJ III.

Click on images to enlarge.


















Conceptual link between third method approaches and above slides follows.




MDS illustrative model.









Conceptual link between the three preceding slides and third method SLD models




Posted using BlogPress from Kevin McGrew's iPad
www.themindhub.com

Thursday, March 01, 2012

IAP101 Brief #12: Use of IQ component part scores as indicators of general intelligence in SLD and MR/ID diagnosis

   
Historically, the concept of general intelligence (g), as operationalized by intelligence test battery global full scale IQ scores, has been central to the definition and classification of individuals with a specific learning disability (SLD) as well as individuals with an intellectual disability (ID). Contemporary definitions and operational criteria, however, have elevated intelligence test battery composite or part scores to a more prominent role in the diagnosis and classification of SLD and, more recently, ID.
            In the case of SLD, third-method consistency definitions prominently feature component or part scores in (a) the identification of consistency between low achievement and relevant cognitive abilities or processing disorders and (b) the requirement that an individual demonstrate relative cognitive and achievement strengths (see Flanagan, Fiorello & Ortiz, 2010).  The global IQ score is de-emphasized in the third-method SLD methods.
            In contrast, the 11th edition of the AAIDD Intellectual Disability: Definition, Classification, and Systems of Supports manual (AAIDD, 2010) placed general intelligence, and thus global composite IQ scores, as central to the definition of intellectual functioning.  This has not been without challenge.  For example, the AAIDD ID definition has been criticized for an over-reliance on the construct of general intelligence and for ignoring contemporary psychometric theoretical and empirical research that has converged on a multidimensional hierarchical model of intelligence (viz., Cattell-Horn-Carroll or CHC theory).
The potential constraints of the "ID-as-a-general-intelligence-disability" definition were anticipated by the Committee on Disability Determination for Mental Retardation in its National Research Council report, "Mental Retardation: Determining Eligibility for Social Security Benefits" (Reschly, Meyers & Hartel, 2002). This national committee of experts concluded that "during the next decade, even greater alignment of intelligence tests and the IQ scores derived from them and the Horn-Cattell and Carroll models is likely. As a result, the future will almost certainly see greater reliance on part scores, such as IQ scores for Gc and Gf, in addition to the traditional composite IQ. That is, the traditional composite IQ may not be dropped, but greater emphasis will be placed on part scores than has been the case in the past" (Reschly et al., 2002, p. 94). The committee stated that "whenever the validity of one or more part scores (subtests, scales) is questioned, examiners must also question whether the test's total score is appropriate for guiding diagnostic decision making. The total test score is usually considered the best estimate of a client's overall intellectual functioning. However, there are instances in which, and individuals for whom, the total test score may not be the best representation of overall cognitive functioning" (pp. 106-107).
            The increased emphasis on intelligence test battery composite part scores in SLD and ID diagnosis and classification raises a number of measurement and conceptual issues (Reschly et al., 2002).  For example, what are statistically significant differences?  What is a meaningful difference?  What appropriate cognitive abilities should serve as proxies of general intelligence when the global IQ is questioned?  What should be the magnitude of the total test score? 
Only the issue of appropriate cognitive abilities will be discussed here. This issue addresses which component or part scores are most highly correlated with general intelligence (g)—that is, which component part scores are high g-loaders? The traditional consensus has been that measures of Gc (crystallized intelligence; comprehension-knowledge) and Gf (fluid intelligence or reasoning) are the highest g-loading measures and constructs and are the most likely candidates for elevated status when diagnosing ID (Reschly et al., 2002). Although not always stated explicitly, the third-method consistency SLD definitions specify that an individual must demonstrate "at least an average level of general cognitive ability or intelligence" (Flanagan et al., 2010, p. 745), a statement that implicitly suggests cognitive abilities and component scores with high g-ness.
Table 1 is intended to provide guidance when using component part scores in the diagnosis and classification of SLD and ID (click on images to enlarge and use the browser zoom feature to view; it is recommended that you click here to access a PDF copy of the table and zoom in on it). Table 1 presents a summary of the comprehensive, nationally normed, individually administered intelligence batteries that possess satisfactory psychometric characteristics (i.e., national norm samples, adequate reliability and validity for the composite g-score) for use in the diagnosis of ID and SLD.



The Composite g-score column lists the global general intelligence score provided by each intelligence battery. This score is the best estimate of a person's general intellectual ability, which currently is most relevant to the diagnosis of ID as per AAIDD. All composite g-scores listed in Table 1 meet Jensen's (1998) psychometric sampling error criteria as valid estimates of general intelligence. As per Jensen's "number of tests" criterion, all intelligence batteries' g-composites are based on a minimum of nine tests that sample at least three primary cognitive ability domains. As per Jensen's "variety of tests" criterion (i.e., information content, skills, and demands for a variety of mental operations), the batteries, when viewed from the perspective of CHC theory, vary in ability domain coverage: four (CAS, SB5), five (KABC-II, WISC-IV, WAIS-IV), six (DAS-II), and seven (WJ III) domains (Flanagan, Ortiz & Alfonso, 2007; Keith & Reynolds, 2010). As recommended by Jensen (1998), "the particular collection of tests used to estimate g should come as close as possible, with some limited number of tests, to being a representative sample of all types of mental tests, and the various kinds of test should be represented as equally as possible" (p. 85). Users should consult sources such as Flanagan et al. (2007) and Keith and Reynolds (2010) to determine how each intelligence battery approximates Jensen's optimal design criterion, the specific CHC domains measured, and the proportional representation of the CHC domains in each battery's composite g-score.
Also included in Table 1 are the component part scales provided by each battery (e.g., WAIS-IV Verbal Comprehension Index, Perceptual Reasoning Index, Working Memory Index, and Processing Speed Index), followed by their respective within-battery g-loadings.[1]  Examination of the g-ness of composite scores from existing batteries (see last three columns in Table 1) suggests the traditional assumption that measures of Gf and Gc are the best proxies of general intelligence may not hold across all intelligence batteries.[2] 
In the case of the SB5, all five composite part scores are very similar in g-loadings (h2 = .72 to .79).  No single SB5 composite part score appears better than the other SB5 scores for suggesting average general intelligence (when the global IQ score is not used for this purpose).  At the other extreme is the WJ III where the Fluid Reasoning, Comprehension-Knowledge, Long-term Storage and Retrieval cluster scores are the best g-proxies for part-score based interpretation within the WJ III.  The WJ III Visual Processing and Processing Speed clusters are not composite part scores that should be emphasized as indicators of general intelligence.  Across all batteries that include a processing speed component part score (DAS-II, WAIS-IV, WISC-IV, WJ III) the respective processing speed scale is always the weakest proxy for general intelligence and thus, would not be viewed as a good estimate of general intelligence. 
It is also clear that one cannot assume that composites with similar-sounding names of measured abilities will have similar relative g-ness status within different batteries. For example, the Gv (visual-spatial or visual processing) clusters in the DAS-II (Spatial Ability) and SB5 (Visual-Spatial Processing) are relatively strong g-measures within their respective batteries, but the same cannot be said for the WJ III Visual Processing cluster. Even more interesting are the differences in the WAIS-IV and WISC-IV relative g-loadings for similar-sounding index scores.
For example, the Working Memory Index is the highest g-loading component part score (tied with the Perceptual Reasoning Index) in the WAIS-IV but is only third (out of four) in the WISC-IV. The Working Memory Index comprises the Digit Span and Arithmetic subtests in the WAIS-IV and the Digit Span and Letter-Number Sequencing subtests in the WISC-IV. The Arithmetic subtest has been reported to be a factorially complex test that may tap fluid intelligence (Gf-RQ—quantitative reasoning), quantitative knowledge (Gq), working memory (Gsm), and possibly processing speed (Gs; Keith & Reynolds, 2010; Phelps, McGrew, Knopik & Ford, 2005). The factorially complex characteristics of the Arithmetic subtest (which, in essence, make it function like a mini-g proxy) would explain why the Working Memory Index is a good proxy for g in the WAIS-IV but not in the WISC-IV. The WAIS-IV and WISC-IV Working Memory Index scales, although named the same, are not measuring identical constructs.

A critical caveat is that the g-loadings cannot be compared across different batteries.  g-loadings may change when the mixture of measures included in the analyses change.  Different "flavors" of g can result (Carroll, 1993; Jensen, 1998). The only way to compare the g-ness across batteries is with appropriately designed cross- or joint-battery analysis (e.g., WAIS-IV, SB5 and WJ III analyzed in a common sample).
The above within- and across-battery examples illustrate that those who use component part scores as an estimate of a person's general intelligence must be aware of the composition and psychometric g-ness of the component scores within each intelligence battery. Not all component part scores in different intelligence batteries are created equal (with regard to g-ness). Also, similarly named factor-based composite scores may not measure the identical construct and may vary in degree of within-battery g-ness. This is not a new problem in the context of naming factors in factor analysis and, by extension, factor-based intelligence test composite scores. Cliff (1983) described this nominalistic fallacy in simple language: "if we name something, this does not mean we understand it" (p. 120).




[1] As noted in the footnotes in Table 1, all composite score g-loadings were computed by Kevin McGrew by entering the smallest number (and largest age ranges covered) of the published correlation matrices within each intelligence battery's technical manual (note the exception for the WJ III) in order to obtain an average g-loading estimate. It would have been possible to calculate and report these values for each age-differentiated correlation matrix for each intelligence battery. However, the purpose of this table is to provide the best possible average value across the entire age range of each intelligence battery. Floyd and colleagues have published age-differentiated g-loadings for the DAS-II and WJ III. Those values were not used, as they are based on the principal common factor analysis method, a method that analyzes the reliable shared variance among tests. Although principal factor and principal component loadings typically will order measures in the same relative position, the principal factor loadings typically will be lower. Given that the imperfect manifest composite scale scores are those that are utilized in practice, and to allow uniformity in the calculation of the g-loadings reported in Table 1, principal component analysis was used in this work. The same rationale was used for not using the latent factor loadings on a higher-order g-factor from SEM/CFA analyses of each test battery. Loadings from CFA analyses represent the relations between the underlying theoretical ability constructs and g purged of measurement error. Also, the final CFA solutions reported in a battery's technical manual (or in independent journal articles) frequently allow tests to be factorially complex (load on more than one latent factor), a measurement model that does not resemble the real-world reality of the manifest/observed composite scores used in practice. Latent factor loadings on a higher-order g-factor will often differ significantly from principal component loadings based on the manifest measures, both in absolute magnitude and relative size (e.g., see the high Ga loading on g in the WJ III technical manual, which is at variance with the manifest-variable-based Ga loading reported in Table 1).
[2] The h2 values are the values that should be used to compare the relative amount of g-variance present in the component part scores within each intelligence battery.
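As a rough illustration of the kind of computation described in footnote 1 (extracting loadings from the first unrotated principal component of a published correlation matrix), here is a short Python sketch. This is my illustration only, with placeholder correlations, not the exact procedure or values behind Table 1.

import numpy as np

# Hypothetical correlation matrix among four composite part scores.
scales = ["VCI", "PRI", "WMI", "PSI"]
R = np.array([
    [1.00, 0.60, 0.55, 0.40],
    [0.60, 1.00, 0.58, 0.45],
    [0.55, 0.58, 1.00, 0.42],
    [0.40, 0.45, 0.42, 1.00],
])

# First unrotated principal component of the correlation matrix.
eigvals, eigvecs = np.linalg.eigh(R)        # eigenvalues in ascending order
v1, lam1 = eigvecs[:, -1], eigvals[-1]      # largest eigenvalue and its vector

# Loadings = eigenvector scaled by sqrt(eigenvalue); flip sign if needed,
# since the sign of an eigenvector is arbitrary.
loadings = v1 * np.sqrt(lam1)
loadings *= np.sign(loadings.sum())

for name, g in zip(scales, loadings):
    print(f"{name}: g-loading = {g:.2f}, h2 = {g**2:.2f}")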

Friday, May 13, 2011

CHC narrow ability assessment with the WJ III battery: IAP Applied Psychometrics 101 #12

I am pleased to release the following working paper:  IAP Applied Psychometrics 101 #12:  CHC Narrow Ability Assessment with the WJ III Battery.

Below is the abstract:

  • Recently, a special issue of Psychology in the Schools (PITS) "took stock" of the past 20 years of CHC research (Newton & McGrew, 2010). In this special issue McGrew and Wendling (2010) reviewed the extant CHC cognitive-achievement relations research and concluded that "[T]he primary action is at the narrow ability level" (p. 669). McGrew and Wendling concluded that if the goal is to better understand, assess, and develop interventions for subareas of reading (e.g., phonics, comprehension) and math (e.g., calculation, problem-solving), narrow is better. Broad (stratum II) CHC abilities (e.g., Fluid Reasoning-Gf; Auditory Processing-Ga) best predict and explain broad academic domains (e.g., total or broad reading). However, narrow (stratum I) abilities best predict and explain narrow academic domains (e.g., reading comprehension).
  • The purpose of this working paper is to present a list of (a) WJ III test-author provided norm-based narrow CHC ability clusters and (b) additional clinical narrow clusters (not provided by the test authors in the published WJ III).  A secondary purpose is to list possible supplemental tests or composites from other major intelligence or achievement batteries that might be used to supplement the listed WJ III narrow ability clusters.  

This document resulted from two recent presentations in which I summarized contemporary research investigating the relations between broad and narrow CHC abilities and reading and math achievement. Audience participants, especially at the Georgia School Psychology Association conference, suggested I develop a summary table of the guts of my WJ III-related material. This report is the promised "deliverable" to those folks. Thanks, school psychologists in Georgia. The report has some bonus features (e.g., the Schneider & McGrew, in press, CHC v2.0 model and definitions--to be published this fall in Flanagan & Harrison's 3rd edition of Contemporary Intellectual Assessment). This bonus feature is an abridged set of definitions, and the reader is encouraged to read the complete chapter when published for much more detail.

Feedback is appreciated, as this is a work in progress. I would like any feedback/comments to occur on the CHC listserv (n=1282 and growing), as it allows for a more dynamic exchange of ideas than does the comment feature of the blog platform.

Thanks.  Enjoy.


- iPost using BlogPress from Kevin McGrew's iPad



Sunday, May 01, 2011

IQ's Reading: Support for speed of reasoning ability (Carroll's RE; Horn's CDS)

Article "in press" in Intelligence by Goldhammer et al. that provides support for a speed of reasoning factor. I have provided additional comments in the article via the IQ's Readings blog feature.

No major individual intelligence battery appears to measure this construct. We, the authors of the WJ III (conflict of interest disclosure: I am a coauthor of the WJ III), intended our Decision Speed test to represent some of this ability. To date we have not been able to demonstrate validity evidence for this interpretation. This may be due to two factors. First, all post-WJ III analyses I have completed have found the DS test to covary with the Retrieval Fluency and Rapid Picture Naming tests. RF and RPN covary very strongly, and I have interpreted this as reflecting the narrow ability of NA (naming facility), or what is often called RAN, but which I prefer to call "speed of lexical access" as per the reading research of Perfetti. The DS test tends to "hang out" with these two other tests and appears to tap this speed of lexical access ability to some degree, most likely due to the need for examinees to quickly access the meaning of the common objects before deciding which two are conceptually the same.

The other possibility is that the DS test may measure some RE variance, but this has not been possible to validate due to the lack of other valid RE indicators in the WJ III collection of tests analyzed.

Anyone looking for a good thesis/dissertation? I could envision a study in which tests are administered that allow for the specification of perceptual speed (P), speed of lexical access (NA), and speed of reasoning (RE) factors, and that also includes the WJ III RF, RPN, and DS tests.

Double click on images to enlarge







- iPost using BlogPress from Kevin McGrew's iPad



Sunday, January 02, 2011

MDS analysis of WISC-IV

It is no secret that I'm a big fan of multidimensional scaling (MDS, especially Guttman's radex model) as a supplement to factor analysis of cognitive tests. While going through some of my e-files I found a recent 3D MDS analysis of the WISC-IV. Below are the abstract and the final 3D model. Clicking on the images should take you to a larger version of each image.
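For those curious how such an analysis can be set up, here is a minimal scikit-learn sketch of metric MDS applied to a subtest correlation matrix (my illustration with placeholder values, not the analysis behind the figures below; Guttman-style analyses typically use a nonmetric variant). Correlations are converted to dissimilarities before scaling.

import numpy as np
from sklearn.manifold import MDS

# Hypothetical correlation matrix among five subtests (placeholder values).
subtests = ["Vocabulary", "Similarities", "Block Design", "Matrix Reasoning", "Coding"]
R = np.array([
    [1.00, 0.72, 0.45, 0.50, 0.30],
    [0.72, 1.00, 0.47, 0.53, 0.32],
    [0.45, 0.47, 1.00, 0.58, 0.35],
    [0.50, 0.53, 0.58, 1.00, 0.33],
    [0.30, 0.32, 0.35, 0.33, 1.00],
])

# Convert correlations to dissimilarities: higher correlation = closer together.
D = 1.0 - R

mds = MDS(n_components=3, dissimilarity="precomputed", random_state=0)
coords = mds.fit_transform(D)

for name, (x, y, z) in zip(subtests, coords):
    print(f"{name:18s} {x:6.2f} {y:6.2f} {z:6.2f}")
print(f"stress = {mds.stress_:.3f}")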








For those interested, the content/stimulus dimension of my proposed cognitive ability assessment design and interpretation matrix is due to my application of MDS to data from the WJ III and the various Wechsler batteries. The complete "beyond CHC theory" presentation can be found at a prior post.



- iPost using BlogPress from Kevin McGrew's iPad

Thursday, December 30, 2010

Visual (Gv) summary of CHC intelligence structure of WJ III cognitive battery

I was skimming the article below and found a nice figure that summarizes the CHC abilities measured by the WJ III Tests of Cognitive Abilities. I love good visual (Gv) summaries. If things work correctly, clicking on the images should enlarge them.

Conflict of interest notice - I am a coauthor of the WJ III.









- iPost using BlogPress from Kevin McGrew's iPad


Thursday, December 02, 2010

IQ test battery publication timeline: Atkins MR/ID Flynn Effect cheat sheet

As I've become involved in consulting on Atkins MR/ID death penalty cases, a frequent topic raised is norm obsolescence (aka the Flynn effect). When talking with others I often have trouble spitting out the exact publication dates of the various revisions of tests, as I keep track of more than just the Wechsler batteries (which are the primary IQ tests in Atkins reports). I often wonder if others question my expertise...but most don't realize that there are more IQ batteries out there than just the Wechsler adult battery, and, in particular, there are a large number of child-normed batteries and other batteries spanning childhood and adulthood. Thus, I decided to put together a cheat sheet for myself, one that I could print and keep in my files. I put it together in the form of a simple IQ battery publication timeline. Below is an image of the figure. Double click on it to enlarge.

An important point to understand is that when serious discussions of the Flynn effect begin in trials, the test publication date is most often NOT used in the calculation of how obsolete a set of test norms is. Instead, the best estimate of the year the test was normed/standardized is used, which is not included in this figure (you will need to locate this information). For example, the WAIS-R was published in 1981, but the manual states that the norming occurred from May 1976 to May 1980. Thus, in most Flynn effect discussions in court cases, the date of 1978 (the middle of the norming period) is typically used. This makes recall of this information difficult for experts who track all of the major individually administered IQ batteries.
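As a concrete (and purely hypothetical) illustration of why the norming date matters, here is a small Python sketch of the norm-obsolescence arithmetic often discussed in this context. It assumes the commonly cited rate of roughly 0.3 IQ points of norm inflation per year since the norming midpoint; whether and how any such adjustment should be applied in a given case is a matter for the experts and the court, not this sketch.

def flynn_adjusted_iq(obtained_iq, year_tested, norming_midpoint_year,
                      points_per_year=0.3):
    """Subtract the estimated norm inflation accumulated since the norming midpoint.

    Assumes a linear Flynn effect of points_per_year (commonly cited as ~0.3).
    """
    years_obsolete = year_tested - norming_midpoint_year
    adjustment = points_per_year * years_obsolete
    return obtained_iq - adjustment, adjustment

# Example: WAIS-R (normed 1976-1980, midpoint ~1978) administered in 2005.
adjusted, adj = flynn_adjusted_iq(obtained_iq=75, year_tested=2005,
                                  norming_midpoint_year=1978)
print(f"adjustment = {adj:.1f} points, adjusted IQ = {adjusted:.1f}")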

Hope this is helpful...if nothing else, you must admit that it is pretty :)  Click on image to view.





- iPost using BlogPress from Kevin McGrew's iPad