Showing posts with label SB-IV.

Thursday, May 03, 2012

Problems with the 1960 and 1984 Stanford-Binet IQ scores




A new IAP AP101 report (#13) dealing with the above issue was recently posted at the ICDP blog.




Thursday, December 02, 2010

IQ test battery publication timeline: Atkins MR/ID Flynn Effect cheat sheet

As I've become involved in consulting on Atkins MR/ID death penalty cases, a frequent topic raised is that of norm obsolescence (aka the Flynn effect). When talking with others I often have trouble recalling the exact publication date of the various revisions of tests, as I keep track of more than just the Wechsler batteries (which are the primary IQ tests in Atkins reports). I often wonder if others question my expertise...but most don't realize that there are more IQ batteries out there than just the Wechsler adult battery...and, in particular, a large number of child-normed batteries and other batteries spanning childhood and adulthood. Thus, I decided to put together a cheat sheet for myself...one that I could print and keep in my files. I put it together in the form of a simple IQ battery publication timeline. Below is an image of the figure. Double-click on it to enlarge.

An important point to understand is that when serious discussions start focusing on the Flynn effect in trials, the test publication date is most often NOT used in the calculation of how obsolete a set of test norms is. Instead, the best estimate of the year the test was normed/standardized is used, which is not included in this figure (you will need to locate this information). For example, the WAIS-R was published in 1981...but the manual states that the norming occurred from May 1976 to May 1980. Thus, in most Flynn effect discussions in court cases, the date of 1978 (the middle of the norming period) is typically used. This makes recalling this information difficult for experts who track all the major individually administered IQ batteries.
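For readers who want to see the arithmetic spelled out, below is a minimal Python sketch of the norm-obsolescence adjustment as it is commonly described in these discussions: roughly 0.3 IQ points of inflation for each year between the norming midpoint and the date of testing. The function name and example values are my own illustration, and the 0.3 rate is the commonly cited estimate rather than a fixed forensic standard.

```python
# Illustrative sketch of a Flynn-effect (norm obsolescence) adjustment.
# Assumes the commonly cited ~0.3 points-per-year rate; not a legal standard.
def flynn_adjusted_iq(obtained_iq, test_year, norming_midpoint, rate=0.3):
    """Subtract the estimated norm-inflation from an obtained IQ score."""
    years_obsolete = test_year - norming_midpoint
    return obtained_iq - rate * years_obsolete

# Example: a WAIS-R (normed 1976-1980, midpoint ~1978) administered in 1995
print(flynn_adjusted_iq(75, test_year=1995, norming_midpoint=1978))  # 69.9
```

Note that the adjustment is keyed to the norming midpoint (1978 in this example), not the 1981 publication date, which is exactly why the norming dates reported in the test manuals matter more than the publication dates shown in the figure.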

Hope this is helpful...if nothing else, you must admit that it is pretty :)  Click on the image to view.






Sunday, February 14, 2010

IQ test selection could be life-or-death decision: WAIS v SB score differences in ID/MR sample

Interesting article "in press" in Intelligence that compares WAIS and Stanford Binet IQ scores (across different editions except the current SB5 and WAIS-IV) for adults with intellectual disability (ID/MR).  Although the mixing together of scores across different editions makes it impossible to make SB/WAIS-specific edition comparisons, the finding that the WAIS scores were, on the average (mean), almost 17 points higher may surprise many psychologists.  The authors discuss the real-life implications (i.e., Atkins ID death penalty decisions; eligibility for SS benefits, etc.) of different scores from different tests.  As outlined in a prior IAP AP101 special report, differences of this magnitude between different IQ tests should not be surprising. 
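To make the stakes concrete, here is a rough back-of-envelope illustration (mine, not the authors') of how a mean difference of this size can straddle the customary ID cutoff of roughly 70; the individual score below is hypothetical.

```python
# Hypothetical illustration of the ~16.7-point mean WAIS-SB difference
# reported by Silverman et al.; the individual score below is made up.
ID_CUTOFF = 70               # customary (approximate) IQ cutoff for ID
MEAN_WAIS_SB_DIFF = 16.7     # mean difference reported in the study

sb_composite = 65                                  # hypothetical SB Composite IQ
expected_wais = sb_composite + MEAN_WAIS_SB_DIFF   # about 81.7

print(sb_composite < ID_CUTOFF)    # True:  below the cutoff on the SB
print(expected_wais < ID_CUTOFF)   # False: above the cutoff on the WAIS
```

The same person could thus fall on either side of an eligibility or Atkins threshold depending solely on which battery was administered.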

Silverman, W., Miezejeski, C., Ryan, R., Zigman, W., Krinsky-McHale, S., & Urv, T. (in press).  Stanford-Binet and WAIS IQ differences and their implications for adults with intellectual disability (aka mental retardation).  Intelligence.

Abstract
Stanford-Binet and Wechsler Adult Intelligence Scale (WAIS) IQs were compared for a group of 74 adults with intellectual disability (ID). In every case, WAIS Full Scale IQ was higher than the Stanford-Binet Composite IQ, with a mean difference of 16.7 points. These differences did not appear to be due to the lower minimum possible score for the Stanford-Binet. Additional comparisons with other measures suggested that the WAIS might systematically underestimate severity of intellectual impairment. Implications of these findings are discussed regarding determination of disability status, estimating prevalence of ID, assessing dementia and aging-related cognitive declines, and diagnosis of ID in forensic cases involving a possible death penalty.
A concluding comment from the authors
Nevertheless, psychologists cannot meet their ethical obligations in these cases without knowing which test provides the most valid estimate of true intelligence. The present data for individuals with relatively higher IQs, though sparse, indicate that differences between the Stanford-Binet and WAIS IQ tests can no longer be summarily dismissed as merely reflecting the scales' different floors. When test results are informing judgments of literal life and death, any suspected uncertainty regarding the validity of outcomes must be addressed aggressively.
Article Outline
1. Method
2. Results
3. Discussion
  • 3.1. Disability determinations
  • 3.2. Prevalence of ID
  • 3.3. Declines with aging
  • 3.4. Death penalty cases
  • 3.5. Conclusion
Acknowledgements
References


Tuesday, May 26, 2009

Dissertation Dish: SB5 and WISC-IV Gv predictors of math achievement


Visual-spatial processing and mathematics achievement: The predictive ability of the visual-spatial measures of the Stanford-Binet intelligence scales, Fifth Edition and the Wechsler Intelligence Scale for Children-Fourth Edition by Clifford, Eldon, Ph.D., University of South Dakota, 2008, 195 pages; AAT 3351188




Abstract
In the law and the literature there has been a disconnect between the definition of a learning disability and how it is operationalized. For the past 30 years, the primary method of learning disability identification has been a severe discrepancy between an individual's cognitive ability level and his/her academic achievement. The recent 2004 IDEA amendments have included language that allows for changes in identification procedures. This language suggests a specific learning disability may be identified by a student's failure to respond to a research based intervention (RTI). However, both identification methods fail to identify a learning disability based on the IDEA 2004 definition, which defines a specific learning disability primarily as a disorder in psychological processing. Research suggests that processing components play a critical role in academic tasks such as reading, writing and mathematics. Furthermore, there has been considerable research that suggests visual-spatial processing is related to mathematics achievement. The two most well known IQ tests, the Stanford-Binet-Fifth Edition (SB5) and the Wechsler Intelligence Scale for Children-Fourth Edition (WISC-IV), were revised in 2003 to align more closely with the most current theory of intelligence, the Cattell-Horn-Carroll theory of cognitive abilities (CHC). Research supports both instruments have subtests that measure visual-spatial processing. The purpose of the current study is to identify which visual-spatial processing measure (SB5 or WISC-IV) is the better predictor of poor mathematics achievement. The participants were 112 6th-8th grade middle school students. Of the 112 original participants, 109 were included in the study. The comparison of the results of two separate sequential logistic regressions found that both measures could significantly predict mathematics achievement. However, given the relatively small amount of variance accounted for by both the SB5 and WISC-IV visual-spatial processing measures, the results had questionable practical significance.


Friday, May 15, 2009

CHC theory: Emergence, test instruments and school-related research brief

Contemporary Cattell-Horn-Carroll (CHC) intelligence test development, interpretation and applied research can be traced to a fortuitous meeting of Richard Woodcock, John Horn, and John “Jack” Carroll in the fall of 1985, a meeting also attended by the first author of this web-resource (McGrew, 2005). This meeting resulted in the 1989 publication of the first individually-administered, nationally standardized CHC-based intelligence battery, the Woodcock-Johnson-Revised (Woodcock, McGrew, & Mather, 1989). This landmark event, which occurred 20 years ago, provided the impetus for the major CHC-driven evolution of school-based intelligence testing practice.
Subsequent important CHC events followed during this 20-year period, including: (a) the first set of CHC-organized joint test battery factor analysis studies (Woodcock, 1990), which planted the seeds for the concept of CHC cross-battery (CB) assessment; (b) the first attempt to use the WJ-R, via a Kaufman-like supplemental testing strategy (Kaufman, 1979), to implement the yet-to-be-named and operationalized CHC CB approach to testing (McGrew, 1993); (c) the articulation of the first integrated Cattell-Horn-Carroll model and classification of the major intelligence batteries as per the CHC framework (McGrew, 1997); (d) the first description of the assumptions, foundations, and operational principles for CHC CB assessment and interpretation (Flanagan & McGrew, 1997; McGrew & Flanagan, 1998); (e) the publication of the first intelligence theory and assessment book to prominently feature CHC theory and assessment methods (Contemporary Intellectual Assessment: Theories, Tests, and Issues; Flanagan, Genshaft & Harrison, 1997; click here for link to 2nd edition); (f) the publication of the CHC CB assessment series (Flanagan, McGrew & Ortiz, 2000; Flanagan, Ortiz, Alfonso & Mascolo, 2006; Flanagan, Ortiz & Mascolo, 2001, 2007; McGrew & Flanagan, 1998); (g) the completion of a series of CHC-organized studies that investigated the relations between CHC cognitive abilities and reading, math, and writing achievement (what you are reading now); (h) the articulation of CHC-grounded SLD assessment and eligibility frameworks (see Flanagan & Fiorello, manuscript in preparation); and (i) the subsequent CHC-grounded revisions or interpretations of a number of comprehensive individually administered intelligence test batteries (Differential Ability Scales-II, DAS-II; Stanford-Binet-5th Edition, SB5; Kaufman Assessment Battery for Children-2nd Edition, KABC-II). Although not overtly stated, the impact of CHC theory can be seen in the recent revisions of the venerable Wechsler trilogy (WPPSI-III; WISC-IV; WAIS-IV), as well as in the presentation of CHC CB procedures for interpreting the three Wechsler batteries (Flanagan et al., 2000).

Click here for other posts in this series.

Monday, October 09, 2006

Are contemporary IQ tests being overfactored?



Are test developers (that includes me, the blog dictator) increasingly overfactoring intelligence test batteries?

According to an article by Frazier and Youngstrom "in press" in the prestigious journal Intelligence, contemporary test developers (and their publishing companies) "are not adequately measuring the number of factors they are purported to measure." Below is the reference citation and abstract (with a link to the article).

According to Frazier and Youngstrom, the purpose of their investigation was: "The present paper proposes that several forces have influenced this trend including: increasingly complex theories of intelligence (Carroll, 1993; Vernon, 1950), commercial test publishers' desire to provide assessment instruments with greater interpretive value to clinicians, publishers' desire to include minor ability factors that may only be of interest to researchers, and heavy reliance on liberal statistical criteria for determining the number of factors measured by a test. The latter hypothesis is evaluated empirically in the present study by comparing several statistical criteria for determining the number of factors present in current and historically relevant cognitive ability batteries."

As a coauthor of one of the batteries (WJ III) analyzed in this study and, in particular, the battery that measures the largest number of factors in their investigation, I feel compelled to respond to portions of this manuscript. Thus, readers should read the original article and then review my comments, fully recognizing that I have a commercial conflict of interest.

Before I present the major conclusions of the article and provide select responses, I'd like to first state that, in many respects, I think this is a well done article. Regardless of the extent to which I agree/disagree with Frazier and Youngstrom, the introduction is worth reading for at least two reasons.

  • The article provides a nice (brief) overview of the development of psychometric intelligence theories from Spearman through early hierarchical theories (Vernon) to the contemporary Carroll and Cattell-Horn Gf-Gc models (the latter two now often referred to as Cattell-Horn-Carroll [CHC] theory).
  • In addition, for individuals looking for a brief description and synopsis of the major statistical approaches to determining the number of factors to retain in factor analytic studies, pages 3-6 are recommended.
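For readers curious what one of these retention criteria looks like in practice, below is a minimal sketch of Horn's parallel analysis, one of the two "gold standard" criteria the authors rely on. This is my own illustrative implementation, not code from Frazier and Youngstrom; the simulation count and 95th-percentile threshold are common defaults rather than values taken from their paper.

```python
# Minimal sketch of Horn's parallel analysis (HPA); illustrative only.
import numpy as np

def parallel_analysis(data, n_sims=1000, percentile=95, seed=0):
    """Suggest a number of factors for `data` (rows = subjects, cols = subtests)."""
    rng = np.random.default_rng(seed)
    n_obs, n_vars = data.shape

    # Eigenvalues of the observed correlation matrix, sorted descending
    obs_eigs = np.linalg.eigvalsh(np.corrcoef(data, rowvar=False))[::-1]

    # Eigenvalues expected from random (uncorrelated) data of the same size
    sim_eigs = np.empty((n_sims, n_vars))
    for i in range(n_sims):
        random_data = rng.standard_normal((n_obs, n_vars))
        sim_eigs[i] = np.linalg.eigvalsh(np.corrcoef(random_data, rowvar=False))[::-1]
    threshold = np.percentile(sim_eigs, percentile, axis=0)

    # Retain factors only while the observed eigenvalue beats the random one
    n_factors = 0
    for obs, sim in zip(obs_eigs, threshold):
        if obs > sim:
            n_factors += 1
        else:
            break
    return n_factors
```

The logic is simply that a factor is retained only if its observed eigenvalue exceeds what random, uncorrelated data of the same dimensions would produce.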

Frazier, T. & Youngstrom, E. (2006, in press). Historical increase in the number of factors measured by commercial tests of cognitive ability: Are we overfactoring? Intelligence.

Abstract

  • A historical increase in the number of factors purportedly measured by commercial tests of cognitive ability may result from four distinct pressures including: increasingly complex models of intelligence, test publishers' desires to provide clinically useful assessment instruments with greater interpretive value, test publishers' desires to include minor factors that may be of interest to researchers (but are not clinically useful), and liberal statistical criteria for determining the factor structure of tests. The present study examined the number of factors measured by several historically relevant and currently employed commercial tests of cognitive abilities using statistical criteria derived from principal components analyses, and exploratory and confirmatory factor analyses. Two infrequently used statistical criteria, that have been shown to accurately recover the number of factors in a data set, Horn's parallel analysis (HPA) and Minimum Average Partial (MAP) analysis, served as gold-standard criteria. As expected, there were significant increases over time in the number of factors purportedly measured by cognitive ability tests (r=.56, p=.030). Results also indicated significant recent increases in the overfactoring of cognitive ability tests. Developers of future cognitive assessment batteries may wish to increase the lengths of the batteries in order to more adequately measure additional factors. Alternatively, clinicians interested in briefer assessment strategies may benefit from short batteries that reliably assess general intellectual ability.

Additional comments/conclusions by the authors (followed by my comments/responses)

Frazier/Youngstrom comment: The extensive use of cognitive ability batteries in psychological assessment, an increased market for psychological assessments in general, a desire to create tests that are marketable to both clinicians and researchers, and the desire to increase the reliability of IQ measures may create a pressure on publishers to market ability tests that measure everything that other tests measure and more. This, in turn, forces other ability test publishers to try to keep pace.

  • McGrew comment/response: First, I will not attempt to comment on the "desires/pressures" of test developers/publishers of the other major intelligence batteries included in their analyses (Wechsler batteries, SB-IV, K-ABC, DAS). I restrict my comments to my experiences with the WJ-R and WJ III.
  • As a coauthor of the WJ III, and the primary data analyst for the WJ-R, I personally can vouch for the fact that there was no pressure exerted by the test publisher, nor by us as co-authors, to measure more factors for the sake of just measuring more. As articulated clearly in the original WJ-R technical manual (McGrew, Werder & Woodcock, 1991), and subsequently summarized in the WJ III technical manual (McGrew & Woodcock, 2001), the driving force behind the number of factors was theory, with the input of two of the most prominent psychometric intelligence theorists and factor analysts...John Horn and Jack Carroll (click here, here). Both Horn and Carroll were intimately involved in the design and review of the factor results of the WJ-R and WJ III norm data. The driving "desire/pressure" during the WJ-R and WJ III revisions was to validly measure, within practical constraints, the major features of the broad CHC/Gf-Gc abilities that are well established from decades of research (see Carroll's 1993 seminal work, click here, here). For additional information re: the involvement of Horn and Carroll in these deliberations, read the relevant sections of McGrew's (that be me) on-line version of CHC Theory: Past, Present, Future. If there was an underlying driving "pressure", it was to narrow the intelligence theory-practice gap.


Frazier/Youngstrom comment: Several important findings emerged from the present study. As predicted, commercial ability tests have become increasingly complex. While the length of these tests has risen only moderately, the number of factors purportedly measured by these tests has risen substantially, possibly even exponentially. It should be noted, however, that the possibility of an exponential increase in the number of factors purportedly measured may be due to inclusion of two outliers, the WJ-R and WJ-III. Possibly even more convincingly, the ratio of test length to factors purported has decreased dramatically. These trends suggest that test authors may be positing additional factors without including a sufficient number of subtests to measure these factors. When more accurate, recommended, statistical criteria were examined commercial ability tests were found to be substantially overfactored.

  • McGrew comment/response: My comment is primarily one of clarification for readers. Frazier and Youngstrom's statement that the ratio of test length to factors has decreased may be relevant to the other batteries analyzed, but is NOT true for the WJ-R and WJ III. The broad CHC factors measured by the WJ III are all represented by at least three test indicators, a commonly accepted criterion for proper identification of factors. Frazier and Youngstrom (and readers of their article) may find it informative to note that in Jack Carroll's final publication (The higher-stratum structure of cognitive abilities: Current evidence supports g and about ten broad factors. In Helmuth Nyborg (Ed.), The scientific study of general intelligence: Tribute to Arthur R. Jensen. Elsevier Science/Pergamon Press; click here to access pre-pub copy of Carroll's chapter), Carroll stated that the WJ-R battery (which, when compared to the WJ III, has a lower test-factor ratio) was a "sufficient" set of data "for drawing conclusions about the higher-stratum structure of cognitive abilities." In describing the WJ-R dataset, he stated that "It is a dataset that was designed to test factorial structure only at a second or higher stratum, as suggested by Carroll (1993, p. 579), in that it has sufficient test variables to define several second-stratum factors, as well as the single third-stratum factor, but not necessarily any first-stratum factors." Jack Carroll is no slouch when it comes to the application of factor analysis methods. In fact, he is generally considered one of the masters of the "art and science" of factor analysis, and his contributions to the use of factor analysis methods in the study of cognitive abilities are well known (I recommend folks read Chapter 3 in Carroll's seminal treatise on the factor structure of human cognitive abilities--"Chapter 3: Survey and Analysis of Correlational and Factor Analytic Research on Cognitive Abilities: Methodology"). Frazier and Youngstrom place all of their eggs primarily in the "science" of factor analysis (emphasis on statistical tests). There is an "art" to the practice of factor analysis, something that is missing from the raw empirical approach of their investigation.

Frazier/Youngstrom comment: Results of the present study also suggest that overfactoring of ability tests may be worsening, as the discrepancy between the purported number of factors and the number indicated by MAP and HPA has risen over time and the ratio of subtests to factors purported has decreased substantially as well. While commercial pressures and increasingly complex models of human cognitive abilities are likely contributing to these recent increases, these explanations were not investigated in the present study.

  • McGrew comment/response: Where's the beef/data that supports the conclusion that "commercial pressures...are likely contributing to these recent increases?" In the absence of data, such a statement is inappropriate. Yes, increasingly complex models of human cognitive abilities are contributing to batteries that measure more abilities. Science is a process of improving our state of knowledge via the accumulation of evidence over time. The most solid empirical evidence supports a model of intelligence (CHC or Gf-Gc theory) that includes 7-9 broad stratum II abilities. Shouldn't assessment technology stay abreast of contemporary theory? I think the answer should be "yes." Since the authors state that "these explanations were not investigated in the present study" they should have refrained from their "commercial pressures" statement. I'm a bit surprised that such a statement, devoid of presented evidence, survived the editorial process of the journal.

Frazier/Youngstrom comment: Rather, evaluation centered on the hypothesis that test developers have been determining test structure using liberal, and often inaccurate, statistical criteria. This hypothesis was supported.

  • McGrew comment/response: Aside from the failure to recognize the true art and science of the proper application of factor analysis, Frazier and Youngstrom commit a sin that is often committed by individuals (I'm not saying this is true of these two individuals) who become enamored with the magic of quantitative methods (myself included, during my early years...until the likes of John Horn and Jack McArdle personally tutored me on the limitations of any single quantitative method, like factor analysis). Briefly, factor analysis is an internal validity method. It can only evaluate the internal structural evidence of an intelligence battery. When I was a factor analytic neophyte, I was troubled by the inability to clearly differentiate (with either exploratory or confirmatory factor methods) reading and writing abilities (Grw) from verbal/crystallized (Gc) abilities. I thought the magic of factor analysis should show these as distinct factors. Both Horn and McArdle gently jolted my "factor analysis must be right" schema by reminding me that (and I'm paraphrasing from memory) "Kevin...factor analysis can only tell you so much about abilities. Often you must look outside of factor analysis, beyond the internal validity findings, to completely understand the totality of evidence that provides support for the differentiation of highly correlated abilities." In particular, Horn and McArdle urged me to examine growth curves for highly correlated abilities that could not be differentiated vis-à-vis factor analysis methods. When I examined the growth curves for Grw and Gc in the WJ-R data, I had an epiphany (note...click here for a report that includes curves for all WJ III tests....note, in particular, the differences between the reading/writing (Grw) and verbal (Gc) tests....tests that often "clump" together in factor analysis). They were correct. Although EFA and CFA could not clearly differentiate these factors, the developmental growth curves for Grw and Gc were dramatically different...so different that it would be hard to conclude that they are the same constructs. Long story short...Frazier and Youngstrom fail to recognize (something they could have acknowledged in their discussion/limitations section) that construct validity is based on the totality of multiple sources of validity evidence. Internal structural validity evidence is only one form...albeit one of the easier ones to examine for intelligence batteries given the ease of factor analysis these days. As articulated in the Joint Test Standards, and nicely summarized by Horn and others, aside from structural/internal (factor analysis) evidence, evidence for constructs (and purported measures of the constructs) must also come from developmental, heritability, differential outcome prediction, and neurocognitive evidence. Only when all forms of evidence are considered can one make a proper appraisal of the validity of the constructs measured by a theoretically-based intelligence battery. For those wanting additional information, click here (you will be taken to a discussion of the different forms of validity evidence as Dawn Flanagan and I discussed in our book, the Intelligence Test Desk Reference).
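For those who like to see ideas in code, below is a rough sketch of what this kind of developmental growth-curve comparison looks like. The file name, column names, and score metric are hypothetical; this is not the analysis from the WJ technical manuals, just an illustration of plotting age trends for two highly correlated abilities to see whether their trajectories diverge.

```python
# Illustrative growth-curve comparison for two correlated abilities (Gc, Grw).
# "norm_sample_scores.csv", the column names, and the score metric are all
# hypothetical stand-ins for a real norm dataset.
import pandas as pd
import matplotlib.pyplot as plt

scores = pd.read_csv("norm_sample_scores.csv")        # columns: age, Gc, Grw
by_age = scores.groupby("age")[["Gc", "Grw"]].median()

by_age.plot(marker="o")
plt.xlabel("Age (years)")
plt.ylabel("Median score (hypothetical metric)")
plt.title("Developmental growth curves: Gc vs. Grw")
plt.show()
```

If the two curves rise and decline on clearly different schedules, that developmental evidence supports treating them as distinct constructs even when factor analysis cannot separate them.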

I could go on and on with more points and counterpoints, but I shall stop here. I would urge readers to read this important article and integrate the points above when forming opinions regarding the accuracy/appropriateness of the authors' conclusions, particularly with regard to the WJ-R and WJ III batteries. Also, consulting the WJ-R and WJ III technical manuals, where multiple sources of validity evidence (internal and external) are presented to support the factor structure of the batteries, is strongly recommended.





Friday, April 01, 2005

CHC bandwagon officially declared: Observations from 2005 NASP convention

OK. I unilaterally declare that the CHC assessment bandwagon is here and gathering steam in school psychology.

After two days at the National Association of School Psychologists (NASP) annual convention in Atlanta, I’m officially declaring that the CHC “tipping point” occurred sometime during the past five years and that the CHC bandwagon is getting larger.

During breakfast this morning, while skimming the convention program, I counted 20+ different workshops, papers, and/or posters that either dealt with CHC-designed batteries (e.g., WJ III, KABC-II, SB5), CHC theory, CHC Cross-Battery (CB) assessment, or mentioned CHC in the program abstract. This represents, in my informal memory-based analysis, a significant increase in presentations related to Gf-Gc/CHC theory and measurement over the past decade.

Having been involved in the revision of the 1977 WJ into the WJ-R, a process that included having both Dr. John “Jack” Carroll and Dr. John Horn as the primary theory consultants (back then the theory was referred to as Gf-Gc theory), I find it exciting to see that the theory-to-practice gap is finally being bridged, and that an ever-increasing number of test authors and assessment practitioners are on the bandwagon riding over the CHC bridge. This is good for the field. This is good for kids (the data being used to make decisions are now built on a solid foundation of validity evidence).

Welcome aboard Gale Roid (SB5 author) and Alan and Nadeen Kaufman (KABC-II authors). [Note....I predict that the DAS-II, will also have a strong CHC flavor.] It is good to see that respected scholars and test developers are now validating the “ahead of the curve” conclusion of Dr. Richard Woodcock, back in 1985, that the then Gf-Gc theory (now known as CHC theory) was “the” structural theory of intelligence that had the most solid empirical and theoretical foundation from which to develop measures of intelligence.

For those who want a historical perspective of what happened, when, and how (with regard to the movement of CHC research into school psychology assessment practice), please read my historical account as posted “up in the sky” (click here...also published in the CAI2 book). The events that are documented provide the evidence for my claim of a CHC bandwagon effect.