Tuesday, January 05, 2010

The Wechsler-like IQ subtest scaled score metric: The potential for misuse, misinterpretation and impact on critical life decisions---draft report in search of feedback




The following are the first three paragraphs (and a critical figure) of a draft of an IAP Applied Psychometrics 101 Brief Report (#5). The complete report can be downloaded in PDF format by clicking here. A web-page version of the complete report can be found by clicking here (note: the web-page version may NOT display the two embedded figures, so viewing the PDF copy may be necessary).

I'm providing this initial draft report with the expressed intent of soliciting feedback and comments regarding the accuracy and soundness of my analyses and logic.  I'm looking for critical feedback to improve the report.  This is a draft report that will be revised if comments suggest important changes.  Please read it in the spirit of "tossing out some critical ideas" for reflective analysis and feedback.  Feedback can be sent directly to me (iap@earthlink.net) or could be provided in the form of listserv thread discussions at the NASP and/or CHC listservs.


I've recently been skimming James Flynn's new book (What is Intelligence: Beyond the Flynn Effect) to better understand the methodology and interpretation of the Flynn effect. Of particular interest to me (as an applied measurement person) is his analysis of the individual subtest scores from the various Wechsler scales across time. As most psychologists know, Wechsler subtest scaled scores (ss) are on a scale with a mean (M) = 10 and a standard deviation (SD) = 3. The subtest ss range from 1 to 19. In Appendix 1 of his book, Flynn states "it is customary to score subtests on a scale in which the SD is 3, as opposed to IQ scores which are scaled with SD set at 15. To convert to IQ, just multiply subtest gains by five, as was done to get the IQ gains in the last column." At first glance, this statement makes it sound as if the transformation of subtest ss to IQ SS is an easy ("just multiply…"; emphasis added by me) and mathematically acceptable procedure without problems. However, on close inspection this transformation has the potential to introduce unknown sources of error into the precision of the transformed SS scores. The goal of this brief technical post is to explain the issues involved when making this ss-to-IQ SS conversion.
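The arithmetic behind the conversion can be sketched in a few lines of Python. This is an illustrative sketch, not code from the report, and the function names ss_to_iq and ss_gain_to_iq_gain are hypothetical. It shows that multiplying by five is the correct rescaling for a difference (gain) between scaled scores, that converting a single score also requires re-centering (IQ = 100 + 15 × (ss − 10)/3 = 50 + 5 × ss), and that the converted scores can take only 19 values, spaced 5 IQ points apart, from 55 to 145.

```python
# Minimal sketch (not from the report): linear conversion between the subtest
# scaled-score metric (M = 10, SD = 3) and the IQ standard-score metric (M = 100, SD = 15).
SS_MEAN, SS_SD = 10, 3
IQ_MEAN, IQ_SD = 100, 15


def ss_to_iq(ss: int) -> float:
    """Convert a scaled score to the IQ metric: 100 + 15 * (ss - 10) / 3, i.e. 50 + 5 * ss."""
    # Multiplying before dividing keeps the result exact for integer scaled scores.
    return IQ_MEAN + (ss - SS_MEAN) * IQ_SD / SS_SD


def ss_gain_to_iq_gain(gain: float) -> float:
    """A gain (difference) needs only the SD ratio 15 / 3 = 5; the means cancel."""
    return gain * IQ_SD / SS_SD


# Every IQ-metric value reachable from the integer 1-19 scaled-score range.
attainable = [ss_to_iq(ss) for ss in range(1, 20)]
gaps = {b - a for a, b in zip(attainable, attainable[1:])}

print(attainable[0], attainable[-1], len(attainable))  # 55.0 145.0 19
print(sorted(gaps))                                     # [5.0]
print(ss_gain_to_iq_gain(2))                            # 10.0
```

Nothing in the linear conversion adds information: the IQ-metric values are simply the 19 original scores relabeled, so any imprecision in the 1-19 metric is carried over and magnified fivefold in IQ points.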

The ss 1-19 scale has a long history in the Wechsler batteries. For example, in Appendix 1 of Measurement of Adult Intelligence (Wechsler, 1944), Wechsler described the steps used to translate subtest raw scores to the new ss metric. The Wechsler batteries have continued this tradition in each new revision, although the methodology and procedures used to calculate the ss 1-19 values have become more sophisticated over time. Although the methods used to develop the Wechsler ss 1-19 scale may have become more sophisticated, the resultant underlying scale for each subtest has not changed: scores still range from 1-19 (M=10; SD=3). Also, the most recent Stanford-Binet, 5th Edition (SB5; Roid, 2003) and the Kaufman Assessment Battery for Children, 2nd Edition (KABC-II) have both adopted the same ss 1-19 scale for their respective individual subtests.
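For readers unfamiliar with how a normalized scaled-score metric is produced, the sketch below shows one generic approach: compute each raw score's midpoint percentile rank in a norm sample, convert it to a normal deviate, rescale to M = 10 and SD = 3, round to an integer, and truncate to 1-19. This is an assumption-laden illustration, not Wechsler's 1944 procedure or any publisher's current method (which typically add smoothing and age-based norming); normalized_scaled_scores and the toy norm sample are hypothetical.

```python
from statistics import NormalDist


def normalized_scaled_scores(norm_raw_scores, mean=10, sd=3, lo=1, hi=19):
    """Hypothetical sketch: raw-to-scaled lookup table via a normalizing (area) transformation."""
    n = len(norm_raw_scores)
    std_normal = NormalDist()  # mean 0, SD 1
    table = {}
    for raw in sorted(set(norm_raw_scores)):
        below = sum(1 for x in norm_raw_scores if x < raw)
        at = sum(1 for x in norm_raw_scores if x == raw)
        # Midpoint percentile rank; always strictly between 0 and 1.
        pr = (below + 0.5 * at) / n
        z = std_normal.inv_cdf(pr)           # normal deviate with that percentile rank
        ss = round(mean + sd * z)            # rescale and round to the integer metric
        table[raw] = min(hi, max(lo, ss))    # truncate to the 1-19 range
    return table


# Toy norm sample (hypothetical): 21 examinees with raw scores 0..20.
table = normalized_scaled_scores(list(range(21)))
print(table)
```

The rounding step is where resolution is lost: every raw score whose normal deviate falls within the same one-third-SD band receives the same scaled score.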

Why is this relatively crude (to be defined below) scale metric still used in some intelligence batteries when other contemporary intelligence batteries provide subtest scale metrics with finer measurement resolution? For example, the DAS-II (Elliott, 2007) places individual test scores on the T-scale (M=50; SD=10), with scores that range from 10-90. The WJ III (McGrew & Woodcock, 2001) places all test and composite scores on the standard score (SS) metric associated with full scale and composite scores (M=100; SD=15). The critical question is: "Are there advantages or disadvantages to retaining the historical ss 1-19 scale, or are there real advantages to having individual test scales with finer measurement resolution (DAS-II; WJ III)?"
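One way to make "measurement resolution" concrete is to express a single scale point on each metric in SD units and in IQ-metric points. In the sketch below the resolution helper is hypothetical and the ±3 SD comparison window is an arbitrary choice; the metric SDs are the ones named above. One ss point spans 1/3 SD (5 IQ points), versus 1/10 SD (1.5 IQ points) for a DAS-II T score and 1/15 SD (1 IQ point) for the WJ III SS metric.

```python
# Hypothetical illustration: how coarse one scale point is on each metric named above.
def resolution(sd, iq_sd=15, span_sd=3):
    """Return (step in SD units, step in IQ points, distinct integer scores within +/- span_sd SD)."""
    step_sd = 1 / sd               # one scale point as a fraction of an SD
    step_iq = iq_sd / sd           # the same step expressed on the IQ metric (SD = 15)
    n_points = 2 * span_sd * sd + 1
    return step_sd, step_iq, n_points


for name, sd in [("ss 1-19 (SD = 3)", 3), ("T score (SD = 10)", 10), ("SS (SD = 15)", 15)]:
    step_sd, step_iq, n_points = resolution(sd)
    print(f"{name}: 1 point = {step_sd:.3f} SD = {step_iq:.1f} IQ points; "
          f"{n_points} possible scores within +/-3 SD")
```

Within the same ±3 SD band the ss metric offers 19 possible scores, the T metric 61, and the SS metric 91.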

......continued............
(complete report available at links in first paragraph of this post)



4 comments:

Bryan Pesta said...

Still mulling over your post; I think you are on to something worthwhile.

I was more impressed with the earlier part, and less sure about the section on scoring errors (just my take on it; but perhaps a simpler conclusion is that scoring errors likely exist everywhere, but they are compounded with crude scales).

What I need to think about more is cases where scoring errors are biased in one direction or another-- I think you claim that!-- does this happen a lot? If not, I think the simpler conclusion above makes the point more strongly.

Also, I was expecting a few end comments on what this might do for interpreting / calculating the Flynn effect (if anything).

These are all just my opinions, and so are likely wrong!

Finally, your post made me think of Lumsden's flogging wall (the height analogy). Just in case you're not familiar with it, it's a great read!

https://www.msu.edu/course/psy/818/snapshot.afs/deshon/Readings/Lumsden%20%281976%29%20-%20Test%20theory.pdf

Anonymous said...

What would a normal curve look like, with SS based on a mean of 100 and SD 15, if we attempted to draw the curve based on data supplied by the "just multiply by 5" concept? My guess is that the curve(?) would appear as a series of 19 disconnected points, above the SS values 5, 10, ..., 90, 95, with large gaps in between. How can we know what's happening in the gaps between the points? Or in the tails of the curve? Would it be fair (statistically) to connect the dots? What danger lurks beneath this type of "curve", if examiners draw conclusions based on the way that they traditionally use the (continuous) normal curve for IQ scores?

Kevin McGrew said...

Thanks for the feedback. You are correct that scoring errors can be in either direction and may in some cases cancel each other out....and it is just as possible for errors to be in the opposite direction (e.g., malingering by proxy). I was just trying to show one extreme scenario...in a revision I will discuss errors going both ways and the outcomes of positive or negative bias. Re: more comments re: Flynn Effect--that is going to take more time as I dig into some of the original publications. Thanks

Kevin McGrew said...

Your comments re: how the distribution may appear are exactly the point of my post. Like the "step function" graph I posted, the distribution would be a series of steps below the mean increasing to the middle of the distribution and then a set of steps decreasing on the other side. In effect it would look like a histogram plot: a series of increasing columns followed by a series of decreasing columns.
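The histogram-like shape described in this exchange can be reproduced with a short sketch. It assumes, purely for illustration, that the underlying ability is normally distributed with M = 10 and SD = 3 on the scaled-score metric and that observed scores are that value rounded to an integer and truncated to 1-19; the function name ss_probability_mass is hypothetical. Each scaled score then receives a block of probability, and on the IQ metric those blocks sit 5 points apart (55, 60, ..., 145) with nothing in between.

```python
from statistics import NormalDist


def ss_probability_mass(mean=10, sd=3, lo=1, hi=19):
    """Probability of each integer scaled score if scores are a rounded, truncated normal variable."""
    latent = NormalDist(mean, sd)
    mass = {}
    for ss in range(lo, hi + 1):
        # The tails are folded into the end points (scores below 1 become 1, above 19 become 19).
        lower = latent.cdf(ss - 0.5) if ss > lo else 0.0
        upper = latent.cdf(ss + 0.5) if ss < hi else 1.0
        mass[ss] = upper - lower
    return mass


mass = ss_probability_mass()
for ss, p in mass.items():
    iq = 100 + 5 * (ss - 10)    # the same score on the IQ metric (15 / 3 = 5)
    bar = "#" * round(p * 200)  # text "column" proportional to probability
    print(f"ss {ss:2d} -> IQ {iq:3d} | {bar} {p:.3f}")
```

Connecting the tops of these columns with a smooth curve would imply scores (for example, 97 or 103) that the 1-19 metric cannot actually produce.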