Showing posts with label statistics. Show all posts

Tuesday, December 06, 2016

Big data in psychology: Special issue of Psychological Methods

I just learned of this special issue of Psychological Methods. I am looking forward to reading many of the articles, as the idea of "big data" analysis in psychology is important. I am particularly looking forward to reading the article co-authored by Jack McArdle on SEM trees. I am not sure I will understand it, but I know Jack does tremendous work. He was the first person to introduce me to SEM methods many years ago (during the WJ-R project; he taught me SEM, very gently, with a program called COSAN...and then I graduated to LISREL), and he was an awesome teacher---he could make complex stat methods conceptually clear. I also then learned of decision-tree methods (CART, MARS) from Jack, and believe they should be used more in psychological research. This PM issue should be well received by the quantoid readers of this blog.

 

Update -- Psychological Methods - Volume 21, Issue 4

A new issue is available for the following APA journal:


Big data in psychology: Introduction to the special issue.
Page 447-457
Harlow, Lisa L.; Oswald, Frederick L.

A practical guide to big data research in psychology.
Page 458-474
Chen, Eric Evan; Wojcik, Sean P.

A primer on theory-driven web scraping: Automatic extraction of big data from the Internet for use in psychological research.
Page 475-492
Landers, Richard N.; Brusso, Robert C.; Cavanaugh, Katelyn J.; Collmus, Andrew B.

Mining big data to extract patterns and predict real-life outcomes.
Page 493-506
Kosinski, Michal; Wang, Yilun; Lakkaraju, Himabindu; Leskovec, Jure

Gaining insights from social media language: Methodologies and challenges.
Page 507-525
Kern, Margaret L.; Park, Gregory; Eichstaedt, Johannes C.; Schwartz, H. Andrew; Sap, Maarten; Smith, Laura K.; Ungar, Lyle H.

Tweeting negative emotion: An investigation of Twitter data in the aftermath of violence on college campuses.
Page 526-541
Jones, Nickolas M.; Wojcik, Sean P.; Sweeting, Josiah; Silver, Roxane Cohen


Theory-guided exploration with structural equation model forests.
Page 566-582
Brandmaier, Andreas M.; Prindle, John J.; McArdle, John J.; Lindenberger, Ulman

Finding structure in data using multivariate tree boosting.
Page 583-602
Miller, Patrick J.; Lubke, Gitta H.; McArtor, Daniel B.; Bergeman, C. S.

Statistical learning theory for high dimensional prediction: Application to criterion-keyed scale development.
Page 603-620
Chapman, Benjamin P.; Weiss, Alexander; Duberstein, Paul R.


Wednesday, March 04, 2015

Recommended stat book: Multiple Regression & Beyond by Dr. Tim Keith

I was pleased to learn that one of the most respected quantoids I know has revised his classic book, Multiple Regression and Beyond: An Introduction to Multiple Regression and Structural Equation Modeling. Dr. Keith's quantitative skills are top-notch. He is one of three quantoids I consult when I need advice. Tim has an uncanny talent for making statistical concepts understandable.

Two big thumbs up for Tim's new edition.  Additional information (including link to Amazon) can be found at his web page.



Thursday, November 25, 2010

Research Byte: Reporting effect sizes in psych research




Excellent article. [Image of the article excerpt omitted.]



Thursday, July 15, 2010

Quantoids corner: Intro to hierarchical linear modeling (HLM)

I LOVE it when more applied journals publish articles in which complex statistical methods are presented to a less statistically oriented audience, as I often find these "quantoid explanations for dummies" an excellent introduction to complex statistical methods.  Today I discovered that Gifted Child Quarterly has published a brief two-part series of articles that provide a nice introduction to HLM.  I've never run HLM models, so I found the introduction very helpful.  So much so that I might run some HLM on some appropriate datasets I have just to see it work.

Below are the two articles.  Enjoy.  Kudos to GCQ and Dr. McCoach.

McCoach, D. B., & Adelson, J. L. (2010). Dealing with dependence (Part 1): Understanding the effects of clustered data. Gifted Child Quarterly, 54(2), 152-155.
This article provides a conceptual introduction to the issues surrounding the analysis of clustered (nested) data. We define the intraclass correlation coefficient (ICC) and the design effect, and we explain their effect on the standard error. When the ICC is greater than 0, then the design effect is greater than 1. In such a scenario, the standard error produced under the assumption of independence is underestimated. This increases the Type I error rate. We provide a short illustration of the effect of non-independence on the standard error. We show that after accounting for the design effect, our decision about the statistical significance of the test statistic changes. When we fail to account for the clustered nature of the data, we conclude that the difference between the two groups is statistically significant. However, once we adjust the standard error for the design effect, the difference is no longer statistically significant.
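To make the design-effect arithmetic concrete, here is a minimal sketch in Python (my made-up numbers, not the article's):

    import math

    def design_effect(icc: float, cluster_size: float) -> float:
        # Design effect for clustered samples: DEFF = 1 + (m - 1) * ICC.
        return 1.0 + (cluster_size - 1.0) * icc

    # Hypothetical values: ICC = .10 and 25 students per classroom.
    deff = design_effect(icc=0.10, cluster_size=25)   # 1 + 24 * .10 = 3.4
    naive_se = 1.2                                    # SE computed assuming independence
    adjusted_se = naive_se * math.sqrt(deff)          # ~2.21
    print(f"DEFF = {deff:.2f}, adjusted SE = {adjusted_se:.2f}")
    # A group difference that cleared p < .05 with SE = 1.2 may easily fail
    # to do so once the SE is inflated by sqrt(DEFF), about 1.84 here.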

McCoach, D. B. (2010). Dealing with dependence (Part II): A gentle introduction to hierarchical linear modeling. Gifted Child Quarterly, 54(3), 252-256.
In education, most naturally occurring data are clustered within contexts. Students are clustered within classrooms, classrooms are clustered within schools, and schools are clustered within districts. When people are clustered within naturally occurring organizational units such as schools, classrooms, or districts, the responses of people from the same cluster are likely to exhibit some degree of relatedness with each other. The use of hierarchical linear modeling allows researchers to adjust for and model this non-independence. Furthermore, it may be of great substantive interest to try to understand the degree to which people from the same cluster are similar to each other and then to try to identify variables that help us to understand differences both within and across clusters. In HLM, we endeavor to understand and explain between- and within-cluster variability of an outcome variable of interest. We can also use predictors at both the individual level (level 1) and the contextual level (level 2) to explain the variance in the dependent variable. This article presents a simple example using a real data set and walks through the interpretation of a simple hierarchical linear model to illustrate the utility of the technique.
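If you want to try a simple two-level model yourself, here is a minimal random-intercept sketch in Python (statsmodels); the file and column names are hypothetical:

    import pandas as pd
    import statsmodels.formula.api as smf

    # Hypothetical data: one row per student, with an achievement score,
    # a level-1 predictor (ses), and a cluster identifier (school).
    df = pd.read_csv("students.csv")

    # Random-intercept model: students (level 1) nested within schools (level 2).
    model = smf.mixedlm("achievement ~ ses", data=df, groups=df["school"])
    result = model.fit()
    print(result.summary())

    # Intraclass correlation from the fitted variance components.
    between = result.cov_re.iloc[0, 0]   # school-level (random intercept) variance
    within = result.scale                # student-level residual variance
    print("ICC =", between / (between + within))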


Monday, May 04, 2009

Quantoids corner: Statistical webinar resource

I just stumbled across a site (The Analysis Factor) that provides webinars on various statistical analysis issues.  I've not used the service but thought other quantoids might want to take a look.


Friday, April 10, 2009

The attack of the psychometricians: Psychological measurement

I'm just in the process of reading Borsboom's (2006, Psychometrika) provocative article "The attack of the psychometricians."  The article abstract is below.  As I'm reading, I'm loving a number of statements meant to get the attention of psychologists.  Here is the most recent favorite.

"psychologists have a tendency to endow obsolete techniques with obscure interpretations"

Abstract:  This paper analyzes the theoretical, pragmatic, and substantive factors that have hampered the integration between psychology and psychometrics. Theoretical factors include the operationalist mode of thinking which is common throughout psychology, the dominance of classical test theory, and the use of “construct validity” as a catch-all category for a range of challenging psychometric problems. Pragmatic factors include the lack of interest in mathematically precise thinking in psychology, inadequate representation of psychometric modeling in major statistics programs, and insufficient mathematical training in the psychological curriculum. Substantive factors relate to the absence of psychological theories that are sufficiently strong to motivate the structure of psychometric models. Following the identification of these problems, a number of promising recent developments are discussed, and suggestions are made to further the integration of psychology and psychometrics.


Saturday, March 14, 2009

Quantoids corner: Confirmatory factor analysis guidelines


Just read a good article in Psychological Methods on the state of the art of CFA methods, statistical methods used with considerable frequency in intelligence research. The article includes a nifty manuscript/research checklist [image omitted]. I will follow up with a more detailed post in the next few days.

Monday, March 02, 2009

Quantoids corner: Dealing with (and planning for) missing data in data gathering

It has been a long time since I've made a post that may tweak the cockles of the quantoids who read this blog. This is one for my fellow quants....and is also intended for those less quantitatively oriented---as the topic is one that will be mentioned with greater regularity in research articles, test manuals, etc.

Missing data is a problem that has plagued researchers and test developers for decades. Over the past 20 years, very sophisticated methods of handling missing data and producing "complete" data sets via statistical algorithms have become available. And....many individuals who have run analyses may have used these procedures while completely unaware that their analysis used imputed or plausible values! For example, if you use one of the primary structural equation modeling (SEM) software programs (e.g., LISREL, Mplus, AMOS) and you had incomplete data on some subjects, the program most likely utilized one of these new algorithms to impute plausible values before running the SEM model.

I've been schooling myself on this literature for the past 15 years and have found these contemporary missing data imputation methods very useful. More and more researchers need to become aware of the benefits of these methods, as well as some of the nuances of using them correctly.

This past week I received a copy of the latest issue of the Annual Review of Psychology and found (to my pleasure) probably the simplest, most conceptually understandable summary of this area of statistics. I was not surprised to see that it was written by John Graham, who has written many other important journal articles on this topic. I would urge the readers of IQ's Corner who conduct applied research or test development projects to read this overview article. It is well worth the read. Also, I would suggest that readers take a serious look at the NORM software of Schafer...the program I use when serious data imputation is necessary. A nicely written description of the program, as well as a short and sweet overview of some of the missing data literature, is available in an article written by Darmawan (2004).
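For readers who want to see the basic idea in code, here is a minimal sketch of model-based imputation in Python (scikit-learn) on simulated data. To be clear, this is not NORM's algorithm, and proper multiple imputation would repeat the stochastic step several times and pool the results:

    import numpy as np
    from sklearn.experimental import enable_iterative_imputer  # noqa: F401
    from sklearn.impute import IterativeImputer

    rng = np.random.default_rng(42)

    # Simulate 200 subjects on 4 correlated test scores, then knock out
    # roughly 15% of the values at random.
    cov = np.full((4, 4), 75.0) + np.eye(4) * 150.0
    X = rng.multivariate_normal(mean=[100.0] * 4, cov=cov, size=200)
    X_missing = np.where(rng.random(X.shape) < 0.15, np.nan, X)

    # Each incomplete variable is regressed on the others, iterating until
    # the imputed (plausible) values stabilize.
    imputer = IterativeImputer(sample_posterior=True, random_state=0)
    X_complete = imputer.fit_transform(X_missing)
    print("Any missing values left?", bool(np.isnan(X_complete).any()))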

What is really cool is the concept of "planned missing data"---that is, designing one's data collection project to deliberately have missing data in order to allow for the collection of more variables across a larger number of subjects....which can then be handled (if designed correctly) via these new quantoid toys.

Fellow (and future) quantoids...enjoy


Friday, January 23, 2009

Reading and dyslexia: Should RAN run and hide?

I just read an excellent article that investigated the relative importance of phonological awareness, naming speed (RAN-rapid automatized naming), orthographic knowledge, and morphological awareness in understanding reading achievement. Although caution is in order given the total sample size (n=93), this study is an excellent example of the type of research we need more of in educational psychology.

First.....one great feature of the article is the description and definition of the four different reading-related constructs that have recently become recognized as important in learning to read. I would recommend reading the introduction just to better one's understanding of phonological awareness, etc.

However, my real excitement for this article is that it directly attempts to deal (at least partially) with the problem of specification error, a type of research design error that occurs when potentially important variables in predictive or explanatory studies are omitted. This type of error can lead to biased estimates of the effects (relative importance) of predictive variables. I've soap-boxed about this before and will not repeat my lengthy diatribe here. Long story short - I believe that much of the "hot" reading/dyslexia research that has recently dominated the educational, special education, and school psychology fields may have too quickly anointed some emperors (phonemic awareness; RAN) and given them too much credit. Read my prior post at the link above. Unless you've been living under a rock (and you work with kids with reading problems), it seems like there is constant chatter about "RAN this...RAN that...RAN is it....etc." Yes....I am exaggerating to make my point.

Why do I like this current article (or why does it soothe my ranting a tad)? Simple. It did not just study RAN and/or phonemic awareness in isolation as predictors of reading....it allowed them to compete for the explanation of reading together with orthographic knowledge and morphological awareness. And guess what? RAN failed to place in the race! When entered in a simultaneous regression model to predict reading, RAN added nothing to the prediction of reading when phonemic awareness, orthographic knowledge, and morphological awareness were also in the running.
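Here is the logic of that horse race in code---a minimal sketch in Python (statsmodels) with hypothetical file and column names, not the authors' data:

    import pandas as pd
    import statsmodels.api as sm

    # Hypothetical data: one reading criterion and the four predictors.
    df = pd.read_csv("reading_study.csv")
    all_four = ["phon_aware", "ran", "ortho", "morph"]
    without_ran = ["phon_aware", "ortho", "morph"]

    full = sm.OLS(df["reading"], sm.add_constant(df[all_four])).fit()
    reduced = sm.OLS(df["reading"], sm.add_constant(df[without_ran])).fit()

    # RAN's unique contribution is the R-squared it adds over the other three.
    print(f"Full model R2:   {full.rsquared:.3f}")
    print(f"Without RAN R2:  {reduced.rsquared:.3f}")
    print(f"RAN's unique R2: {full.rsquared - reduced.rsquared:.3f}")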

This article suggests that the hype around RAN may have been exaggerated, due to specification error in a ton of the hot and sexy reading research that has dominated our professional journals and conferences this past decade.

But don't get me wrong: there is a good body of evidence suggesting that the processes underlying RAN are probably important for early reading. My point, which is buttressed by this article, is that maybe RAN has been given too much credit....and needs to be knocked down a notch.

I would be remiss if I did not also criticize this current study for failing to include other potentially important predictors of reading. For example, I would have liked to see the authors also include measures of working memory (Gsm-MW), lexical knowledge (Gc-VL) or vocabulary, perceptual speed (Gs-P), and associative memory (Glr-MA)....based on my reading of the extant reading literature.

I will now get down from my specification error soap box. The take-away message is that we need more studies that take off the blinders and include a more comprehensive array of research-based indicators of important constructs related to reading (and all areas of school learning)...so we can ascertain which constructs/abilities are important, and to what degree. Also...I would have preferred that these researchers specify a research- or theoretically-based causal SEM model (with possible direct and indirect causal paths between the constructs)---maybe RAN would be seen as being more important...possibly as a direct or indirect cause (or outcome) of the other predictors.

Below is the article reference, abstract and link for your reading.


Roman, A. A., Kirby, J. R., Parrila, R. K., Wade-Woolley, L., & Deacon, S. H. (2009). Toward a comprehensive view of the skills involved in word reading in Grades 4, 6, and 8. Journal of Experimental Child Psychology, 102(1), 96-113. (click here to view/read)

  • Abstract: Research to date has proposed four main variables involved in reading development: phonological awareness, naming speed, orthographic knowledge, and morphological awareness. Although each of these variables has been examined in the context of one or two of the other variables, this study examines all four factors together to assess their unique contribution to reading. A sample of children in Grades 4, 6, and 8 (ages 10, 12, and 14 years) completed a battery of tests that included at least one measure of each of the four variables and two measures of reading accuracy. Phonological awareness, orthographic knowledge, and morphological awareness each contributed uniquely to real word and pseudoword reading beyond the other variables, whereas naming speed did not survive these stringent controls. The results support the sustained importance of these three skills in reading by older readers.

Wednesday, January 21, 2009

Let's hear it for science, data and statistics

I heard a few exciting phrases in Obama's speech yesterday. One was restoring science to its proper place, as noted in the article at the link below. The other was substantiating things that are "subject to data and statistics."

Soothing words to this quantoid.

http://www.sciam.com/podcast/episode.cfm?id=we-will-restore-science-to-its-righ-09-01-21



Sunday, December 14, 2008

Manga Guide statistics book

Interesting post at one of my new favorite blogs (GOOD MATH, BAD MATH) re: a new style of presenting statistics to students.

http://scienceblogs.com/goodmath/2008/12/book_review_the_manga_guide_to.php



Tuesday, June 10, 2008

ITEMS: Ed measurement/statistics web-based instructional modules

I just read about the ITEMS project in the latest issue of Educational Measurement: Issues and Practice. All school-based assessment professionals might want to take a look and see....the materials may be useful in educating others about what scores mean, what they do and don't tell us, and what score differences mean.

Project Description from ITEMS web page (emphasis and links added by IQ's Corner blogmaster)
  • In the current No Child Left Behind era, K-12 teachers and administrators are expected to have a sophisticated understanding of standardized test results, use them to improve instruction, and communicate them to others. Many educators, however, have never had the opportunity to acquire the "assessment literacy" required for these roles. The goal of the ITEMS project, directed by Rebecca Zwick of the University of California Santa Barbara, was to develop and evaluate three Web-based instructional modules in educational measurement and statistics to address this training gap. We created three 25-minute modules: "What's the Score?" (2005), "What Test Scores Do and Don't Tell Us" (2006), and "What's the Difference?" (2007). Overall, 250 K-12 teachers and administrators participated in our research, which demonstrated the effectiveness of the modules in communicating educational measurement and statistics concepts, especially for teacher education students. Our modules are now freely available on our website, http://items.education.ucsb.edu, in low- and high-bandwidth versions, with optional closed captioning. Also posted are supplementary materials, including glossaries, formulas, reference lists, and quizzes corresponding to each module. The provision of this training in a convenient and economical way is intended to assist schools with the successful implementation and interpretation of assessments. Several school districts have let us know they are using the materials, and at least one teacher education program has incorporated them into its curriculum.

Thursday, June 05, 2008

WJ III discrepancy base rate information

Recently a question was asked on the NASP listserv about how to find "base rate" information regarding WJ III discrepancy scores. Barb Wendling, a colleague and friend, posted a very thorough response. I decided to post her well-written explanation below...so it can be readily available for future reference. Kudos to Barb. [Conflict of interest statement - I'm a coauthor of the WJ III]

Discrepancy Norms

  • The conorming of the WJ III ACH and the WJ III COG made it possible to compute discrepancy scores for each individual in the norming sample and then to prepare discrepancy norms using that information. For both the variation and discrepancy procedures, "ACTUAL" discrepancy norms, in contrast to "PSEUDO" discrepancy norms, are used to determine the significance and frequency of any variations or discrepancies (see the report at the following link for additional explanation). These discrepancy norms offer several advantages to practitioners. First, both the cognitive and achievement batteries were normed on the same large, nationally representative sample, eliminating errors from unknown differences that exist when using tests based on different norm samples. Second, the correlation coefficients between ability and achievement are known at all ages, so it is not necessary to estimate and make a correction for the amount of regression based on a few correlations and a limited sample. Third, practitioners can evaluate the significance of a discrepancy by using either the percentile rank of the discrepancy (Discrepancy PR) or the difference between the achievement score and the predicted achievement score in standard error of estimate units (Discrepancy SD).
  • Discrepancy percentile rank. This score defines the percent of the population that has a particular size discrepancy between the actual and predicted scores. The Discrepancy PR is a norm-based estimate of the BASE RATE of a particular discrepancy in the population. Unlike base rates offered by other tests which are typically based on cross-sectional data, the WJ III provides more precision by reporting the information by subgroups (same age or grade) within the norm sample.
  • Discrepancy standard deviation. This score reflects the distance that the examinee’s score is above or below the average score for age or grade mates. The Discrepancy SD allows the criterion of significance to be defined in terms of the standard error of the estimate. Commonly, a Discrepancy SD of +/-1.5 or greater is selected as being significant. A discrepancy of this magnitude would occur approximately 6 out of 100 times. Practitioners may, however, select a different level of significance ranging from +/- 1.0 to +/-2.3 when using the various WJ III software scoring programs.
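[Blogmaster note: the WJ III derives these values from actual discrepancy norms rather than from a normal-curve table, but the arithmetic relating the two scores can be sketched as follows---a minimal Python sketch with hypothetical numbers:]

    from scipy import stats

    def discrepancy_scores(actual: float, predicted: float, see: float):
        # Discrepancy SD: distance of actual from predicted achievement in
        # standard-error-of-estimate units; the PR here is read off the
        # normal curve purely for illustration.
        disc_sd = (actual - predicted) / see
        disc_pr = stats.norm.cdf(disc_sd) * 100
        return disc_sd, disc_pr

    # Hypothetical case: predicted achievement 100, actual 85, SEE of 9.
    sd, pr = discrepancy_scores(actual=85, predicted=100, see=9.0)
    print(f"Discrepancy SD = {sd:+.2f}, Discrepancy PR = {pr:.1f}")
    # SD = -1.67 exceeds the common +/-1.5 criterion; a PR near 5 means only
    # about 5 in 100 age- or grade-mates show a negative discrepancy this large.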

Friday, February 01, 2008

WJ III NU scoring issue explanation: Guest blog post by David Dailey


Recently a post was made to the CHC listserv asking for clarification regarding a particular score provided by the WJ III NU norms. I thought the question provided a "teachable moment" regarding certain psychometric principles and methods used in the WJ family of instruments.

I asked David Dailey, the resident statistician and technical consultant to the WJ author team, to write a brief explanation. His well-written response is below. Enjoy.

[Interested readers may also be interested in a recently published WJ III NU Assessment Service Bulletin that explains why scores may differ between the WJ III and NU norms. Also, conflict of interest disclosure - I'm a coauthor of the WJ III]


Dear Ms. Jensen (person who posed the question):

Thank you for sharing the interesting profile of reading scores with the CHC mailing list. Kevin McGrew has asked me to write a few sentences about the phenomenon exhibited by these scores-- particularly, as you ask, why this 61-month-old child's Broad Reading score is "so low". I have been heavily involved in the development of the WJ III and WJ III NU norm tables, and I hope I will be able to shed some light on your question.

You reported that your subject earned a particular set of standard scores on the reading tests and clusters. I have augmented those scores with the approximate W, W-difference, and RPI scores that would also have appeared for that subject, in the following table (best viewed in a fixed-width font):

Test/Cluster            SS     W   W-diff   RPI

Letter-Word ID         140   431     +87    100
Word Attack            138   463     +81    100
Reading Fluency        128   477     +13     97
Passage Comprehension  133   458     +56    100
Broad Reading          125   455     +52    100
Brief Reading          149   444     +71    100
Basic Reading Skills   145   448     +85    100
You can verify for yourself that the cluster W scores are the arithmetic means of the W scores for the tests making up the cluster. The W-differences and the RPIs show that this subject's reading development is far above that of his/her age peers-- but they also show that the Reading Fluency score is not nearly as exceptional as the remaining scores.

You were concerned that the Broad Reading cluster standard score was so much lower than the other cluster standard scores. Although this subject's scores were exceptionally high for all the clusters (in terms of proficiency relative to age peers), the Broad Reading score is not as exceptional when compared to the other clusters. Its W-difference is lower than those of the other clusters because it includes Reading Fluency, for which the subject outperformed age peers by "only" 13 W points.

The W-difference score in the table above is one of two terms that go into calculating a subject's standard score. The other is a scaling factor (SD - standard deviation) that accounts for how widely or narrowly spread the test scores were in the reference peer group.

In Woodcock-Johnson products, the scaling factor (SD) for subjects performing below the median for the peer reference group is permitted to be, and often is, different from the scaling factor (SD) for subjects performing above the median. So the WJ scoring model has always been able to reflect different amounts of spread among high performers than among low performers.

It turns out that, for young subjects such as yours, the scaling factor (SD) for high performers on the reading clusters is quite large-- meaning it takes a very large W-difference to earn a standard score that is far away from the mean. This is because, for most of these reading skills, the scores for the above-median subjects are very widely spread out. For Broad Reading, a 61-month-old subject must earn 32 W points more than the median to receive a standard score of 115 (one standard deviation above the mean). For the other two reading clusters, the number is somewhat smaller; that, coupled with the higher W-differences your subject earned on those clusters, accounts for the standard-score pattern for your subject.

(You might notice that the scaling factor for Reading Fluency is quite small. This reflects the fact that there is very little variation among above-median subjects at this age on this task.)

So the bottom line here is that the Broad Reading score suffers a "double whammy"-- a comparatively lower W-difference (due to the lower Reading Fluency) and a larger amount of above-median variation in the norming-sample scores. And this subject earned much higher standard scores on the other clusters because their relative performance (in terms of raw ability) on those clusters was much higher, plus the variation within the norming sample was smaller.

Thank you again for your question. I hope I have been able to help you understand more about how these scores work.
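[Blogmaster note: Dailey's two ingredients---the W-difference and the scaling factor---reduce to simple arithmetic. A minimal sketch in Python; the +32 scaling figure for Broad Reading comes from his letter, while the second scaling factor is a hypothetical value chosen for illustration:]

    def standard_score(w_diff: float, scaling_sd: float) -> float:
        # Standard score (mean 100, SD 15) from a W-difference and the
        # above- or below-median scaling factor for the peer group.
        return 100 + 15 * (w_diff / scaling_sd)

    # Broad Reading at 61 months: +32 W points maps to SS 115, so the
    # above-median scaling factor is about 32 W points.
    print(standard_score(w_diff=52, scaling_sd=32))  # ~124, near the reported 125

    # A cluster with a smaller (hypothetical) scaling factor turns a similar
    # W-difference into a much higher standard score.
    print(standard_score(w_diff=71, scaling_sd=22))  # ~148, near the reported 149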


Wednesday, October 10, 2007

Factor analysis of IQ tests: Too many factors?

This week, over on the NASP listserv, there has been a brief exchange re: the over-factoring of intelligence tests. It started when Dr. Gary Canivez referred to a recent article in the journal Intelligence (Frazier & Youngstrom, 2007). Dr. Canivez wrote:
  • You may also want to read the article by Frazier and Youngstrom (2007) on the issue of overfactoring in cognitive assessment instruments published in the journal Intelligence. Other issues related to how much variability is related to the highest order dimension (g) and what variability remains in lower order dimensions (second strata factors) also impacts interpretability of the scores.
I had previously commented ("Are contemporary IQ tests being overfactored") on this article (when "in press") at this blog, and won't repeat my thoughts here. The Frazier and Youngstrom abstract is below.
  • A historical increase in the number of factors purportedly measured by commercial tests of cognitive ability may result from four distinct pressures including: increasingly complex models of intelligence, test publishers' desires to provide clinically useful assessment instruments with greater interpretive value, test publishers' desires to include minor factors that may be of interest to researchers (but are not clinically useful), and liberal statistical criteria for determining the factor structure of tests. The present study examined the number of factors measured by several historically relevant and currently employed commercial tests of cognitive abilities using statistical criteria derived from principal components analyses, and exploratory and confirmatory factor analyses. Two infrequently used statistical criteria that have been shown to accurately recover the number of factors in a data set, Horn's parallel analysis (HPA) and Minimum Average Partial (MAP) analysis, served as gold-standard criteria. As expected, there were significant increases over time in the number of factors purportedly measured by cognitive ability tests (r=.56, p=.030). Results also indicated significant recent increases in the overfactoring of cognitive ability tests. Developers of future cognitive assessment batteries may wish to increase the lengths of the batteries in order to more adequately measure additional factors. Alternatively, clinicians interested in briefer assessment strategies may benefit from short batteries that reliably assess general intellectual ability.
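[Blogmaster note: for readers unfamiliar with Horn's parallel analysis---the abstract's gold-standard criterion---here is a minimal sketch of the core idea in Python. This is my simplification, not the authors' implementation:]

    import numpy as np

    def parallel_analysis(data: np.ndarray, n_iter: int = 100, seed: int = 0) -> int:
        # Horn's parallel analysis: retain components whose eigenvalues exceed
        # the mean eigenvalues obtained from random data of the same dimensions.
        rng = np.random.default_rng(seed)
        n, p = data.shape
        real = np.linalg.eigvalsh(np.corrcoef(data, rowvar=False))[::-1]
        random_mean = np.zeros(p)
        for _ in range(n_iter):
            noise = rng.standard_normal((n, p))
            random_mean += np.linalg.eigvalsh(np.corrcoef(noise, rowvar=False))[::-1]
        random_mean /= n_iter
        return int((real > random_mean).sum())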
In an interesting response, Dr. Joel Schneider provided the following to chew on:

  • Gary, The paper you reference suggests that cognitive tests have "overfactored" their data, meaning they have extracted more factors than were really there in the data. It singles out the WJ-R and WJ-III as outliers (i.e., WJ batteries are REALLY overfactored). I found their conclusions hard to believe so I played with some simulated data using SPSS to see if I could make a simulated "WJ-III" dataset with 7 broad factor structures plus a g-factor uniting all the scores. Each subtest score was computed like this:

    Subtest = g + BroadFactor + error

    Each subtest was assigned to the broad factor it was designed to load on. Each source of variance was normally distributed.

    By systematically changing the variance of the g and broad factors, I was able to look at how different factor extraction rules performed under several combinations of g and broad factor sizes.

    I found that the presence of even a moderately-sized g-factor caused all of the factor extraction rules to underestimate the true number of factors (7 correlated factors in this case).

    It seems that under many plausible conditions WJ-III-like data will have more factors than detected by popular factor extraction rules. Thus, I think that this paper overstates its case.

    Here is my SPSS syntax. Create a few thousand cases and then play with the gCoefficient and FactorCoefficient variables (0 to 2 is a good range).

    COMPUTE gCoefficient = 1.5 .
    COMPUTE FactorCoefficient = 1.0 .
    COMPUTE g = RV.NORMAL(0,1) .
    COMPUTE Gc = RV.NORMAL(0,1) .
    COMPUTE Gf = RV.NORMAL(0,1) .
    COMPUTE Gsm = RV.NORMAL(0,1) .
    COMPUTE Gs = RV.NORMAL(0,1) .
    COMPUTE Ga = RV.NORMAL(0,1) .
    COMPUTE Glr = RV.NORMAL(0,1) .
    COMPUTE Gv = RV.NORMAL(0,1) .
    EXECUTE .
    COMPUTE VC = RV.NORMAL(0,1) + gCoefficient * g + FactorCoefficient * Gc .
    COMPUTE GI = RV.NORMAL(0,1) + gCoefficient * g + FactorCoefficient * Gc .
    COMPUTE CF = RV.NORMAL(0,1) + gCoefficient * g + FactorCoefficient * Gf .
    COMPUTE AS = RV.NORMAL(0,1) + gCoefficient * g + FactorCoefficient * Gf .
    COMPUTE P = RV.NORMAL(0,1) + gCoefficient * g + FactorCoefficient * Gf .
    COMPUTE VAL = RV.NORMAL(0,1) + gCoefficient * g + FactorCoefficient * Glr .
    COMPUTE RF = RV.NORMAL(0,1) + gCoefficient * g + FactorCoefficient * Glr .
    COMPUTE RPN = RV.NORMAL(0,1) + gCoefficient * g + FactorCoefficient * Glr .
    COMPUTE NR = RV.NORMAL(0,1) + gCoefficient * g + FactorCoefficient * Gsm .
    COMPUTE MW = RV.NORMAL(0,1) + gCoefficient * g + FactorCoefficient * Gsm .
    COMPUTE AWM = RV.NORMAL(0,1) + gCoefficient * g + FactorCoefficient * Gsm .
    COMPUTE VM = RV.NORMAL(0,1) + gCoefficient * g + FactorCoefficient * Gs .
    COMPUTE DS = RV.NORMAL(0,1) + gCoefficient * g + FactorCoefficient * Gs .
    COMPUTE PC = RV.NORMAL(0,1) + gCoefficient * g + FactorCoefficient * Gs .
    COMPUTE SB = RV.NORMAL(0,1) + gCoefficient * g + FactorCoefficient * Ga .
    COMPUTE AA = RV.NORMAL(0,1) + gCoefficient * g + FactorCoefficient * Ga .
    COMPUTE IW = RV.NORMAL(0,1) + gCoefficient * g + FactorCoefficient * Ga .
    COMPUTE SR = RV.NORMAL(0,1) + gCoefficient * g + FactorCoefficient * Gv .
    COMPUTE PR = RV.NORMAL(0,1) + gCoefficient * g + FactorCoefficient * Gv .
    EXECUTE .
    FACTOR
      /VARIABLES VC GI CF AS P VAL RF RPN NR MW AWM VM DS PC SB AA IW SR PR
      /MISSING LISTWISE
      /ANALYSIS VC GI CF AS P VAL RF RPN NR MW AWM VM DS PC SB AA IW SR PR
      /PRINT INITIAL EXTRACTION ROTATION
      /FORMAT SORT BLANK(.10)
      /PLOT EIGEN
      /CRITERIA MINEIGEN(1) ITERATE(25)
      /EXTRACTION PAF
      /CRITERIA ITERATE(25)
      /ROTATION PROMAX(4)
      /METHOD=CORRELATION .

All of this "this factor analysis is better than that factor analysis" reminds me of a chapter written a long time ago by Doug Detterman (current and long-standing editor of the journal Intelligence---sorry...I can't recall the reference or the exact quotes...I'm going from long-term memory on this one) in some book on individual differences and intelligence that I read very early in my psychometric career. It was a chapter dealing with the laws of individual differences research. One of the laws had to do with factor analysis. In my own words---"put two factor analysis methodologists in the same room and an argument will break out....there will be no agreement on the number of factors to extract, the proper rotation method to use, or the interpretation of the factors."

So true! My only current comment is that having personally learned some of my most important lessons re: factor analysis from the likes of John Horn, John "Jack" Carroll, and Jack McArdle, there is as much "art" as there is "science" (specific factor extraction rules) to a proper factor analysis of intelligence tests.

Stay tuned. Dr. John Garruto has just sent me a practitioner perspective on this article. It will show up as a guest blog post later today or tomorrow.

Let the games begin.



Monday, September 03, 2007

WISC-III/IV scatter/FS IQ study - Hale responds to Lopez

Yesterday's guest blog post by Ruben Lopez generated a lengthy response by Dr. James "Brad" Hale, a regular voice on the NASP and CHC listservs on this particular topic (interpretation of composite and test scores on intelligence tests). To those who are not members of either of these lists, Brad has regularly challenged the statistical arguments/methods behind the research illustrated by the Watkins et al. article Ruben reviewed. In fact, in the same issue of Applied Neuropsychology, Hale et al. outline their arguments (click here to view/download). [Note - see previous post on this blog about this entire special issue. There are a number of positions and arguments surrounding this entire topic...Brad's position is not the only position]

In an attempt to provide some balance, I decided to lift Brad's response from the NASP listserv and post it here at IQ's Corner. This will allow others to become aware of Brad's arguments and should help save bandwidth on these two lists---as Brad (and others) can then simply refer people to this more permanent post at IQ's Corner for Brad's arguments and thoughts. [Note to Brad - maybe this post can save you from having to repeatedly articulate your thoughts and ideas on the listservs...just insert a URL link to this post.] Also, in the past other voices have been heard in response to Brad's arguments, challenges and claims. I would encourage any of those voices to contact me if they want to provide a counter-response (iap@earthlink.net).

Please note that my posting of Brad's response does not mean I endorse his arguments or claims. They are presented "as is." As blogmaster I would LOVE it if authors of the other articles in the special AN issue would respond with written responses I could post in the form of blog posts.
  • I mean no offense to Ruben Lopez, as he seems well-intentioned, but this is clear evidence of why the Watkins et al. results are so problematic, and could be seen as unethical. Why? Because practitioners such as Mr. Lopez read the positions and analyses of these authors, and conclude they "appear reasonable" (quote from Mr. Lopez). Then Mr. Lopez goes on to conclude "So, I won't disregard the full scale solely because of scatter". If the analyses are wrong as I suggest, this is clear evidence that at least one practitioner (and likely others) has been misinformed by this study, and this affects his (their?) practice of psychology.
  • This is very sad because in the rebuttal paper of this very same special issue (Hale et al., 2007), statistical analyses are provided that clearly show the errors in the Watkins paper. Yes, errors. Those are strong words folks, yep. I wouldn't say they were errors unless I was convinced the data shows they are errors. It is all there in black and white folks. It isn't a matter of opinion, rather one of fact, and I challenge any statistician to go on record to say that the Watkins analysis is right, and our rebuttal analyses are wrong. *Please*, statisticians only! These are complex statistical arguments and it would be best if statisticians determine who is correct. Whether you believe in the value of global IQ or not, that is not the issue. The issue has solely to do with statistical analyses.
  • Again, if any statistician is willing to come forth and show/argue that we are wrong, please have them do so. Please have them provide their full name and other identifying information so they may be contacted at a later date.
  • As for why people would continue to value and/or support papers that have significant statistical errors, or even be willing to publish them- that is up to the reader to determine. I won't even speculate because that could be seen as an ad hominem attack. I do find it interesting that someone could read one article in the special issue and say it is good, but not mention the other articles, which show it is not!
  • If I am right, and these analyses are statistically inappropriate, and they are used to inform practitioners about clinical practice (as are many of the other papers produced by this academic group - which we also show in the rebuttal paper), there is a serious ethical problem here. I have personally contacted the authors and directly informed them of the statistical errors in other papers, and I have also contacted editors.
  • We have also shown why the analyses are wrong in several published works. Yet, the works by this academic group continue to be endorsed by others, even those with the statistical sophistication to know better. It is a sad day in science when people's opinions and values supersede the facts. I guess we have to ask ourselves as a profession a very important question. Are we guided by scientific fact or fancy? It is up for all of us to decide.
  • Please do forward this email to anyone you think is willing to reply, including Drs.Watkins and Glutting.


Friday, June 22, 2007

Quantoids corner - ROC curve classification

For my fellow quantoids.

A frequent statistical problem faced by those of us who do research in intelligence theory/testing is how to quantify the accuracy (sensitivity/specificity) of classification from a test score (or collection of test scores). Over the past years I've seen more-and-more published on the use of ROC (receiver operating characteristic) curves for evaluating classification accuracy. Typically the readings have been technical in nature. Just this week the Data Mining in MATLAB blog posted a GREAT "ROC for dummies" explanation. I loved it. It explains this procedure in very simple language. Take a peek if you are doing classification research and/or if you find yourself reading articles that use ROC methods.
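If you want to see the mechanics for yourself, here is a minimal sketch in Python (scikit-learn) with simulated scores---nothing here comes from the blog post I link to:

    import numpy as np
    from sklearn.metrics import roc_curve, roc_auc_score

    rng = np.random.default_rng(0)

    # Hypothetical screening problem: 50 cases score lower on a composite
    # (mean 85) than 200 non-cases (mean 100), SD 15 in both groups.
    y_true = np.concatenate([np.ones(50), np.zeros(200)])
    scores = np.concatenate([rng.normal(85, 15, 50), rng.normal(100, 15, 200)])

    # Lower scores indicate the condition, so negate them for roc_curve.
    fpr, tpr, thresholds = roc_curve(y_true, -scores)
    print(f"AUC = {roc_auc_score(y_true, -scores):.2f}")  # ~.76 for a 1-SD gap

    # Each threshold trades sensitivity (tpr) against specificity (1 - fpr);
    # Youden's J picks the cutoff that maximizes their sum.
    best = int(np.argmax(tpr - fpr))
    print(f"Cutoff ~ {-thresholds[best]:.0f}: sensitivity = {tpr[best]:.2f}, "
          f"specificity = {1 - fpr[best]:.2f}")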
