Showing posts with label reliability.
Wednesday, May 18, 2011
Research bytes: Reliability paradox in SEM and causal vs. effect indicator models

For the quantoid readers of IQs Corner. Italics emphasis added by the blog dictator.
Hancock, G. R., & Mueller, R. O. (2011). The Reliability Paradox in Assessing Structural Relations Within Covariance Structure Models. Educational and Psychological Measurement, 71(2), 306-324.
A two-step process is commonly used to evaluate data–model fit of latent variable path models, the first step addressing the measurement portion of the model and the second addressing the structural portion of the model. Unfortunately, even if the fit of the measurement portion of the model is perfect, the ability to assess the fit within the structural portion is affected by the quality of the factor–variable relations within the measurement model. The result is that models with poorer quality measurement appear to have better data–model fit, whereas models with better quality measurement appear to have worse data–model fit. The current article illustrates this phenomenon across different classes of fit indices, discusses related structural assessment problems resulting from issues of measurement quality, and endorses a supplemental modeling step evaluating the structural portion of the model in isolation from the measurement model.
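The core of the paradox can be shown with a few lines of NumPy (a sketch of my own, not an analysis from the article): the same structural misspecification (fitting a zero factor correlation when the true correlation is .50) leaves much smaller residuals in the indicator covariance matrix when the factor loadings are weaker, so overall data-model fit looks better even though measurement quality is worse.

```python
# Weaker loadings shrink the covariance residuals left by a structural misspecification.
import numpy as np

def implied_cov(loading, phi):
    """Covariance of 6 standardized indicators: 3 per factor, equal loadings,
    factor correlation phi, unit indicator variances."""
    lam = np.full(3, loading)
    within = np.outer(lam, lam)            # same-factor indicator covariances
    np.fill_diagonal(within, 1.0)          # unit indicator variances
    between = phi * np.outer(lam, lam)     # cross-factor indicator covariances
    return np.block([[within, between], [between, within]])

true_phi, fitted_phi = 0.5, 0.0            # structural misspecification: phi fixed to 0
for loading in (0.9, 0.5):                 # strong vs. weak measurement
    resid = implied_cov(loading, true_phi) - implied_cov(loading, fitted_phi)
    print(f"loading = {loading}: max residual = {np.abs(resid).max():.3f}")
# loading = 0.9: max residual = 0.405 -> the structural error is easy to detect
# loading = 0.5: max residual = 0.125 -> same error, but better apparent fit
```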
Hardin, A. M., Chang, J. C. J., Fuller, M. A., & Torkzadeh, G. (2011). Formative Measurement and Academic Research: In Search of Measurement Theory. Educational and Psychological Measurement, 71(2), 281-305.
The use of causal indicators to formatively measure latent constructs appears to be on the rise, despite what appears to be a troubling lack of consistency in their application. Scholars in any discipline are responsible not only for advancing theoretical knowledge in their domain of study but also for addressing methodological issues that threaten that advance. In that spirit, the current study traces causal indicators from their origins in causal modeling to their use in structural equation modeling today. Conclusions from this review suggest that unlike effect (reflective) indicators, whose application is based on classical test theory, today’s application of causal (formative) indicators is based on research demonstrating their practical application rather than on psychometric theory supporting their use. The authors suggest that this lack of theory has contributed to the confusion surrounding their implementation. Recent research has questioned the generalizability of formatively measured latent constructs. In the current study, the authors discuss how the use of fixed-weight composites may be one way to employ causal indicators so that they may be generalized to additional contexts. More specifically, they suggest the use of meta-analysis principles for identifying optimum causal indicator weights that can be used to generate fixed-weight composites. Finally, the authors explain how these fixed-weight composites can be implemented in both components-based and covariance-based statistical packages. Implications for the use of causal indicators in academic research are used to focus these discussions.
- iPost using BlogPress from Kevin McGrew's iPad
Sunday, March 27, 2011
IAP Applied Psychometrics 101 Report #10: "Just say no" to averaging IQ subtest scores
Should psychologists engage in the practice of calculating simple arithmetic averages of two or more scaled or standard scores from different subtests (pseudo-composites) within or across different IQ batteries? Dr. Joel Schneider and I (Dr. Kevin McGrew) say "no."
Do psychologists who report simple pseudo-composite scores, or who base interpretations and recommendations on such scores, have a professional responsibility to alert the recipients of their reports (e.g., lawyers, the courts, parents, special education staff, other mental health practitioners) to the amount of error potentially present in those scores? We believe "yes."
Simple pseudo-composite scores, in contrast to norm-based scores (i.e., composite scores with norms provided by test publishers/authors, such as the Wechsler Verbal Comprehension Index), contain significant sources of error. Although they have intuitive appeal, that appeal cloaks hidden sources of error in the scores, with the amount of error being a function of a combination of psychometric variables (chiefly the number of subtests and their reliabilities and intercorrelations).
IAP Applied Psychometrics 101 Report #10 addresses the psychometric issues involved in pseudo-composite scores.
In the report we offer recommendations and resources that allow users to calculate psychometrically sound pseudo-composites when they are deemed important and relevant to the interpretation of a person's assessment results.
Finally, understanding the sources of error in simple pseudo-composite scores helps practitioners understand a paradoxical phenomenon frequently observed in practice: norm-based or psychometrically sound composite scores are often higher (or lower) than the subtest scores that make them up. This "total does not equal the average of the parts" phenomenon is explained conceptually, statistically, and via an interesting visual explanation based on trigonometry.
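To make the "total does not equal the average of the parts" effect concrete, here is a minimal sketch (my own illustration, not the procedure from IAP Report #10) of a composite that accounts for subtest intercorrelations versus the simple average of the subtest standard scores; the intercorrelation of .60 is a hypothetical value.

```python
# Properly normed composite vs. simple average (pseudo-composite) of standard scores.
import math

def composite_score(scores, avg_r, mean=100.0, sd=15.0):
    """Standard-score composite that accounts for the subtests' average intercorrelation."""
    k = len(scores)
    z_sum = sum((s - mean) / sd for s in scores)
    # SD of a sum of k standardized subtests with average intercorrelation avg_r
    sum_sd = math.sqrt(k + k * (k - 1) * avg_r)
    return mean + sd * z_sum / sum_sd

subtests = [120, 120]                            # two subtest standard scores
pseudo = sum(subtests) / len(subtests)           # simple arithmetic average
proper = composite_score(subtests, avg_r=0.60)   # hypothetical r = .60
print(f"pseudo-composite (average): {pseudo:.1f}")   # 120.0
print(f"normed composite:           {proper:.1f}")   # ~122.4
# The lower the subtest intercorrelation, the farther the true composite
# moves from the simple average of its parts.
```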

Abstract
The publishers and authors of intelligence test batteries provide norm-based composite scores based on two or more individual subtests. In practice, clinicians frequently form hypotheses based on combinations of tests for which norm-based composite scores are not available. In addition, with the emergence of Cattell-Horn-Carroll (CHC) theory as the consensus psychometric theory of intelligence, clinicians are now more frequently “crossing batteries” to form composites intended to represent broad or narrow CHC abilities. Beyond simple “eye-balling” of groups of subtests, clinicians at times compute the arithmetic average of subtest scaled or standard scores (pseudo-composites). This practice suffers from serious psychometric flaws and can lead to incorrect diagnoses and decisions. The problems with pseudo-composite scores are explained and recommendations made for the proper calculation of special composite scores.
- iPost using BlogPress from Kevin McGrew's iPad
intelligence, IQ tests, IQ testing, IQ scores, CHC intelligence theory, CHC theory, Cattell-Horn-Carroll, human cognitive abilities, psychology, school psychology, individual differences, cognitive psychology, neuropsychology, special education, educational psychology, psychometrics, psychological assessment, psychological measurement, IQs Corner, general intelligence, standard scores, IQ subtests, Wechsler IQ subtests, IQ part scores, IQ composite scores, cross-battery assessment, applied psychometrics
Saturday, January 01, 2011
Dr. Doug Detterman's bytes: Psychometric reliability

Another of Dr. Doug Detterman's intelligence bytes.
Reliability is consistency. A measure is reliable if it provides the same measurement on repeated applications. A measurement is an attempt to estimate the value of a true score or latent trait. If it were possible to measure this true score or latent trait value exactly, the measurement would provide the same value on each measurement occasion so long as the trait remains unchanged. However, measurement is never perfect. There will always be some error. To understand the accuracy of any measure requires knowing the amount of error in the measurement.
One of the reasons so many important relationships have been found with intelligence is that measures of intelligence are highly reliable.
All good science begins with reliable measurement. As Pavlov put it, control your conditions and you will see order. This is why reliability is so important and probably deserves even more attention than it was given here.
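A minimal sketch of the classical test theory behind this point: reliability is the proportion of observed-score variance attributable to the true score, and the standard error of measurement (SEM) converts a test's reliability into an error band around any single observed score. The reliability of .90 and the observed score of 110 below are illustrative values, not figures from Detterman's note.

```python
# Standard error of measurement and an approximate confidence band around one score.
import math

def sem(sd, reliability):
    """Standard error of measurement: SD * sqrt(1 - rxx)."""
    return sd * math.sqrt(1.0 - reliability)

def confidence_band(observed, sd, reliability, z=1.96):
    """Approximate 95% band around an observed score (obtained-score method)."""
    e = sem(sd, reliability)
    return observed - z * e, observed + z * e

# IQ metric (mean 100, SD 15) with a hypothetical reliability of .90
low, high = confidence_band(observed=110, sd=15, reliability=0.90)
print(f"SEM = {sem(15, 0.90):.2f}")                      # about 4.74
print(f"95% band around 110: {low:.1f} to {high:.1f}")   # about 100.7 to 119.3
```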
- iPost using BlogPress from Kevin McGrew's iPad
intelligence, IQ tests, IQ scores, CHC theory, Cattell-Horn-Carroll, human cognitive abilities, psychology, school psychology, individual differences, cognitive psychology, neuropsychology, special education, educational psychology, psychometrics, psychological assessment, psychological measurement, IQs Corner, neuroscience, neurocognitive, cognitive abilities, cognition, reliability, Detterman's bytes
Friday, October 23, 2009
Intelligence test "practice effects": New overview article
New overview article on intelligence test "practice effects" over at the sister blog, ICDP.
Technorati Tags: psychology, forensic psychology, criminal psychology, criminal justice, educational psychology, school psychology, neuropsychology, Atkins case, MR, mental retardation, death penalty, capital punishment, IQ tests, IQ scores, IQ, intelligence tests, measurement, intelligence, practice effects, Wechsler batteries, WAIS, WAIS-R, WAIS-III, Stanford-Binet, SB5, psychometrics
Wednesday, July 08, 2009
Applied Psych Test Design Part G: Psychometric/technical statistical analysis: External
The seventh in the series Art and Science of Applied Test Development is now available.
The seventh module (Part G: Psychometric/technical statistical analysis: External) is now posted and is accessible via SlideShare.
In addition, I've made some new edits and additions to the prior presentations (Parts A-F), so if you've viewed the prior modules you may want to revisit them.
This is the seventh in a series of PPT modules explicating the development of psychological tests in the domain of cognitive ability using contemporary methods (e.g., theory-driven test specification; IRT-Rasch scaling; etc.). The presentations are intended to be conceptual and not statistical in nature. Feedback is appreciated.
This project can be tracked on the left-side pane of the blog under the heading Applied Test Development Series.
The first module (Part A: Planning, development frameworks & domain/test specification blueprints) was posted previously and is accessible via SlideShare.
The second module (Part B: Test and item development) was posted previously and is accessible via SlideShare.
The third module (Part C--Use of Rasch scaling technology) was posted previously and is accessible via SlideShare.
The fourth module (Part D--Develop norm [standardization] plan) was posted previously and is accessible via SlideShare.
The fifth module (Part E--Calculate norms and derived scores) was posted previously and is accessible via SlideShare.
The sixth module (Part F--Psychometric/technical statistical analysis: Internal) was posted previously and is accessible via SlideShare.
You are STRONGLY encouraged to view the modules in order, as the concepts and their graphic representations build on each other from start to finish.
That's it for now. I will likely be revising and adding more material in the future---but this is the "basic" set of materials for now.
Technorati Tags: psychology, education, educational psychology, school psychology, neuropsychology, cognition, intelligence, ISIR, IQ, IQ tests, test development, IRT, Rasch, norms, standardization, psychometrics, measurement, scaling
Tuesday, July 07, 2009
Applied Psych Test Development Series: Part F--Psychometric/technical statistical analysis: Internal
The sixth in the series Art and Science of Applied Test Development is now available.
The sixth module (Part F--Psychometric/technical statistical analysis: Internal) is now available.
In addition, I've made some edits and additions (esp. summary "Tools, Tips, and Troubles" and "Advanced Topics" slides) to the prior presentations (Parts A-E).
This is the sixth in a series of PPT modules explicating the development of psychological tests in the domain of cognitive ability using contemporary methods (e.g., theory-driven test specification; IRT-Rasch scaling; etc.). The presentations are intended to be conceptual and not statistical in nature. Feedback is appreciated.
This project can be tracked on the left-side pane of the blog under the heading Applied Test Development Series.
The first module (Part A: Planning, development frameworks & domain/test specification blueprints) was posted previously and is accessible via SlideShare.
The second module (Part B: Test and item development) was posted previously and is accessible via SlideShare.
The third module (Part C--Use of Rasch scaling technology) was posted previously and is accessible via SlideShare.
The fourth module (Part D--Develop norm [standardization] plan) was posted previously and is accessible via SlideShare.
The fifth module (Part E--Calculate norms and derived scores) was posted previously and is accessible via SlideShare.
You are STRONGLY encouraged to view the modules in order, as the concepts and their graphic representations build on each other from start to finish.
Enjoy...more to come.
Technorati Tags: psychology, education, educational psychology, school psychology, neuropsychology, cognition, intelligence, ISIR, IQ, IQ tests, test development, IRT, Rasch, norms, standardization, psychometrics, measurement, scaling
Friday, October 24, 2008
WJ test reliability over time
"A test does not change from one time to another: people do. There may be considerable change on some traits, but relatively little on others. Test-retest studies evaluate the tendency for change in people, not some aspect of test quality. A test that does not reflect such changes in human traits would be an insensitive measure of those traits" (McGrew, Werder, & Woodcock, 199, p. 99).
Over on the NASP Listserv Dr. Gary Canivez asked the following question, in response to a post regarding changes in scores on the K-ABC and WJ---"Does anyone have references for long-term stability of WJ or KABC-2 scores? I'd be interested in references for such studies."
There was a very sophisticated test-retest study reported in the WJ-R Technical Manual (McGrew, Werder & Woodcock, 1991) (click here to view/download). Unfortunately, it is in a test technical manual...a document that is too often ignored once a test is purchased. Additional information can be found in the following article.
- McArdle, J. J., Ferrer-Caja, E., Hamagami, F., & Woodcock, R. W. (2002). Comparative longitudinal structural analyses of the growth and decline of multiple intellectual abilities over the life span. Developmental Psychology, 38(1), 115-142. (click to view)
And then, of course, a person's state (concentration, anxiety, fatigue, etc.) at any testing moment can impact test performance...and tests sensitive to these states (e.g., Gsm, Gs) will likely reflect these temporary state fluctuations. This is state variance, which is NOT a problem with the measure...the measure is accurately reflecting how the person is doing at that time. School psychologists (and others who do psychological testing), unfortunately, typically receive measurement training that covers only simple test-retest reliability studies and results...a disservice to our profession. We need to understand our instruments better. Properly designed test-retest studies, where the retest interval is varied, can help identify and apportion test score variance into stable (trait) and unstable (state and error) sources of change-score variance.
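A small simulation (my own sketch, using hypothetical variance shares, not figures from the WJ-R manual) makes the point concrete: when a transient state carries over to the retest, the retest correlation reflects trait-plus-state variance, whereas at a longer interval it reflects trait variance alone. Varying the interval is what lets these sources be separated.

```python
# Minimal sketch: apportioning trait, state, and error variance via retest interval.
# Variance shares below are hypothetical, chosen only for illustration.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
var_trait, var_state, var_error = 0.70, 0.15, 0.15

trait = rng.normal(0, np.sqrt(var_trait), n)    # stable person characteristic
state = rng.normal(0, np.sqrt(var_state), n)    # concentration/anxiety/fatigue at time 1

def observed(trait, state):
    """Observed score = trait + transient state + random measurement error."""
    return trait + state + rng.normal(0, np.sqrt(var_error), len(trait))

time1 = observed(trait, state)
# Short interval: the transient state carries over to the retest (a simplification).
retest_short = observed(trait, state)
# Long interval: the state at retest is independent of the state at time 1.
retest_long = observed(trait, rng.normal(0, np.sqrt(var_state), n))

print(np.corrcoef(time1, retest_short)[0, 1])   # ~.85 = trait + state share
print(np.corrcoef(time1, retest_long)[0, 1])    # ~.70 = trait share only
```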
Read the WJ-R technical manual excerpt in particular. It is the more readable and understandable of the two sources I've included in this post.
Technorati Tags: psychology, school psychology, educational psychology, neuropsychology, psychometrics, reliability, test-retest, measurement, WJ-R, WJ III, K-ABC, KABC-II, Woodcock Johnson
Labels: EdPsych, IQ scores, IQ tests, psychometrics, reliability, WJ III, WJ-R