Monday, March 02, 2009

Quantoids corner: Dealing with (and planning for) missing data in data gathering

It has been a long time since I've made a post that may tweak the cockles of the quantoids who read this blog. This one is for my fellow quants, and is also intended for those less quantitatively oriented, as the topic is one that will be mentioned with greater regularity in research articles, test manuals, etc.

Missing data is a problem that has plagued researchers and test developers for decades. Over the past 20 years, very sophisticated statistical algorithms for handling missing data and producing "complete" data sets have become available. Many individuals who have run analyses may have used these procedures while being completely unaware that their analysis relied on imputed or plausible values! For example, if you used one of the primary structural equation modeling (SEM) software programs (e.g., LISREL, Mplus, AMOS) and had incomplete data on some subjects, the program most likely applied one of these newer algorithms, either imputing plausible values or estimating the model directly from the incomplete data (full information maximum likelihood), rather than simply dropping those subjects.
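To make the idea of "plausible values" concrete, here is a minimal sketch of single stochastic regression imputation in plain Python. It is a deliberately simple stand-in, not the algorithm any of the SEM programs named above actually runs; the function name and the toy data are made up for illustration. Each missing y is replaced by its regression prediction from x plus a random residual, so the filled-in values keep some realistic scatter.

```python
import random
import statistics


def stochastic_regression_impute(x, y, seed=0):
    """Fill missing y values (None) with a regression prediction plus noise.

    Illustrative only: a single imputation from a one-predictor regression
    fitted to the complete cases.
    """
    rng = random.Random(seed)
    complete = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    xs = [xi for xi, _ in complete]
    ys = [yi for _, yi in complete]

    # Ordinary least squares slope and intercept from the complete cases.
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((xi - mean_x) ** 2 for xi in xs)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in complete)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Residual standard deviation (n - 2 degrees of freedom).
    residuals = [yi - (intercept + slope * xi) for xi, yi in complete]
    resid_sd = (sum(r ** 2 for r in residuals) / (len(complete) - 2)) ** 0.5

    # Observed values pass through untouched; missing ones get prediction + noise.
    return [
        yi if yi is not None else intercept + slope * xi + rng.gauss(0, resid_sd)
        for xi, yi in zip(x, y)
    ]


# Hypothetical toy data: two subjects are missing their y score.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, None, 8.2, None, 12.1]
completed = stochastic_regression_impute(x, y)
```

The random residual is the important design choice: filling in the bare prediction (or the mean) would make the completed data look less variable than real data would be.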

I've been schooling myself on this literature for the past 15 years and have found these contemporary missing data imputation methods very useful. More and more researchers need to become aware of the benefits of these methods, as well as some of the nuances of using them correctly.

This past week I received a copy of the latest issue of the Annual Review of Psychology and found (to my pleasure) probably the simplest, most conceptual and understandable summary of this area of statistics. I was not surprised to see that it was written by John Graham, who has written many other important journal articles on this topic. I would urge the readers of IQs Corner who conduct applied research or test development projects to read this overview article. It is well worth the read. Also, I would suggest that readers take a serious look at Schafer's NORM software, the program I use when serious data imputation is necessary. A nicely written description of the program, as well as a short and sweet overview of some of the missing data literature, is available in an article written by Darmawan (2004).
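For readers new to this, NORM is a multiple imputation program: it generates several completed data sets, the same analysis is run on each, and the results are then combined. Below is a minimal sketch of that combining step (commonly called Rubin's rules). The function name and the numbers are invented for illustration; this is not NORM's own code or output.

```python
import statistics


def pool_rubin(estimates, variances):
    """Combine one parameter estimate across m imputed data sets.

    estimates: the point estimate from each imputed data set
    variances: the squared standard error from each imputed data set
    Returns (pooled estimate, pooled standard error).
    """
    m = len(estimates)
    pooled = statistics.fmean(estimates)
    within = statistics.fmean(variances)        # average sampling variance
    between = statistics.variance(estimates)    # variance across imputations (divisor m - 1)
    total = within + (1 + 1 / m) * between      # total variance
    return pooled, total ** 0.5


# Hypothetical regression coefficient estimated in five imputed data sets.
estimates = [0.52, 0.48, 0.55, 0.50, 0.45]
variances = [0.010, 0.012, 0.011, 0.009, 0.010]
pooled_estimate, pooled_se = pool_rubin(estimates, variances)
```

The between-imputation term is what distinguishes this from treating a single filled-in data set as if it were real: the more the imputed data sets disagree, the larger the pooled standard error.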

What is really cool is the concept of "planned missing data": that is, designing one's data collection project to deliberately have missing data so that more variables can be collected across a larger number of subjects. The resulting gaps can then be handled (if the design is done correctly) via these new quantoid toys.
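As a rough illustration of planned missingness, here is a sketch of one commonly described layout, a three-form design: a common block of items goes to everyone, and each form leaves out one of three rotating blocks. The block names, item names, and sample size are hypothetical. The point of the check at the end is that every pair of blocks is still answered together by some subjects, which is what lets the missing-data methods do their work later.

```python
import random
from itertools import combinations

# Hypothetical item blocks: X is given to everyone; A, B and C rotate.
BLOCKS = {
    "X": ["x1", "x2"],
    "A": ["a1", "a2"],
    "B": ["b1", "b2"],
    "C": ["c1", "c2"],
}

# Each form omits exactly one rotating block.
FORMS = {
    1: ["X", "A", "B"],
    2: ["X", "A", "C"],
    3: ["X", "B", "C"],
}


def assign_forms(n_respondents, seed=0):
    """Give the forms out in equal numbers, in random order."""
    form_ids = sorted(FORMS)
    assignments = [form_ids[i % len(form_ids)] for i in range(n_respondents)]
    random.Random(seed).shuffle(assignments)
    return assignments


def items_for_form(form):
    """List the items a respondent with this form actually answers."""
    return [item for block in FORMS[form] for item in BLOCKS[block]]


def pairwise_coverage(assignments):
    """Count respondents who were administered both blocks of each pair."""
    coverage = {}
    for b1, b2 in combinations(sorted(BLOCKS), 2):
        coverage[(b1, b2)] = sum(
            1 for form in assignments if b1 in FORMS[form] and b2 in FORMS[form]
        )
    return coverage


assignments = assign_forms(300)
coverage = pairwise_coverage(assignments)
```

With 300 subjects, each person answers 6 of the 8 items, yet no pair of blocks goes unobserved. Because the forms are handed out at random, the missingness is under the researcher's control rather than a product of who chose not to answer.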

Fellow (and future) quantoids...enjoy
