Monday, May 01, 2006

Comparison of statistical methods of group classification - guest post by Noel Gregg


The following is a guest post by Dr. Noel Gregg, Distinguished Research Professor at the University of Georgia, and member of the IQs Corner Virtual Community of Scholars. Noel reviewed the following important methodological article and has provided her comments below. I want to thank Noel for selecting an article that may be of greater interest to the quantoids who are regular IQs Corner readers.
  • Finch, W., & Schneider, M. (2006). Misclassification rates for four methods of group classification: Impact of predictor distribution, covariance inequality, effect size, sample size, and group size ratio. Educational and Psychological Measurement, 66(2), 240-257.
Abstract
  • This study compares the classification accuracy of linear discriminant analysis (LDA), quadratic discriminant analysis (QDA), logistic regression (LR), and classification and regression trees (CART) under a variety of data conditions. Past research has generally found comparable performance of LDA and LR, with relatively less research on QDA and virtually none on CART. This study uses Monte Carlo simulations to assess the cross-validated predictive accuracy of these methods, while manipulating such factors as predictor distribution, sample size, covariance matrix inequality, group separation, and group size ratio. The results indicate that QDA performs as well as or better than the other alternatives in virtually all conditions. Suggestions for practitioners are provided.

Finch and Schneider (2006) present an excellent study investigating the accuracy of four statistical methods for group classification. Knowledge of the strengths and weaknesses of statistical methods used for predicting group membership is extremely important to professionals. These methods are used in a variety of contexts, such as determining admission into treatment or academic programs, identifying individuals at risk for academic or behavioral failure, and investigating which instruments are most predictive for classification decision-making. The methods they investigated under a variety of conditions were linear discriminant analysis (LDA), quadratic discriminant analysis (QDA), logistic regression (LR), and classification and regression trees (CART). The study is based on Monte Carlo simulations.

Descriptions for each of these systems are provided in the article. The performance of the four procedures was assessed under a variety of conditions by manipulating several variables. Some of the variables manipulated included sample size, distribution of the predictor variables, level of covariance matrix heterogeneity between groups, proportion of cases in each group, and effect size separating the groups. All simulations and analyses were conducted using the R software. Table 6 in the article provides an excellent review for professionals of the misclassification rates by sample size and sample size ratio generated through the analyses in this study.
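The paper's simulations were run in R; purely as an illustration (the function and parameter names below are my own, not the authors'), the core of such a two-group Monte Carlo design can be sketched in Python/NumPy: two multivariate-normal groups whose mean separation (effect size), sample sizes, and covariance equality are all manipulable.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_groups(n1=100, n2=100, d=0.5, cov_ratio=1.0, p=2):
    """Draw two multivariate-normal groups separated by effect size d,
    with group 2's covariance scaled by cov_ratio (1.0 = equal covariances)."""
    mean1 = np.zeros(p)
    mean2 = np.full(p, d)           # group separation (effect size)
    cov1 = np.eye(p)
    cov2 = cov_ratio * np.eye(p)    # covariance matrix inequality
    X1 = rng.multivariate_normal(mean1, cov1, n1)
    X2 = rng.multivariate_normal(mean2, cov2, n2)
    X = np.vstack([X1, X2])
    y = np.r_[np.zeros(n1, dtype=int), np.ones(n2, dtype=int)]
    return X, y

# unequal group sizes and unequal covariances, as in several study conditions
X, y = simulate_groups(n1=150, n2=50, d=0.5, cov_ratio=4.0)
print(X.shape, np.bincount(y))  # (200, 2) [150  50]
```

Varying `n1`/`n2`, `d`, `cov_ratio`, and the predictor distributions over a grid, then fitting each classifier and scoring it on a fresh draw, reproduces the general shape of the study's design.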

The authors' conclusions about the performance of the four methods are listed below.
  • 1. QDA had a misclassification rate that was never larger than that of LDA and LR.
  • 2. If the assumptions of LDA were met (i.e., data normally distributed; covariance matrices equal across groups), the three approaches had comparable misclassification rates.
  • 3. CART had error rates higher than the other three methods.
  • 4. The error rates of LDA and LR became inflated when the assumption of equal covariance matrices was not met.
  • 5. When the covariance matrices were not equal across groups, QDA's misclassification rate was lower than when they were equal, and lower than that of LDA and LR.
  • 6. CART's misclassification rate declined when the covariance matrices were not equal, and it was lower than that of LDA and LR.
  • 7. The four methods performed similarly for the all-categorical predictor case.
  • 8. The four methods had their lowest error rates for normally distributed predictors, slightly higher rates for skewed distributions, and the highest rates for mixed (categorical and continuous) distributions.
  • 9. When the covariance matrices were equal, regardless of the distribution of the predictors, the misclassification rates of the LDA, LR, and QDA were comparable.
  • 10. Sample size did not have an impact on the misclassification rates of any of the four procedures when the predictor variables were a mix of categorical and continuous.
  • 11. When the predictors were normally distributed, the misclassification rates of LDA and LR were slightly inflated when the covariance was not equal.
  • 12. QDA and CART had lower error rates in all cases where the two groups' covariance matrices were not equal.
  • 13. Effect size reflecting level of group separation was not important in understanding the misclassification rates of the four procedures.
  • 14. The most important factor was the proportion of individuals in each group.
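Conclusion 5 can be illustrated with a minimal from-scratch sketch (my own simplified code, not the authors'; equal priors and Gaussian class densities assumed): pooling the class covariances gives an LDA-style linear rule, while per-class covariances give a QDA-style quadratic rule, and the latter wins when the group covariances differ strongly.

```python
import numpy as np

rng = np.random.default_rng(42)

def gaussian_log_density(X, mean, cov):
    """Log density of a multivariate normal at each row of X (up to a constant)."""
    diff = X - mean
    maha = np.einsum('ij,jk,ik->i', diff, np.linalg.inv(cov), diff)
    return -0.5 * (maha + np.linalg.slogdet(cov)[1])

def fit_predict(Xtr, ytr, Xte, pooled):
    """Gaussian classifier with equal priors:
    pooled covariance ~ LDA-style rule, per-class covariances ~ QDA-style rule."""
    covs = [np.cov(Xtr[ytr == k].T) for k in (0, 1)]
    if pooled:
        covs = [(covs[0] + covs[1]) / 2] * 2
    scores = [gaussian_log_density(Xte, Xtr[ytr == k].mean(axis=0), covs[k])
              for k in (0, 1)]
    return np.argmax(np.vstack(scores), axis=0)

def make_data(n):
    # group 0: N([0,0], I); group 1: N([1,1], 9I) -- strongly unequal covariances
    X = np.vstack([rng.multivariate_normal([0, 0], np.eye(2), n),
                   rng.multivariate_normal([1, 1], 9 * np.eye(2), n)])
    return X, np.r_[np.zeros(n, dtype=int), np.ones(n, dtype=int)]

Xtr, ytr = make_data(300)
Xte, yte = make_data(300)
err_lda = (fit_predict(Xtr, ytr, Xte, pooled=True) != yte).mean()
err_qda = (fit_predict(Xtr, ytr, Xte, pooled=False) != yte).mean()
print(f"LDA-style error: {err_lda:.3f}  QDA-style error: {err_qda:.3f}")
```

With covariances this unequal, the quadratic (per-class covariance) rule consistently produces the lower cross-sample error, matching the pattern the study reports.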
Recommendations for Professionals.
  • a. QDA performs as well as or better than LDA, LR, and CART in virtually all conditions.
  • b. QDA produced the best cross-validated misclassification results, particularly when the groups do not have equal covariance matrices.
  • c. When the groups are extremely unequal in size, the misclassification rate for the smaller group will be very high regardless of the method; oversampling the smaller group may be needed.
  • d. Inequality of covariance matrices for the predictors between groups has a greater impact on the misclassification rates of all four methods than does the predictors' distribution. The more unequal the covariances, the better QDA and CART performed and the higher the error rates for LDA and LR.
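Recommendation (c) mentions oversampling; as an illustrative sketch only (the paper does not prescribe a particular scheme), the simplest version resamples the smaller group with replacement until the two groups are equal in size:

```python
import numpy as np

rng = np.random.default_rng(1)

def oversample(X, y, minority=1):
    """Randomly resample the minority class with replacement until
    both groups are equal in size (simple random oversampling)."""
    idx_min = np.flatnonzero(y == minority)
    idx_maj = np.flatnonzero(y != minority)
    extra = rng.choice(idx_min, size=len(idx_maj) - len(idx_min), replace=True)
    keep = np.r_[idx_maj, idx_min, extra]
    return X[keep], y[keep]

# a 90/10 split, as in the study's most unbalanced conditions
X = rng.normal(size=(100, 2))
y = np.r_[np.zeros(90, dtype=int), np.ones(10, dtype=int)]
Xb, yb = oversample(X, y)
print(np.bincount(yb))  # [90 90]
```

Balancing the training sample this way keeps the classifier from simply defaulting to the larger group, at the cost of duplicated minority cases.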

1 comment:

Will Dwinnell said...

The cited paper is interesting. It is worth noting the following:

1. The specific performance measure used is accuracy. It would have been interesting to see a performance measure which takes into account the estimated probabilities as well as the predicted class, such as area under the ROC curve ("AUC").
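On that first point: unlike accuracy, AUC scores the ranking implied by the predicted probabilities. It can be computed directly from the Mann-Whitney identity (counting how often a positive case outscores a negative one); a small self-contained sketch, not from the paper:

```python
def auc(y_true, scores):
    """Area under the ROC curve: the probability that a randomly chosen
    positive case receives a higher score than a randomly chosen negative
    case (ties count half). Pure-Python, O(n_pos * n_neg)."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

y = [0, 0, 1, 1]
p = [0.1, 0.4, 0.35, 0.8]
print(auc(y, p))  # 0.75
```

Note that two classifiers with identical accuracy at the 0.5 threshold can have very different AUCs, which is exactly why the measure adds information beyond the misclassification rates studied here.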

2. The data sizes tested ranged from 60 to 300 observations. Many data sets are much larger.

3. The data tested was artificially generated. While this gave the authors control over things like the covariance of the predictors, it would be interesting to perform a similar study on real data sets.

4. I believe that all predictors available were used, hence no feature selection was employed. Feature selection strategies are important, and one of the systems tested, CART, includes this as an implicit component of its processing.

-Will Dwinnell
Data Mining in MATLAB