Performance of Error Estimators for Classification

  Edward R. Dougherty, Chao Sima, Jianping Hua, Blaise Hanczar, and Ulisses M. Braga-Neto. Current Bioinformatics.
Classification in bioinformatics often suffers from small samples in conjunction with large numbers of features, which makes error estimation problematic. When a sample is small, there is insufficient data to split it, so the same data are used for both classifier design and error estimation. Error estimation can then suffer from high variance, bias, or both. The problem of choosing a suitable error estimator is exacerbated by the fact that estimation performance depends on the rule used to… 
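The small-sample problem described above can be made concrete with a minimal sketch (not taken from the paper): a one-dimensional nearest-mean classifier on a tiny synthetic two-class sample, with its error estimated both by resubstitution (training-set error, typically optimistically biased) and by leave-one-out cross-validation. All names and parameters here are illustrative choices.

```python
import random

random.seed(0)

def nearest_mean_classifier(train):
    # train: list of (x, label) pairs; returns a predict function.
    m0 = sum(x for x, y in train if y == 0) / sum(1 for _, y in train if y == 0)
    m1 = sum(x for x, y in train if y == 1) / sum(1 for _, y in train if y == 1)
    return lambda x: 0 if abs(x - m0) <= abs(x - m1) else 1

def resubstitution_error(sample):
    # Error of the designed classifier on its own training data.
    f = nearest_mean_classifier(sample)
    return sum(f(x) != y for x, y in sample) / len(sample)

def loo_error(sample):
    # Leave-one-out cross-validation: hold out each point in turn.
    errs = 0
    for i in range(len(sample)):
        held_x, held_y = sample[i]
        f = nearest_mean_classifier(sample[:i] + sample[i + 1:])
        errs += f(held_x) != held_y
    return errs / len(sample)

# Small sample: 10 points per class from N(0,1) and N(1,1).
sample = [(random.gauss(0, 1), 0) for _ in range(10)] + \
         [(random.gauss(1, 1), 1) for _ in range(10)]

print(resubstitution_error(sample), loo_error(sample))
```

On a single small sample either estimate can be far from the true error; averaged over many samples, resubstitution tends to come out lower than leave-one-out, reflecting its optimistic bias.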


Combined classification error rate estimator for the Fisher linear classifier
A new combined classification error rate estimator designed specifically for the Fisher linear classifier is proposed; experiments show that resubstitution, leave-one-out, repeated 10-fold cross-validation, repeated 2-fold cross-validation, the basic bootstrap, the 0.632 bootstrap, the D-method, the DS-method and the M-method are all outperformed by the proposed combined error rate estimator (in terms of root-mean-square error).
A bias-variance trade-off in the prediction error estimation behavior in bootstrap methods for microarray leukemia classification
The bias and variance of the prediction error rates show high variability across the various bootstrap methods, and other resampling methods may be useful for microarray classification in different situations.
The results obtained show that geometric bolstered error estimation algorithms are very fast error estimation techniques with accuracy comparable to leave-one-out and lower variance, which may find applications across a wide range of -omics data analysis.
Model selection for linear classifiers using Bayesian error estimation
Approximation of unbiased convex classification error rate estimator
Experiments with real-world and synthetic data sets show that common error estimation methods, such as resubstitution, repeated 10-fold cross-validation, leave-one-out and random subsampling, are outperformed (in terms of root-mean-square error) by the proposed method.
Weighted Classification Error Rate Estimator for the Euclidean Distance Classifier
Experiments with real-world and synthetic data sets show that resubstitution, repeated 2-fold cross-validation, leave-one-out, the basic bootstrap and the D-method are outperformed by the proposed weighted error rate estimator (in terms of root-mean-square error).
Which Resampling-Based Error Estimator for Benchmark Studies? A Power Analysis with Application to PLS-LDA
Methodological issues related to comparison studies of prediction methods which involve several real data sets and use resampling-based error estimators as the evaluation criteria are considered.
Application of the Bayesian MMSE estimator for classification error to gene expression microarray data
The calibrated Bayesian error estimator has superior root mean square performance, especially with moderate to high expected true errors and small feature sizes, and is implemented in C code for non-linear classification.
A New Measure of Classifier Performance for Gene Expression Data
This work proposes a new measure of classifier performance that takes account of the uncertainty of the error, and shows that classifier performance depends strongly on the ratio of the classification costs.


Is cross-validation valid for small-sample microarray classification?
An extensive simulation study has been performed comparing cross-validation, resubstitution and bootstrap estimation for three popular classification rules (linear discriminant analysis, 3-nearest-neighbor and decision trees (CART)), using both synthetic and real breast-cancer patient data.
Prediction error estimation: a comparison of resampling methods
This work compares several methods for estimating the 'true' prediction error of a prediction model in the presence of feature selection, and finds that LOOCV and 10-fold CV have the smallest bias for linear discriminant analysis and the .632+ bootstrap has the lowest mean square error.
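As a rough illustration of one of the estimators compared in such studies, here is a hedged sketch of the plain 0.632 bootstrap (not the .632+ variant, which adds an overfitting correction). The `nearest_mean` classifier, sample sizes, and the number of bootstrap replicates `b` are all illustrative assumptions, not choices from any of the papers listed here.

```python
import random

random.seed(1)

def nearest_mean(train):
    # One-dimensional nearest-mean classifier; the max(1, ...) guards
    # against a degenerate resample that happens to miss a class.
    m0 = sum(x for x, y in train if y == 0) / max(1, sum(1 for _, y in train if y == 0))
    m1 = sum(x for x, y in train if y == 1) / max(1, sum(1 for _, y in train if y == 1))
    return lambda x: 0 if abs(x - m0) <= abs(x - m1) else 1

def bootstrap_632(sample, design, b=50, rng=random):
    # 0.632 bootstrap: blend the optimistic resubstitution error with
    # the pessimistic out-of-bag ("zero") bootstrap error.
    n = len(sample)
    resub = sum(design(sample)(x) != y for x, y in sample) / n
    errs, count = 0, 0
    for _ in range(b):
        idx = [rng.randrange(n) for _ in range(n)]
        f = design([sample[i] for i in idx])
        out = [sample[i] for i in range(n) if i not in set(idx)]
        errs += sum(f(x) != y for x, y in out)
        count += len(out)
    e0 = errs / count  # out-of-bag error, averaged over replicates
    return 0.368 * resub + 0.632 * e0

sample = [(random.gauss(0, 1), 0) for _ in range(10)] + \
         [(random.gauss(1, 1), 1) for _ in range(10)]
print(bootstrap_632(sample, nearest_mean))
```

The 0.368/0.632 weights come from the expected fraction of distinct original points in a bootstrap resample (roughly 1 - 1/e ≈ 0.632).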
Superior feature-set ranking for small samples using bolstered error estimation
The results indicate that bolstering is superior to bootstrap, and bootstrap is better than cross-validation, for discovering top-performing feature sets for classification when using small samples.
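Bolstered error estimation, as used in the study above, can be sketched in one dimension as follows. This is a hedged Monte Carlo illustration only: the bolstering kernel width `sigma` is fixed by hand here, whereas the original method estimates it from the data, and `nearest_mean` is a placeholder classifier.

```python
import random

random.seed(2)

def nearest_mean(train):
    # One-dimensional nearest-mean classifier.
    m0 = sum(x for x, y in train if y == 0) / sum(1 for _, y in train if y == 0)
    m1 = sum(x for x, y in train if y == 1) / sum(1 for _, y in train if y == 1)
    return lambda x: 0 if abs(x - m0) <= abs(x - m1) else 1

def bolstered_resub(sample, design, sigma=0.5, m=200, rng=random):
    # Monte Carlo bolstered resubstitution: place a Gaussian bolstering
    # kernel on each training point and measure the kernel mass that
    # falls on the wrong side of the designed classifier.
    f = design(sample)
    total = 0.0
    for x, y in sample:
        misses = sum(f(x + rng.gauss(0, sigma)) != y for _ in range(m))
        total += misses / m
    return total / len(sample)

sample = [(random.gauss(0, 1), 0) for _ in range(10)] + \
         [(random.gauss(1, 1), 1) for _ in range(10)]
print(bolstered_resub(sample, nearest_mean))
```

As `sigma` shrinks toward zero the kernels collapse onto the training points and the estimate reduces to plain resubstitution; the spreading is what lowers the variance relative to counting-based estimators.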
Decorrelation of the True and Estimated Classifier Errors in High-Dimensional Settings
The effect of correlation on error precision is demonstrated via a decomposition of the variance of the deviation distribution; it is observed that the correlation is often severely decreased in high-dimensional settings, and that the effect of high dimensionality on error estimation tends to result more from its decorrelating effect than from its impact on the variance of the estimated error.
Small Sample Issues for Microarray-Based Classification
  • E. Dougherty
  • Computer Science
    Comparative and functional genomics
  • 2001
Fundamental issues facing small-sample classification are explained: classification rules, constrained classifiers, error estimation and feature selection, along with the impact of small samples on the ability to include more than a few variables as classifier features.
Optimal convex error estimators for classification
Bolstered error estimation