What Size Test Set Gives Good Error Rate Estimates?

Isabelle Guyon, John Makhoul, Richard M. Schwartz, Vladimir Naumovich Vapnik
IEEE Trans. Pattern Anal. Mach. Intell.
We address the problem of determining what size test set guarantees statistically significant results in a character recognition task, as a function of the expected error rate. We provide a statistical analysis showing that if, for example, the expected character error rate is around 1 percent, then, with a test set of at least 10,000 statistically independent handwritten characters (which could be obtained by taking 100 characters from each of 100 different writers), we guarantee, with 95… 
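
To make the relationship between test set size and error rate concrete, here is a minimal sketch using the textbook normal approximation to the binomial distribution. It is not the derivation used in the paper, and it assumes every test sample is statistically independent. The function names and the `relative_margin` parameter are hypothetical, introduced only for illustration.

```python
import math
from statistics import NormalDist


def error_rate_upper_bound(empirical_error, n, confidence=0.95):
    """One-sided upper confidence bound on the true error rate.

    Assumption: errors on the n test samples are independent, so the
    error count is binomial and can be approximated by a normal law.
    """
    z = NormalDist().inv_cdf(confidence)  # one-sided normal quantile
    std_err = math.sqrt(empirical_error * (1.0 - empirical_error) / n)
    return empirical_error + z * std_err


def min_test_set_size(expected_error, relative_margin, confidence=0.95):
    """Smallest n for which the one-sided margin z * std_err stays within
    relative_margin * expected_error (hypothetical parameterization).

    Solving z * sqrt(p * (1 - p) / n) <= relative_margin * p for n gives
    n >= z**2 * (1 - p) / (relative_margin**2 * p).
    """
    z = NormalDist().inv_cdf(confidence)
    n = z ** 2 * (1.0 - expected_error) / (relative_margin ** 2 * expected_error)
    return math.ceil(n)


if __name__ == "__main__":
    # Illustrative values only: 1 percent error on 10,000 samples.
    print(error_rate_upper_bound(0.01, 10_000))
    print(min_test_set_size(0.01, relative_margin=0.25))
```

Under this approximation the required size grows roughly as the inverse of the expected error rate, so a lower error rate calls for a proportionally larger test set. Correlated samples, such as many characters from the same writer, would weaken the independence assumption and call for more data than this sketch suggests.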


A study on model-based error rate estimation for automatic speech recognition
A one-dimensional model-based misclassification measure is introduced to evaluate the distance between a particular model of interest and a combination of many of its competing models, and it is demonstrated that the error rate of a recognition system in a noisy environment can also be predicted.
Several significant sets of labeled samples of image data are surveyed that can be used in the development of algorithms for offline and online handwriting recognition as well as for machine printed
How Might We Create Better Benchmarks for Speech Recognition?
A versatile framework designed to describe interactions between linguistic variation and ASR performance metrics is introduced, and a taxonomy of speech recognition use cases is outlined, proposed for the next generation of ASR benchmarks.
Optimality of training/test size and resampling effectiveness in cross-validation
Test Set Sizing Via Random Matrix Theory
This paper uses techniques from Random Matrix Theory to solve for the training and test set sizes for any model in a way that is truly optimal, and is a step towards automatic choices of training/test set sizes in machine learning.
Combining Structure and Parameter Adaptation of HMMs for Printed Text Recognition
These algorithms are semi-supervised: to adapt a given HMM model on new data, they require little labeled data for parameter adaptation and a moderate amount of unlabeled data to estimate the criteria used for HMM structure optimization.
Optimal policy for labeling training samples
The overall cost of human labeling can be decreased by interspersing labeling and classification, given a parameterized model of the error rate as an inverse power law function of the size of the training set.
The First-Degree Stochastic Dominance Rule of Comparing Pattern Recognition Algorithms with Accuracy Metric
  • Jun He, Rui-Gang Fu
  • Computer Science
    2022 IEEE 5th International Conference on Electronics Technology (ICET)
  • 2022
The first-degree stochastic dominance (FSD) rule is checked and proved to be unaffected by any non-diminishing utility transformation; it gives much flexibility in preparing test data and more confidence in the comparison results.
Estimation and sample size calculations for matching performance of biometric authentication
This paper presents two methodologies for creating confidence intervals for matching error rates, based on a general parametric model, that are able to ’invert’ their confidence intervals to develop appropriate sample size calculations that account for both number of attempts per person and number of individuals to be tested.


UNIPEN project of on-line data exchange and recognizer benchmarks
The status of the UNIPEN project of data exchange and recognizer benchmarks started two years ago is reported, to propose and implement solutions to the growing need of handwriting samples for online handwriting recognizers used by pen-based computers.
Local Learning Algorithms
A single analysis suggests that neither kNN nor RBF, nor nonlocal classifiers, achieve the best compromise between locality and capacity.
Probability inequalities for sums of bounded random variables
Upper bounds are derived for the probability that the sum S of n independent random variables exceeds its mean ES by a positive number nt. It is assumed that the range of each summand of S
An Introduction to the Theory of Statistics
Contents: 1. Probability; 2. Random variables, distribution functions, and expectation; 3. Special parametric families of univariate distributions; 4. Joint and conditional distributions, stochastic independence,
Some statistical issues in the comparison of speech recognition algorithms
  • L. Gillick, S. Cox
  • Physics
    International Conference on Acoustics, Speech, and Signal Processing,
  • 1989
The authors present two simple tests for deciding whether the difference in error rates between two algorithms tested on the same data set is statistically significant. The first (McNemar's test)