# What Size Test Set Gives Good Error Rate Estimates?

@article{Guyon1998WhatST, title={What Size Test Set Gives Good Error Rate Estimates?}, author={Isabelle Guyon and John Makhoul and Richard M. Schwartz and Vladimir Naumovich Vapnik}, journal={IEEE Trans. Pattern Anal. Mach. Intell.}, year={1998}, volume={20}, pages={52-64} }

We address the problem of determining what size test set guarantees statistically significant results in a character recognition task, as a function of the expected error rate. We provide a statistical analysis showing that if, for example, the expected character error rate is around 1 percent, then, with a test set of at least 10,000 statistically independent handwritten characters (which could be obtained by taking 100 characters from each of 100 different writers), we guarantee, with 95…
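The back-of-envelope reasoning behind such figures can be sketched with the usual i.i.d. binomial model of test errors (a simplification; the paper's analysis also accounts for correlations among characters from the same writer). The function names below are illustrative, not from the paper:

```python
import math

def relative_std(p, n):
    """Relative standard deviation of the empirical error rate for
    true error rate p estimated from n independent test samples.
    Binomial model: std of the estimate is sqrt(p*(1-p)/n), so the
    relative std is sqrt((1-p)/(p*n))."""
    return math.sqrt((1 - p) / (p * n))

def required_n(p, beta):
    """Smallest n for which the relative standard deviation of the
    error rate estimate is at most beta."""
    return math.ceil((1 - p) / (p * beta ** 2))
```

For p = 0.01 and a 10 percent relative standard deviation, `required_n(0.01, 0.1)` gives 9900 samples, consistent with the abstract's figure of roughly 10,000 test characters at a 1 percent error rate.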

## 162 Citations

A study on model-based error rate estimation for automatic speech recognition

- Computer Science, IEEE Trans. Speech Audio Process.
- 2003

A one-dimensional model-based misclassification measure is proposed to evaluate the distance between a model of interest and a combination of its competing models; it is demonstrated that the error rate of a recognition system in a noisy environment can also be predicted.

DATA SETS FOR OCR AND DOCUMENT IMAGE UNDERSTANDING RESEARCH

- Computer Science
- 1997

Several significant sets of labeled samples of image data are surveyed that can be used in the development of algorithms for offline and online handwriting recognition as well as for machine printed…

Learning small gallery size for prediction of recognition performance on large populations

- Computer Science, Pattern Recognit.
- 2013

How Might We Create Better Benchmarks for Speech Recognition?

- Computer Science, BPPF
- 2021

A versatile framework designed to describe interactions between linguistic variation and ASR performance metrics is introduced, and a taxonomy of speech recognition use cases is outlined, proposed for the next generation of ASR benchmarks.

Optimality of training/test size and resampling effectiveness in cross-validation

- Mathematics, Journal of Statistical Planning and Inference
- 2019

Test Set Sizing Via Random Matrix Theory

- Computer Science, ArXiv
- 2021

This paper uses techniques from Random Matrix Theory to solve for the training and test set sizes for any model in a way that is truly optimal, and is a step towards automatic choices of training/test set sizes in machine learning.

Combining Structure and Parameter Adaptation of HMMs for Printed Text Recognition

- Computer Science, IEEE Transactions on Pattern Analysis and Machine Intelligence
- 2014

These algorithms are semi-supervised: to adapt a given HMM model on new data, they require little labeled data for parameter adaptation and a moderate amount of unlabeled data to estimate the criteria used for HMM structure optimization.

Optimal policy for labeling training samples

- Computer Science, Electronic Imaging
- 2013

The overall cost of human labeling can be decreased by interspersing labeling and classification, given a parameterized model of the error rate as an inverse power law function of the size of the training set.
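The inverse power-law learning curve referred to above can be sketched directly; the constants here are hypothetical placeholders, not fitted values from the cited paper:

```python
def predicted_error(n_train, a=1.0, b=0.5):
    """Inverse power-law learning curve: error rate ~ a * n^(-b).
    a and b are placeholder constants that would normally be fitted
    to observed (training set size, error rate) pairs."""
    return a * n_train ** (-b)
```

Under such a model, the marginal value of each additional labeled sample can be weighed against its labeling cost to decide when to stop labeling.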

The First-Degree Stochastic Dominance Rule of Comparing Pattern Recognition Algorithms with Accuracy Metric

- Computer Science, 2022 IEEE 5th International Conference on Electronics Technology (ICET)
- 2022

The first-degree stochastic dominance (FSD) rule is shown to be unaffected by any non-diminishing utility transformation, giving much flexibility in preparing test data and more confidence in the comparison results.

Estimation and sample size calculations for matching performance of biometric authentication

- Computer Science
- 2005

This paper presents two methodologies, based on a general parametric model, for creating confidence intervals for matching error rates; the confidence intervals can be 'inverted' to develop sample size calculations that account for both the number of attempts per person and the number of individuals tested.

## References

Showing 1–10 of 11 references

UNIPEN project of on-line data exchange and recognizer benchmarks

- Computer Science, Proceedings of the 12th IAPR International Conference on Pattern Recognition, Vol. 3 - Conference C: Signal Processing (Cat. No.94CH3440-5)
- 1994

Reports the status of the UNIPEN project of data exchange and recognizer benchmarks, started two years ago to propose and implement solutions to the growing need for handwriting samples for the online handwriting recognizers used by pen-based computers.

Local Learning Algorithms

- Computer Science, Neural Computation
- 1992

A single analysis suggests that neither kNN nor RBF, nor nonlocal classifiers, achieves the best compromise between locality and capacity.

Probability inequalities for sums of bounded random variables

- Mathematics
- 1961

Abstract Upper bounds are derived for the probability that the sum S of n independent random variables exceeds its mean ES by a positive number nt. It is assumed that the range of each summand of S…
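The bound described above (Hoeffding's inequality) can be evaluated directly; for summands taking values in [0, 1] it states P(S − ES ≥ nt) ≤ exp(−2nt²):

```python
import math

def hoeffding_bound(n, t):
    """Hoeffding's upper bound on P(S - E[S] >= n*t) for a sum S of
    n independent random variables each taking values in [0, 1]."""
    return math.exp(-2.0 * n * t * t)
```

For a 10,000-sample test set, the bound on the probability that the observed error count exceeds its mean by more than 0.005 per sample is `hoeffding_bound(10000, 0.005)` ≈ 0.61, which illustrates why distribution-free bounds are loose at small error rates compared to binomial-variance arguments.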

An Introduction to the Theory of Statistics

- Mathematics, Nature
- 1911

1. Probability; 2. Random variables, distribution functions, and expectation; 3. Special parametric families of univariate distributions; 4. Joint and conditional distributions, stochastic independence, …

A Measure of Asymptotic Efficiency for Tests of a Hypothesis Based on the Sum of Observations

- Mathematics
- 1952

Writer independent and writer adaptive neural network for on-line character recognition

- Computer Science
- 1992

Some statistical issues in the comparison of speech recognition algorithms

- Physics, International Conference on Acoustics, Speech, and Signal Processing
- 1989

The authors present two simple tests for deciding whether the difference in error rates between two algorithms tested on the same data set is statistically significant. The first (McNemar's test)…
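McNemar's test, the first of the two tests mentioned, compares paired error patterns of two algorithms on a shared test set. A minimal sketch with the standard continuity correction (function name illustrative):

```python
def mcnemar_chi2(n01, n10):
    """McNemar's chi-squared statistic with continuity correction.
    n01 = test samples only algorithm A misclassified,
    n10 = test samples only algorithm B misclassified.
    Compare against 3.84 (chi-squared, 1 d.o.f., alpha = 0.05)."""
    return (abs(n01 - n10) - 1) ** 2 / (n01 + n10)
```

For example, `mcnemar_chi2(10, 30)` gives 9.025, which exceeds 3.84, so the two algorithms' error rates would differ significantly at the 5 percent level. Only the samples on which the algorithms disagree enter the statistic.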