Corpus ID: 232069109

Rip van Winkle's Razor: A Simple Estimate of Overfit to Test Data

@article{Arora2021RipVW,
  title={Rip van Winkle's Razor: A Simple Estimate of Overfit to Test Data},
  author={Sanjeev Arora and Y. Zhang},
  journal={ArXiv},
  year={2021},
  volume={abs/2102.13189}
}
Traditional statistics forbids the use of test data (a.k.a. holdout data) during training. Dwork et al. (2015) pointed out that current practice in machine learning, whereby researchers build upon each other’s models, copying hyperparameters and even computer code, amounts to implicitly training on the test set. Thus the error rate on test data may not reflect the true population error. This observation initiated adaptive data analysis, which provides evaluation mechanisms with guaranteed upper bounds on…

References

Algorithmic stability for adaptive data analysis
Preserving Statistical Validity in Adaptive Data Analysis
The advantages of multiple classes for reducing overfitting from test set reuse
Model Similarity Mitigates Test Set Overuse
Preventing False Discovery in Interactive Data Analysis Is Hard
Do CIFAR-10 Classifiers Generalize to CIFAR-10?
A Meta-Analysis of Overfitting in Machine Learning
Differential Privacy
Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
The Ladder: A Reliable Leaderboard for Machine Learning Competitions