Dropout Training as Adaptive Regularization
@inproceedings{Wager2013DropoutTA,
  title={Dropout Training as Adaptive Regularization},
  author={S. Wager and Sida I. Wang and Percy Liang},
  booktitle={NIPS},
  year={2013}
}
Dropout and other feature noising schemes control overfitting by artificially corrupting the training data. For generalized linear models, dropout performs a form of adaptive regularization. Using this viewpoint, we show that the dropout regularizer is first-order equivalent to an L2 regularizer applied after scaling the features by an estimate of the inverse diagonal Fisher information matrix. We also establish a connection to AdaGrad, an online learning algorithm, and find that a close…
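The equivalence stated in the abstract can be illustrated numerically. The sketch below is not taken from the paper's code. It assumes logistic regression as the generalized linear model, a dropout probability `delta`, and surviving features rescaled by 1/(1 - delta) so the noised features are unbiased. Under those assumptions, a second-order Taylor expansion of the dropout penalty gives an L2-type penalty on `beta` weighted by the diagonal of an estimated Fisher information matrix. The function names are hypothetical, and the exact penalty is computed by enumerating all dropout masks, which is only feasible for a handful of features.

```python
import itertools

import numpy as np


def dropout_quadratic_penalty(X, beta, delta):
    """Quadratic approximation of the dropout penalty for logistic regression.

    Assumes each feature is dropped independently with probability delta and
    kept features are rescaled by 1 / (1 - delta).
    """
    p = 1.0 / (1.0 + np.exp(-(X @ beta)))
    curvature = p * (1.0 - p)  # second derivative of the log-partition function
    # Diagonal of X^T V X, an estimate of the Fisher information diagonal.
    fisher_diag = (curvature[:, None] * X**2).sum(axis=0)
    return 0.5 * delta / (1.0 - delta) * np.sum(fisher_diag * beta**2)


def dropout_exact_penalty(X, beta, delta):
    """Exact dropout penalty, by enumerating all 2^d masks (small d only)."""
    d = X.shape[1]
    # Log-partition function of logistic regression: A(z) = log(1 + exp(z)).
    clean = np.logaddexp(0.0, X @ beta).sum()
    expected = 0.0
    for mask in itertools.product([0.0, 1.0], repeat=d):
        m = np.array(mask)
        kept = m.sum()
        prob = (1.0 - delta) ** kept * delta ** (d - kept)
        X_noisy = X * m / (1.0 - delta)
        expected += prob * np.logaddexp(0.0, X_noisy @ beta).sum()
    return expected - clean


# Illustrative data, chosen arbitrarily for this sketch.
X = np.array([[1.0, 1.0]])
beta = np.array([1.0, -1.0])
print("quadratic approximation:", dropout_quadratic_penalty(X, beta, 0.5))
print("exact penalty:          ", dropout_exact_penalty(X, beta, 0.5))
```

Because the Fisher weights depend on the data and on `beta`, the penalty adapts per feature: coordinates that are rarely active or that sit on confidently predicted examples contribute little curvature and are penalized less than under a plain L2 penalty.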
407 Citations
Dropout training for SVMs with data augmentation
- Computer Science
- Frontiers of Computer Science
- 2018
- 4
- Highly Influenced
- PDF
Curriculum Dropout
- Psychology, Computer Science
- 2017 IEEE International Conference on Computer Vision (ICCV)
- 2017
- 32
- Highly Influenced
- PDF
The Implicit and Explicit Regularization Effects of Dropout
- Computer Science, Mathematics
- ICML
- 2020
- 14
- Highly Influenced
- PDF
On the inductive bias of dropout
- Mathematics, Computer Science
- J. Mach. Learn. Res.
- 2015
- 42
- Highly Influenced
- PDF
Dropout Training, Data-dependent Regularization, and Generalization Bounds
- Mathematics, Computer Science
- ICML
- 2018
- 12
- Highly Influenced
- PDF
Regularizing Deep Neural Networks by Noise: Its Interpretation and Optimization
- Computer Science, Mathematics
- NIPS
- 2017
- 77
- PDF
References
SHOWING 1-10 OF 35 REFERENCES
Training with Noise is Equivalent to Tikhonov Regularization
- Mathematics, Computer Science
- Neural Computation
- 1995
- 806
- Highly Influential
- PDF
Adding noise to the input of a model trained with a regularized objective
- Computer Science, Mathematics
- ArXiv
- 2011
- 58
- PDF
Adaptive Subgradient Methods for Online Learning and Stochastic Optimization
- Computer Science, Mathematics
- J. Mach. Learn. Res.
- 2011
- 6,437
- PDF
Improving neural networks by preventing co-adaptation of feature detectors
- Computer Science
- ArXiv
- 2012
- 5,094
- PDF