# The Implicit and Explicit Regularization Effects of Dropout

@inproceedings{Wei2020TheIA, title={The Implicit and Explicit Regularization Effects of Dropout}, author={Colin Wei and S. Kakade and Tengyu Ma}, booktitle={ICML}, year={2020} }

Dropout is a widely-used regularization technique, often required to obtain state-of-the-art performance for a number of architectures. This work demonstrates that dropout introduces two distinct but entangled regularization effects: an explicit effect (also studied in prior work), which occurs because dropout modifies the expected training objective, and, perhaps surprisingly, an additional implicit effect arising from the stochasticity of the dropout training update. This implicit regularization effect is analogous…
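The explicit effect has a well-known closed form in the simplest setting. For linear regression with inverted input dropout at keep probability q, the expected squared loss decomposes into the clean loss plus a data-dependent penalty (1 - q)/q · Σᵢ (wᵢxᵢ)², a classical result for linear models. The sketch below is an illustrative Monte Carlo check of that decomposition, not code from the paper:

```python
import random

def dropout_loss_mc(w, x, y, q, n_samples=200_000, seed=0):
    """Monte Carlo estimate of the expected squared loss under inverted
    input dropout: each coordinate is kept with probability q and scaled
    by 1/q, so the prediction is unbiased in expectation."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        pred = sum((wi * xi / q) if rng.random() < q else 0.0
                   for wi, xi in zip(w, x))
        total += (y - pred) ** 2
    return total / n_samples

def expected_dropout_loss(w, x, y, q):
    """Closed form: clean squared loss plus the explicit dropout
    regularizer (1 - q)/q * sum_i (w_i * x_i)^2."""
    clean = (y - sum(wi * xi for wi, xi in zip(w, x))) ** 2
    penalty = (1 - q) / q * sum((wi * xi) ** 2 for wi, xi in zip(w, x))
    return clean + penalty
```

With w = [0.5, -1.0, 2.0], x = [1.0, 2.0, 0.5], y = 1.0, q = 0.8, the closed form gives 2.25 + 1.3125 = 3.5625 and the Monte Carlo average converges to the same value. The per-step fluctuations of the stochastic update around this expectation, which the closed form averages away, are what drive the implicit effect studied in the paper.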


#### 32 Citations

Shape Matters: Understanding the Implicit Bias of the Noise Covariance

- Computer Science, Mathematics
- COLT
- 2021

It is shown that in an over-parameterized setting, SGD with label noise recovers the sparse ground truth from an arbitrary initialization, whereas SGD with Gaussian noise or gradient descent overfits to dense solutions with large norms.

Explicit Regularisation in Gaussian Noise Injections

- Mathematics, Computer Science
- NeurIPS
- 2020

It is shown analytically and empirically that such regularisation produces calibrated classifiers with large classification margins and that the explicit regulariser derived is able to reproduce these effects.

Understanding the Role of Training Regimes in Continual Learning

- Computer Science, Mathematics
- NeurIPS
- 2020

This work hypothesizes that the geometrical properties of the local minima found for each task play an important role in the overall degree of forgetting, and studies the effect of dropout, learning rate decay, and batch size on forming training regimes that widen the tasks' local minima and, consequently, help the network not to forget catastrophically.

Dropout as an Implicit Gating Mechanism For Continual Learning

- Computer Science, Mathematics
- 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)
- 2020

It is shown that a stable network with dropout learns a gating mechanism such that, for different tasks, different paths of the network are active. The stability achieved by this implicit gating plays a critical role in reaching performance comparable to or better than other continual learning algorithms at overcoming catastrophic forgetting.

Asymmetric Heavy Tails and Implicit Bias in Gaussian Noise Injections

- Computer Science, Mathematics
- ICML
- 2021

This paper develops a Langevin-like stochastic differential equation that is driven by a general family of asymmetric heavy-tailed noise and formally proves that GNIs induce an 'implicit bias', which varies depending on the heaviness of the tails and the level of asymmetry.

On Mixup Regularization

- Computer Science, Mathematics
- ArXiv
- 2020

It is shown that Mixup can be interpreted as a standard empirical risk minimization estimator subject to a combination of data transformation and random perturbation of the transformed data, and that these transformations and perturbations induce multiple known regularization schemes that interact synergistically, resulting in a self-calibrated and effective regularization effect that prevents overfitting and overconfident predictions.
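The perturbed-ERM view above refers to the standard mixup construction: draw λ ~ Beta(α, α) and form convex combinations of two examples and their labels. A minimal illustrative sketch (function name and signature are my own, not from the cited paper):

```python
import random

def mixup(x1, y1, x2, y2, alpha=0.2, rng=None):
    """Standard mixup: return the convex combination of two training
    examples (features and, e.g., one-hot labels) with a mixing weight
    lam drawn from Beta(alpha, alpha)."""
    rng = rng or random.Random()
    lam = rng.betavariate(alpha, alpha)
    x = [lam * a + (1 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1 - lam) * b for a, b in zip(y1, y2)]
    return x, y, lam
```

Small α concentrates λ near 0 or 1 (mild mixing); larger α pushes λ toward 1/2, which strengthens the regularization effect the paper analyzes.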

Deep learning regularization techniques to genomics data

- Computer Science
- 2021

Some architectures of deep learning algorithms are described, the optimization process for training them is explained, and a theoretical relationship between L2 regularization and dropout is established.

Drawing Multiple Augmentation Samples Per Image During Training Efficiently Decreases Test Error

- Computer Science
- ArXiv
- 2021

It is found that drawing multiple samples per image consistently enhances the test accuracy achieved for both small and large batch training, despite reducing the number of unique training examples in each mini-batch.

Dropout Training is Distributionally Robust Optimal

- 2021

This paper shows that dropout training in Generalized Linear Models is the minimax solution of a two-player, zero-sum game where an adversarial nature corrupts a statistician's covariates using a…

#### References

Showing 1-10 of 73 references

Dropout with Expectation-linear Regularization

- Computer Science, Mathematics
- ICLR
- 2017

This work first formulates dropout as a tractable approximation of a latent variable model, leading to a clean view of parameter sharing and enabling further theoretical analysis, and introduces (approximately) expectation-linear dropout neural networks, whose inference gap the authors are able to formally characterize.

Understanding Dropout

- Computer Science, Mathematics
- NIPS
- 2013

A general formalism for studying dropout on either units or connections, with arbitrary probability values, is introduced and used to analyze the averaging and regularizing properties of dropout in both linear and non-linear networks.

Surprising properties of dropout in deep networks

- Computer Science, Mathematics
- COLT
- 2017

This work uncovers new properties of dropout, extends the understanding of why dropout succeeds, and lays the foundation for further progress, showing in particular that dropout is insensitive to various re-scalings of the input features, outputs, and network weights.

Dropout Training as Adaptive Regularization

- Computer Science, Mathematics
- NIPS
- 2013

By casting dropout as regularization, this work develops a natural semi-supervised algorithm that uses unlabeled data to create a better adaptive regularizer and consistently boosts the performance of dropout training, improving on state-of-the-art results on the IMDB reviews dataset.

Dropout: Explicit Forms and Capacity Control

- Computer Science, Mathematics
- ICML
- 2021

This work shows that the data-dependent regularizer due to dropout directly controls the Rademacher complexity of the underlying class of deep neural networks.

On Dropout and Nuclear Norm Regularization

- Computer Science, Mathematics
- ICML
- 2019

A formal and complete characterization of the explicit regularizer induced by dropout in deep linear networks with squared loss is given, and the global optima of the dropout objective are characterized.

Fast dropout training

- Computer Science
- ICML
- 2013

This work shows how to do fast dropout training by sampling from, or integrating over, a Gaussian approximation instead of performing Monte Carlo optimization of the dropout objective, which gives an order-of-magnitude speedup and more stability.
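The Gaussian approximation rests on the observation that, for a unit with many inputs, the dropped-out pre-activation s = Σᵢ mᵢwᵢxᵢ with mᵢ ~ Bernoulli(p) is approximately normal, with mean p·Σᵢ wᵢxᵢ and variance p(1 - p)·Σᵢ (wᵢxᵢ)². The sketch below compares these analytic moments against Monte Carlo sampling (illustrative code, not the paper's implementation):

```python
import random

def dropout_preactivation_stats(w, x, p):
    """Analytic mean and variance of s = sum_i m_i * w_i * x_i with
    independent keep masks m_i ~ Bernoulli(p). Fast dropout replaces
    Monte Carlo sampling of s with a Gaussian with these moments."""
    mean = p * sum(wi * xi for wi, xi in zip(w, x))
    var = p * (1 - p) * sum((wi * xi) ** 2 for wi, xi in zip(w, x))
    return mean, var

def mc_preactivation_stats(w, x, p, n=100_000, seed=0):
    """Empirical mean and variance of s over n sampled dropout masks,
    for checking the analytic moments above."""
    rng = random.Random(seed)
    samples = [sum(wi * xi for wi, xi in zip(w, x) if rng.random() < p)
               for _ in range(n)]
    m = sum(samples) / n
    v = sum((s - m) ** 2 for s in samples) / n
    return m, v
```

Because the analytic moments cost one pass over the weights while Monte Carlo requires many sampled masks per example, integrating the loss against the Gaussian is where the reported speedup comes from.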

On the Implicit Bias of Dropout

- Computer Science, Mathematics
- ICML
- 2018

This paper shows that dropout tends to make the norm of incoming/outgoing weight vectors of all the hidden nodes equal in single hidden-layer linear neural networks, and provides a complete characterization of the optimization landscape induced by dropout.

On the inductive bias of dropout

- Mathematics, Computer Science
- J. Mach. Learn. Res.
- 2015

This paper continues the exploration of dropout as a regularizer pioneered by Wager et al.

Altitude Training: Strong Bounds for Single-Layer Dropout

- Computer Science, Mathematics
- NIPS
- 2014

It is shown that, under a generative Poisson topic model with long documents, dropout training improves the exponent in the generalization bound for empirical risk minimization and should therefore induce minimal bias in high dimensions.