Corpus ID: 211572620

The Implicit and Explicit Regularization Effects of Dropout

@inproceedings{Wei2020TheIA,
  title={The Implicit and Explicit Regularization Effects of Dropout},
  author={Colin Wei and S. Kakade and Tengyu Ma},
  booktitle={ICML},
  year={2020}
}
Dropout is a widely used regularization technique, often required to obtain state-of-the-art performance for a number of architectures. This work demonstrates that dropout introduces two distinct but entangled regularization effects: an explicit effect (also studied in prior work), which arises because dropout modifies the expected training objective, and, perhaps surprisingly, an additional implicit effect that stems from the stochasticity of the dropout training update. This implicit regularization effect is analogous…
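To make the decomposition above concrete, here is a minimal numpy sketch for linear regression with inverted input dropout: averaging the stochastic dropout loss over many masks recovers the clean loss plus a closed-form quadratic penalty (the explicit effect), while any single update still uses one random mask, so its gradient is a noisy version of the gradient of that expected objective (the source of the implicit effect). The dimensions, keep probability, and the (1 − q)/q penalty are the standard ones for this simple linear setting and are purely illustrative, not the paper's exact derivation.

```python
import numpy as np

rng = np.random.default_rng(0)
d, q = 20, 0.8                           # input dimension, keep probability
x = rng.normal(size=d)
y = 1.0
w = rng.normal(size=d) * 0.1

def dropout_loss(w, x, y, q, rng):
    """Squared loss under one sampled (inverted) dropout mask on the inputs."""
    mask = rng.binomial(1, q, size=x.shape) / q
    return 0.5 * (y - w @ (mask * x)) ** 2

# Explicit effect: the *expected* dropout objective equals the clean loss plus
# a data-dependent quadratic penalty, (1 - q) / (2 q) * sum_j w_j^2 x_j^2.
clean_loss = 0.5 * (y - w @ x) ** 2
explicit_penalty = (1 - q) / (2 * q) * np.sum(w ** 2 * x ** 2)
mc_estimate = np.mean([dropout_loss(w, x, y, q, rng) for _ in range(200_000)])
print(f"Monte Carlo expected dropout loss : {mc_estimate:.4f}")
print(f"clean loss + explicit regularizer : {clean_loss + explicit_penalty:.4f}")

# Implicit effect: an individual update uses one random mask, so its gradient
# is a noisy version of the gradient of the expected objective above.
m = rng.binomial(1, q, size=d) / q
grad_stochastic = -(y - w @ (m * x)) * (m * x)
grad_expected = -(y - w @ x) * x + (1 - q) / q * (w * x ** 2)
print("stochastic gradient (first 3 coords):", np.round(grad_stochastic[:3], 3))
print("expected-objective gradient         :", np.round(grad_expected[:3], 3))
```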
Shape Matters: Understanding the Implicit Bias of the Noise Covariance
It is shown that in an over-parameterized setting, SGD with label noise recovers the sparse ground truth with an arbitrary initialization, whereas SGD with Gaussian noise or gradient descent overfits to dense solutions with large norms.
Explicit Regularisation in Gaussian Noise Injections
It is shown analytically and empirically that such regularisation produces calibrated classifiers with large classification margins, and that the explicit regulariser derived is able to reproduce these effects.
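As an illustration of the setup this line of work studies, the following is a minimal sketch of a Gaussian noise injection: noise is added to the hidden activations during training and switched off at test time. The two-layer architecture, ReLU nonlinearity, and noise scale sigma are assumptions made only for the sake of the example.

```python
import numpy as np

rng = np.random.default_rng(1)

def forward(x, W1, W2, sigma=0.1, train=True):
    """Two-layer net with Gaussian noise injected into the hidden activations."""
    h = np.maximum(0.0, x @ W1)                   # ReLU hidden layer
    if train:
        h = h + sigma * rng.normal(size=h.shape)  # the noise injection
    return h @ W2                                 # output logits

x = rng.normal(size=(4, 8))                       # toy batch of 4 inputs
W1 = 0.1 * rng.normal(size=(8, 16))
W2 = 0.1 * rng.normal(size=(16, 3))
print(forward(x, W1, W2, train=True).shape)       # (4, 3) -- noisy, training mode
print(forward(x, W1, W2, train=False).shape)      # (4, 3) -- deterministic, test mode
```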
Understanding the Role of Training Regimes in Continual Learning
This work hypothesizes that the geometrical properties of the local minima found for each task play an important role in the overall degree of forgetting, and studies the effect of dropout, learning rate decay, and batch size on forming training regimes that widen the tasks' local minima and, consequently, help the network avoid catastrophic forgetting.
Dropout as an Implicit Gating Mechanism For Continual Learning
It is shown that a stable network with dropout learns a gating mechanism such that different paths of the network are active for different tasks, and that the stability achieved by this implicit gating plays a critical role in reaching performance comparable to or better than other continual learning algorithms at overcoming catastrophic forgetting.
Asymmetric Heavy Tails and Implicit Bias in Gaussian Noise Injections
This paper develops a Langevin-like stochastic differential equation that is driven by a general family of asymmetric heavy-tailed noise and formally proves that GNIs induce an ‘implicit bias’ which varies depending on the heaviness of the tails and the level of asymmetry.
On Mixup Regularization
It is shown that Mixup can be interpreted as a standard empirical risk minimization estimator subject to a combination of data transformation and random perturbation of the transformed data, and that these transformations and perturbations induce multiple known regularization schemes that interact synergistically with each other, resulting in a self-calibrated and effective regularization effect that prevents overfitting and overconfident predictions.
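For reference, a minimal sketch of the standard Mixup construction the summary refers to: each batch is replaced by convex combinations of random pairs of inputs and of their one-hot labels, with a Beta-distributed mixing coefficient. The value of alpha and the toy shapes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)

def mixup_batch(x, y_onehot, alpha=0.2):
    """Replace a batch by convex combinations of random pairs of examples."""
    lam = rng.beta(alpha, alpha)                  # mixing coefficient in [0, 1]
    perm = rng.permutation(len(x))                # random partner for each example
    x_mix = lam * x + (1 - lam) * x[perm]
    y_mix = lam * y_onehot + (1 - lam) * y_onehot[perm]
    return x_mix, y_mix

x = rng.normal(size=(8, 32))                      # toy batch of 8 examples
y = np.eye(3)[rng.integers(0, 3, size=8)]         # one-hot labels over 3 classes
x_mix, y_mix = mixup_batch(x, y)
print(x_mix.shape, y_mix.shape)                   # (8, 32) (8, 3)
```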
Deep learning regularization techniques to genomics data
Some deep learning architectures are described, the optimization process for training them is explained, and a theoretical relationship between L2 regularization and dropout is established.
Drawing Multiple Augmentation Samples Per Image During Training Efficiently Decreases Test Error
It is found that drawing multiple samples per image consistently enhances the test accuracy achieved for both small and large batch training, despite reducing the number of unique training examples in each mini-batch.
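A minimal sketch of the batching scheme described above, under the assumption that a batch of size B is built from B/K unique images, each drawn K times under independent random augmentations. The toy "augmentation" (a random flip plus pixel jitter) and the sizes are placeholders, not the paper's pipeline.

```python
import numpy as np

rng = np.random.default_rng(3)

def augment(img):
    """Placeholder augmentation: random horizontal flip plus pixel jitter."""
    if rng.random() < 0.5:
        img = img[:, ::-1]
    return img + 0.05 * rng.normal(size=img.shape)

def multi_sample_batch(images, batch_size=16, samples_per_image=4):
    """Build a batch from batch_size // samples_per_image unique images,
    each augmented samples_per_image times independently."""
    n_unique = batch_size // samples_per_image
    idx = rng.choice(len(images), size=n_unique, replace=False)
    batch = np.stack([augment(images[i]) for i in idx
                      for _ in range(samples_per_image)])
    return batch, np.repeat(idx, samples_per_image)

images = rng.normal(size=(100, 8, 8))             # toy dataset of 100 "images"
batch, source_ids = multi_sample_batch(images)
print(batch.shape)                                # (16, 8, 8)
print(source_ids)                                 # only 4 distinct image indices
```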
Dropout Training is Distributionally Robust Optimal
This paper shows that dropout training in Generalized Linear Models is the minimax solution of a two-player, zero-sum game where an adversarial nature corrupts a statistician’s covariates using a…

References

Showing 1–10 of 73 references
Dropout with Expectation-linear Regularization
This work first formulates dropout as a tractable approximation of a latent variable model, leading to a clean view of parameter sharing and enabling further theoretical analysis, and then introduces (approximate) expectation-linear dropout neural networks, whose inference gap the authors are able to formally characterize.
Understanding Dropout
A general formalism for studying dropout on either units or connections, with arbitrary probability values, is introduced and used to analyze the averaging and regularizing properties of dropout in both linear and non-linear networks.
Surprising properties of dropout in deep networks
This work uncovers new properties of dropout, extends the understanding of why dropout succeeds, and lays the foundation for further progress, showing in particular how dropout is insensitive to various re-scalings of the input features, outputs, and network weights.
Dropout Training as Adaptive Regularization
By casting dropout as regularization, this work develops a natural semi-supervised algorithm that uses unlabeled data to create a better adaptive regularizer and consistently boosts the performance of dropout training, improving on state-of-the-art results on the IMDB reviews dataset.
Dropout: Explicit Forms and Capacity Control
This work shows that the data-dependent regularizer due to dropout directly controls the Rademacher complexity of the underlying class of deep neural networks.
On Dropout and Nuclear Norm Regularization
A formal and complete characterization of the explicit regularizer induced by dropout in deep linear networks with squared loss is given, and the global optima of the dropout objective are characterized.
Fast dropout training
This work shows how to do fast dropout training by sampling from, or integrating, a Gaussian approximation, instead of doing Monte Carlo optimization of this objective, which gives an order-of-magnitude speedup and more stability.
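A minimal sketch of the Gaussian-approximation idea behind fast dropout, assuming a single linear unit with Bernoulli(p) input dropout: the pre-activation is a sum of many independent terms, so its distribution over masks is approximately Gaussian with moments that can be computed in closed form and sampled directly, instead of by Monte Carlo over masks. The dimension and keep probability are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)
d, p = 50, 0.5                      # input dimension, keep probability
x = rng.normal(size=d)
w = rng.normal(size=d)

# Monte Carlo dropout: sample Bernoulli masks and look at the pre-activation.
masks = rng.binomial(1, p, size=(100_000, d))
z_mc = masks @ (w * x)

# Fast-dropout-style Gaussian approximation: match the first two moments of
# the pre-activation in closed form and sample from that Gaussian instead.
mu = p * np.sum(w * x)
var = p * (1 - p) * np.sum((w * x) ** 2)
z_gauss = rng.normal(mu, np.sqrt(var), size=100_000)

print(f"Monte Carlo mean / variance : {z_mc.mean():.3f} / {z_mc.var():.3f}")
print(f"Gaussian    mean / variance : {z_gauss.mean():.3f} / {z_gauss.var():.3f}")
```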
On the Implicit Bias of Dropout
This paper shows that dropout tends to make the norms of the incoming/outgoing weight vectors of all the hidden nodes equal in single hidden-layer linear neural networks, and provides a complete characterization of the optimization landscape induced by dropout.
On the inductive bias of dropout
This paper continues the exploration of dropout as a regularizer pioneered by Wager et al.
Altitude Training: Strong Bounds for Single-Layer Dropout
It is shown that, under a generative Poisson topic model with long documents, dropout training improves the exponent in the generalization bound for empirical risk minimization and should therefore induce minimal bias in high dimensions.