Corpus ID: 11675927

# Generalized Dropout

@article{Srinivas2016GeneralizedD,
title={Generalized Dropout},
author={Suraj Srinivas and R. Venkatesh Babu},
journal={ArXiv},
year={2016},
volume={abs/1611.06791}
}
• Published 21 November 2016
• Computer Science
• ArXiv
Deep Neural Networks often require good regularizers to generalize well. Dropout is one such regularizer that is widely used among Deep Learning practitioners. Recent work has shown that Dropout can also be viewed as performing Approximate Bayesian Inference over the network parameters. In this work, we generalize this notion and introduce a rich family of regularizers which we call Generalized Dropout. One set of methods in this family, called Dropout++, is a version of Dropout with trainable…
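The abstract describes Dropout++ as dropout with trainable parameters. As a minimal sketch of that idea, the snippet below keeps a learnable per-unit retain probability, stored as an unconstrained logit and squashed through a sigmoid; the class name, parameterization, and test-time scaling are assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np

rng = np.random.default_rng(0)

class DropoutPlusPlus:
    """Hypothetical sketch: dropout with a trainable retain probability
    per unit. The logit parameterization and names are assumptions,
    not the paper's actual method."""

    def __init__(self, n_units, init_logit=2.0):
        # Start near retain probability sigmoid(2.0) ~ 0.88.
        self.theta = np.full(n_units, init_logit)

    def retain_prob(self):
        return 1.0 / (1.0 + np.exp(-self.theta))

    def forward(self, x, train=True):
        p = self.retain_prob()
        if not train:
            # At test time, scale by the learned retain probability.
            return x * p
        # Sample Bernoulli gates per unit; training theta itself would
        # need a straight-through or REINFORCE-style gradient estimator.
        gates = rng.random(x.shape) < p
        return x * gates

layer = DropoutPlusPlus(n_units=4)
out = layer.forward(np.ones((2, 4)), train=False)
```

Because the retain probabilities are ordinary parameters, they can in principle be optimized jointly with the weights, which is what distinguishes this family from fixed-rate dropout.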
#### 22 Citations

Variational Dropout Sparsifies Deep Neural Networks
• Computer Science, Mathematics
• ICML
• 2017
Variational Dropout is extended to the case where dropout rates are unbounded, a way to reduce the variance of the gradient estimator is proposed, and the first experimental results with individual dropout rates per weight are reported.
Survey of Dropout Methods for Deep Neural Networks
• Computer Science
• ArXiv
• 2019
The history of dropout methods, their various applications, and current areas of research interest are summarized.
Adaptive Network Sparsification via Dependent Variational Beta-Bernoulli Dropout
• Computer Science, Mathematics
• ArXiv
• 2018
Adaptive variational dropout, whose probabilities are drawn from a sparsity-inducing beta-Bernoulli prior, allows the resulting network to tolerate a larger degree of sparsity without losing its expressive power, by removing redundancies among features.
CODA: Constructivism Learning for Instance-Dependent Dropout Architecture Construction
This work proposes Constructivism learning for instance-dependent Dropout Architecture (CODA), inspired by the philosophical theory of constructivism learning, and designs a dropout technique, Uniform Process Mixture Models, using the Bayesian nonparametric Uniform Process.
Principal Component Networks: Parameter Reduction Early in Training
• Computer Science, Mathematics
• ArXiv
• 2020
This paper shows how to find small networks that exhibit the same performance as their overparameterized counterparts after only a few training epochs, using PCA to find a high-variance basis for layer inputs and representing layer weights in these directions.
Simple and Effective Stochastic Neural Networks
• Computer Science
• AAAI
• 2021
This paper proposes a simple and effective stochastic neural network architecture for discriminative learning by directly modeling activation uncertainty and encouraging high activation variability, which produces state-of-the-art results on network compression by pruning, adversarial defense, learning with label noise, and model calibration.
Stochastic batch size for adaptive regularization in deep network optimization
• Computer Science, Mathematics
• ArXiv
• 2020
The quantitative evaluation indicates that the proposed first-order stochastic optimization algorithm incorporating adaptive regularization outperforms state-of-the-art optimization algorithms in generalization, while being less sensitive to the selection of batch size, which often plays a critical role in optimization, thus achieving more robustness to the selection of regularization.
Deep networks with probabilistic gates
• Computer Science
• ArXiv
• 2018
This work proposes a per-batch loss function, describes strategies for handling probabilistic bypass during both training and inference, and explores several inference-time strategies, including the natural MAP approach.
Learning Compact Architectures for Deep Neural Networks
Deep Neural Networks (NNs) have recently emerged as the model of choice for a wide range of Machine Learning applications, ranging from computer vision to speech recognition to natural language…
Regularization in neural network optimization via trimmed stochastic gradient descent with noisy label
• Computer Science
• ArXiv
• 2020
A first-order optimization method (Label-Noised Trim-SGD) is proposed which combines label noise with example trimming in order to remove outliers and obtain a better regularization effect than the original methods.

#### References

Showing 1–10 of 21 references
Adaptive dropout for training deep neural networks
• Computer Science
• NIPS
• 2013
A method called 'standout' is described, in which a binary belief network is overlaid on a neural network and is used to regularize its hidden units by selectively setting activities to zero; it achieves lower classification error rates than other feature learning methods, including standard dropout, denoising auto-encoders, and restricted Boltzmann machines.
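The 'standout' entry above describes computing a keep probability per hidden unit from an overlaid belief network. A rough sketch of that idea follows; reusing the layer's own pre-activations for the gating network (scaled by `alpha`, shifted by `beta`) is a simplification, and the function name and defaults are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

def standout_forward(x, W, alpha=1.0, beta=0.0, train=True):
    """Sketch of 'standout' adaptive dropout: a per-unit keep
    probability is computed from the layer's own pre-activations
    (a simplification of the overlaid belief network)."""
    pre = x @ W                          # hidden pre-activations
    h = np.maximum(pre, 0.0)            # ReLU hidden units
    keep = 1.0 / (1.0 + np.exp(-(alpha * pre + beta)))  # keep prob per unit
    if train:
        mask = rng.random(h.shape) < keep
        return h * mask
    return h * keep                      # expected value at test time

x = np.ones((3, 5))
W = rng.normal(size=(5, 4))
out = standout_forward(x, W, train=False)
```

The key difference from standard dropout is that units with large pre-activations are kept with higher probability, so the dropout rate adapts to the input.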
Learning the Architecture of Deep Neural Networks
• Computer Science
• ArXiv
• 2015
This work introduces the problem of architecture learning, i.e., learning the architecture of a neural network along with its weights, and proposes a new trainable parameter called tri-state ReLU, which helps eliminate unnecessary neurons.
Dropout: a simple way to prevent neural networks from overfitting
• Computer Science
• J. Mach. Learn. Res.
• 2014
It is shown that dropout improves the performance of neural networks on supervised learning tasks in vision, speech recognition, document classification and computational biology, obtaining state-of-the-art results on many benchmark data sets.
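Standard dropout, as in the entry above, zeroes each activation independently with a fixed probability during training. The widely used "inverted" formulation below rescales the surviving activations so that no correction is needed at test time; this is a common implementation convention, not necessarily the exact procedure in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(x, p_drop=0.5, train=True):
    """Inverted dropout: zero each activation with probability p_drop
    during training and rescale survivors by 1/(1 - p_drop), so the
    expected activation matches the test-time (identity) pass."""
    if not train or p_drop == 0.0:
        return x
    keep = 1.0 - p_drop
    mask = rng.random(x.shape) < keep
    return x * mask / keep

a = np.ones((4, 6))
y = dropout(a, p_drop=0.5, train=True)   # entries are 0.0 or 2.0
```

Because each training pass samples a different mask, dropout can be read as training an exponential ensemble of thinned subnetworks that share weights.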
Variational Dropout and the Local Reparameterization Trick
• Computer Science, Mathematics
• NIPS
• 2015
The variational dropout method is proposed: a generalization of Gaussian dropout but with a more flexibly parameterized posterior, often leading to better generalization in stochastic gradient variational Bayes.
Regularization of Neural Networks using DropConnect
• Mathematics, Computer Science
• ICML
• 2013
This work introduces DropConnect, a generalization of Dropout, for regularizing large fully-connected layers within neural networks, and derives a bound on the generalization performance of both Dropout and DropConnect.
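DropConnect, summarized above, drops individual *weights* rather than activations. A minimal sketch of a DropConnect linear layer follows; the mean-weight test-time pass is a common shortcut, while the paper itself uses a Gaussian moment-matching approximation at inference.

```python
import numpy as np

rng = np.random.default_rng(0)

def dropconnect_linear(x, W, b, p_drop=0.5, train=True):
    """DropConnect sketch: zero each weight independently with
    probability p_drop on every training pass. The test-time pass here
    simply scales the weights by the keep probability, a shortcut
    rather than the paper's exact inference procedure."""
    if train:
        mask = rng.random(W.shape) < (1.0 - p_drop)
        return x @ (W * mask) + b
    return x @ (W * (1.0 - p_drop)) + b

x = np.ones((2, 3))
W = np.ones((3, 4))
b = np.zeros(4)
out = dropconnect_linear(x, W, b, train=False)
```

Masking weights instead of activations yields a larger space of thinned models for the same layer, which is why DropConnect is described as a generalization of Dropout.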
Practical Variational Inference for Neural Networks
• A. Graves
• Computer Science, Mathematics
• NIPS
• 2011
This paper introduces an easy-to-implement stochastic variational method (or equivalently, minimum description length loss function) that can be applied to most neural networks and revisits several common regularisers from a variational perspective.
Bayesian Convolutional Neural Networks with Bernoulli Approximate Variational Inference
• Computer Science, Mathematics
• ArXiv
• 2015
This work presents an efficient Bayesian CNN that offers better robustness to over-fitting on small data than traditional approaches, approximating the model's intractable posterior with Bernoulli variational distributions.
Predicting Parameters in Deep Learning
• Computer Science, Mathematics
• NIPS
• 2013
It is demonstrated that there is significant redundancy in the parameterization of several deep learning models: not only can the parameter values be predicted, but many of them need not be learned at all.
Estimating or Propagating Gradients Through Stochastic Neurons for Conditional Computation
• Computer Science, Mathematics
• ArXiv
• 2013
This work considers a small-scale version of *conditional computation*, where sparse stochastic units form a distributed representation of gaters that can turn off, in combinatorially many ways, large chunks of the computation performed in the rest of the neural network.
Weight Uncertainty in Neural Networks
• Mathematics, Computer Science
• ArXiv
• 2015
This work introduces a new, efficient, principled and backpropagation-compatible algorithm for learning a probability distribution on the weights of a neural network, called Bayes by Backprop, and shows how the learnt uncertainty in the weights can be used to improve generalisation in non-linear regression problems.