# Generalized Dropout

@article{Srinivas2016GeneralizedD, title={Generalized Dropout}, author={Suraj Srinivas and R. Venkatesh Babu}, journal={ArXiv}, year={2016}, volume={abs/1611.06791} }

Deep Neural Networks often require good regularizers to generalize well. Dropout is one such regularizer that is widely used among Deep Learning practitioners. Recent work has shown that Dropout can also be viewed as performing Approximate Bayesian Inference over the network parameters. In this work, we generalize this notion and introduce a rich family of regularizers which we call Generalized Dropout. One set of methods in this family, called Dropout++, is a version of Dropout with trainable…
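For orientation, the baseline that the abstract generalizes can be sketched as follows. This is a minimal illustration of standard (inverted) Bernoulli dropout only, not the paper's Generalized Dropout or Dropout++, whose details are truncated above. The function name `dropout`, the parameter `keep_prob`, and the toy input are assumptions introduced for this example.

```python
import random


def dropout(activations, keep_prob, rng, training=True):
    """Standard inverted dropout on a list of floats (illustrative sketch).

    Each unit is kept with probability keep_prob and rescaled by 1 / keep_prob,
    so the expected activation is unchanged. At test time the input is
    returned as-is.
    """
    if not 0.0 < keep_prob <= 1.0:
        raise ValueError("keep_prob must be in (0, 1]")
    if not training:
        return list(activations)
    out = []
    for a in activations:
        keep = rng.random() < keep_prob  # Bernoulli(keep_prob) gate
        out.append(a / keep_prob if keep else 0.0)
    return out


# Hypothetical toy input: 10,000 hidden units, all with activation 1.0.
rng = random.Random(0)
hidden = [1.0] * 10000
dropped = dropout(hidden, keep_prob=0.8, rng=rng)
mean_after = sum(dropped) / len(dropped)  # stays close to 1.0 in expectation
```

Here `keep_prob` is a fixed hyperparameter chosen by hand; the abstract's description of Dropout++ as a version of Dropout with trainable components suggests that fixed choice is what the paper relaxes.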

## 27 Citations

### Variational Dropout Sparsifies Deep Neural Networks

- Computer Science, ICML
- 2017

Variational Dropout is extended to the case when dropout rates are unbounded, a way to reduce the variance of the gradient estimator is proposed, and the first experimental results with individual dropout rates per weight are reported.

### Survey of Dropout Methods for Deep Neural Networks

- Computer Science, ArXiv
- 2019

The history of dropout methods, their various applications, and current areas of research interest are summarized.

### Adaptive Network Sparsification via Dependent Variational Beta-Bernoulli Dropout

- Computer Science, ArXiv
- 2018

Adaptive variational dropout, whose probabilities are drawn from a sparsity-inducing beta-Bernoulli prior, allows the resulting network to tolerate a larger degree of sparsity without losing its expressive power by removing redundancies among features.

### CODA: Constructivism Learning for Instance-Dependent Dropout Architecture Construction

- Computer Science, ArXiv
- 2021

This work proposes Constructivism learning for instance-dependent Dropout Architecture (CODA), which is inspired by the philosophical theory of constructivism learning, and designs a better dropout technique, Uniform Process Mixture Models, using the Uniform process, a Bayesian nonparametric method.

### Principal Component Networks: Parameter Reduction Early in Training

- Computer Science, ArXiv
- 2020

This paper shows how to find small networks that exhibit the same performance as their overparameterized counterparts after only a few training epochs and uses PCA to find a basis of high variance for layer inputs and represent layer weights using these directions.

### Simple and Effective Stochastic Neural Networks

- Computer Science, AAAI
- 2021

This paper proposes a simple and effective stochastic neural network architecture for discriminative learning by directly modeling activation uncertainty and encouraging high activation variability, which produces state-of-the-art results on network compression by pruning, adversarial defense, learning with label noise, and model calibration.

### Stochastic batch size for adaptive regularization in deep network optimization

- Computer Science, Pattern Recognit.
- 2022

### Deep networks with probabilistic gates

- Computer Science, ArXiv
- 2018

This work proposes a per-batch loss function, and describes strategies for handling probabilistic bypass during inference as well as training, and explores several inference-time strategies, including the natural MAP approach.

### Learning Compact Architectures for Deep Neural Networks

- Computer Science
- 2018

A method called 'Architecture-Learning' is described that takes a pre-trained network model and performs compression without using training data; the same methodology is applied to sparsify neural networks, i.e., to remove weights to create sparse weight matrices.

### Robust Learning of Parsimonious Deep Neural Networks

- Computer Science, ArXiv
- 2022

The simulations show that the proposed simultaneous learning and pruning algorithm achieves pruning levels on par with state-of-the-art methods for structured pruning, while maintaining better test accuracy and, more importantly, doing so in a manner robust with respect to network initialization and initial size.

## References


### Adaptive dropout for training deep neural networks

- Computer Science, NIPS
- 2013

A method called 'standout' is described in which a binary belief network is overlaid on a neural network and is used to regularize its hidden units by selectively setting activities to zero, which achieves lower classification error rates than other feature learning methods, including standard dropout, denoising auto-encoders, and restricted Boltzmann machines.

### Learning the Architecture of Deep Neural Networks

- Computer Science, ArXiv
- 2015

This work introduces the problem of architecture-learning, i.e., learning the architecture of a neural network along with its weights, and introduces a new trainable parameter called tri-state ReLU, which helps in eliminating unnecessary neurons.

### Dropout: a simple way to prevent neural networks from overfitting

- Computer Science, J. Mach. Learn. Res.
- 2014

It is shown that dropout improves the performance of neural networks on supervised learning tasks in vision, speech recognition, document classification and computational biology, obtaining state-of-the-art results on many benchmark data sets.

### Variational Dropout and the Local Reparameterization Trick

- Computer Science, NIPS
- 2015

The Variational dropout method is proposed, a generalization of Gaussian dropout, but with a more flexibly parameterized posterior, often leading to better generalization in stochastic gradient variational Bayes.

### Regularization of Neural Networks using DropConnect

- Computer Science, ICML
- 2013

This work introduces DropConnect, a generalization of Dropout, for regularizing large fully-connected layers within neural networks, and derives a bound on the generalization performance of both Dropout and DropConnect.

### Practical Variational Inference for Neural Networks

- Computer Science, NIPS
- 2011

This paper introduces an easy-to-implement stochastic variational method (or equivalently, minimum description length loss function) that can be applied to most neural networks and revisits several common regularisers from a variational perspective.

### Predicting Parameters in Deep Learning

- Computer Science, NIPS
- 2013

It is demonstrated that there is significant redundancy in the parameterization of several deep learning models and not only can the parameter values be predicted, but many of them need not be learned at all.

### Estimating or Propagating Gradients Through Stochastic Neurons for Conditional Computation

- Computer Science, ArXiv
- 2013

This work considers a small-scale version of *conditional computation*, where sparse stochastic units form a distributed representation of gaters that can turn off in combinatorially many ways large chunks of the computation performed in the rest of the neural network.

### Weight Uncertainty in Neural Networks

- Computer Science, ArXiv
- 2015

This work introduces a new, efficient, principled and backpropagation-compatible algorithm for learning a probability distribution on the weights of a neural network, called Bayes by Backprop, and shows how the learnt uncertainty in the weights can be used to improve generalisation in non-linear regression problems.

### Scalable Bayesian Optimization Using Deep Neural Networks

- Computer Science, ICML
- 2015

This work shows that performing adaptive basis function regression with a neural network as the parametric form performs competitively with state-of-the-art GP-based approaches, but scales linearly with the number of data rather than cubically, which allows for a previously intractable degree of parallelism.