• Corpus ID: 11675927

Generalized Dropout

@article{Srinivas2016GeneralizedD,
title={Generalized Dropout},
author={Suraj Srinivas and R. Venkatesh Babu},
journal={ArXiv},
year={2016},
volume={abs/1611.06791}
}
• Published 21 November 2016
• Computer Science
• ArXiv
Deep Neural Networks often require good regularizers to generalize well. Dropout is one such regularizer that is widely used among Deep Learning practitioners. Recent work has shown that Dropout can also be viewed as performing Approximate Bayesian Inference over the network parameters. In this work, we generalize this notion and introduce a rich family of regularizers which we call Generalized Dropout. One set of methods in this family, called Dropout++, is a version of Dropout with trainable…
27 Citations

Variational Dropout Sparsifies Deep Neural Networks

• Computer Science
ICML
• 2017
Variational Dropout is extended to the case when dropout rates are unbounded, a way to reduce the variance of the gradient estimator is proposed, and first experimental results with individual dropout rates per weight are reported.

Survey of Dropout Methods for Deep Neural Networks

• Computer Science
ArXiv
• 2019
The history of dropout methods, their various applications, and current areas of research interest are summarized.

Adaptive Network Sparsification via Dependent Variational Beta-Bernoulli Dropout

• Computer Science
ArXiv
• 2018
Adaptive variational dropout, whose probabilities are drawn from a sparsity-inducing beta-Bernoulli prior, allows the resulting network to tolerate a larger degree of sparsity without losing its expressive power by removing redundancies among features.

CODA: Constructivism Learning for Instance-Dependent Dropout Architecture Construction

This work proposes Constructivism learning for instance-dependent Dropout Architecture (CODA), which is inspired by a philosophical theory, constructivism learning, and designs a better dropout technique, Uniform Process Mixture Models, using a Bayesian nonparametric method, the Uniform process.

Principal Component Networks: Parameter Reduction Early in Training

• Computer Science
ArXiv
• 2020
This paper shows how to find small networks that exhibit the same performance as their overparameterized counterparts after only a few training epochs and uses PCA to find a basis of high variance for layer inputs and represent layer weights using these directions.

Simple and Effective Stochastic Neural Networks

• Computer Science
AAAI
• 2021
This paper proposes a simple and effective stochastic neural network architecture for discriminative learning by directly modeling activation uncertainty and encouraging high activation variability, which produces state-of-the-art results on network compression by pruning, adversarial defense, learning with label noise, and model calibration.

Deep networks with probabilistic gates

• Computer Science
ArXiv
• 2018
This work proposes a per-batch loss function, and describes strategies for handling probabilistic bypass during inference as well as training, and explores several inference-time strategies, including the natural MAP approach.

Learning Compact Architectures for Deep Neural Networks

A method called 'Architecture-Learning' is described that takes a pre-trained network model and performs compression without using training data; the methodology is applied to sparsify neural networks, i.e., to remove weights to create sparse weight matrices.

Robust Learning of Parsimonious Deep Neural Networks

• Computer Science
ArXiv
• 2022
The simulations show that the proposed simultaneous learning and pruning algorithm achieves pruning levels on par with state-of-the-art methods for structured pruning, while maintaining better test accuracy and, more importantly, doing so in a manner robust with respect to network initialization and initial size.

References

Adaptive dropout for training deep neural networks

• Computer Science
NIPS
• 2013
A method called 'standout' is described in which a binary belief network is overlaid on a neural network and is used to regularize its hidden units by selectively setting activities to zero, which achieves lower classification error rates than other feature learning methods, including standard dropout, denoising auto-encoders, and restricted Boltzmann machines.

Learning the Architecture of Deep Neural Networks

• Computer Science
ArXiv
• 2015
This work introduces the problem of architecture-learning, i.e., learning the architecture of a neural network along with weights, and introduces a new trainable parameter called tri-state ReLU, which helps in eliminating unnecessary neurons.

Dropout: a simple way to prevent neural networks from overfitting

• Computer Science
J. Mach. Learn. Res.
• 2014
It is shown that dropout improves the performance of neural networks on supervised learning tasks in vision, speech recognition, document classification and computational biology, obtaining state-of-the-art results on many benchmark data sets.

Variational Dropout and the Local Reparameterization Trick

• Computer Science
NIPS
• 2015
The Variational dropout method is proposed, a generalization of Gaussian dropout, but with a more flexibly parameterized posterior, often leading to better generalization in stochastic gradient variational Bayes.

Regularization of Neural Networks using DropConnect

• Computer Science
ICML
• 2013
This work introduces DropConnect, a generalization of Dropout, for regularizing large fully-connected layers within neural networks, and derives a bound on the generalization performance of both Dropout and DropConnect.

Practical Variational Inference for Neural Networks

This paper introduces an easy-to-implement stochastic variational method (or equivalently, minimum description length loss function) that can be applied to most neural networks and revisits several common regularisers from a variational perspective.

Predicting Parameters in Deep Learning

• Computer Science
NIPS
• 2013
It is demonstrated that there is significant redundancy in the parameterization of several deep learning models and not only can the parameter values be predicted, but many of them need not be learned at all.

Estimating or Propagating Gradients Through Stochastic Neurons for Conditional Computation

• Computer Science
ArXiv
• 2013
This work considers a small-scale version of conditional computation, where sparse stochastic units form a distributed representation of gaters that can turn off in combinatorially many ways large chunks of the computation performed in the rest of the neural network.

Weight Uncertainty in Neural Networks

• Computer Science
ArXiv
• 2015
This work introduces a new, efficient, principled and backpropagation-compatible algorithm for learning a probability distribution on the weights of a neural network, called Bayes by Backprop, and shows how the learnt uncertainty in the weights can be used to improve generalisation in non-linear regression problems.

Scalable Bayesian Optimization Using Deep Neural Networks

• Computer Science
ICML
• 2015
This work shows that performing adaptive basis function regression with a neural network as the parametric form performs competitively with state-of-the-art GP-based approaches, but scales linearly with the number of data rather than cubically, which allows for a previously intractable degree of parallelism.