Corpus ID: 11675927

Generalized Dropout

Suraj Srinivas and R. Venkatesh Babu

Deep Neural Networks often require good regularizers to generalize well. Dropout is one such regularizer that is widely used among Deep Learning practitioners. Recent work has shown that Dropout can also be viewed as performing Approximate Bayesian Inference over the network parameters. In this work, we generalize this notion and introduce a rich family of regularizers which we call Generalized Dropout. One set of methods in this family, called Dropout++, is a version of Dropout with trainable… 

Variational Dropout Sparsifies Deep Neural Networks

Variational Dropout is extended to the case where dropout rates are unbounded, a way to reduce the variance of the gradient estimator is proposed, and first experimental results with individual dropout rates per weight are reported.
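A minimal sketch of two ingredients associated with this approach, assuming a log-uniform prior and one dropout parameter alpha per weight: the closed-form approximation to the negative KL term with the constants reported by Molchanov et al. (2017), and pruning of weights whose log alpha exceeds a threshold. The threshold of 3.0 is an assumed default, and `neg_kl_approx` and `prune_mask` are hypothetical helper names.

```python
import math

# Constants of the KL approximation reported by Molchanov et al. (2017).
K1, K2, K3 = 0.63576, 1.87320, 1.48695


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def neg_kl_approx(log_alpha: float) -> float:
    """Approximate -KL(q(w | theta, alpha) || p(w)) for a log-uniform prior.

    Approaches 0 as log_alpha grows, so the regularizer pushes
    uninformative weights toward very large dropout rates.
    """
    return (K1 * sigmoid(K2 + K3 * log_alpha)
            - 0.5 * math.log1p(math.exp(-log_alpha))  # 0.5 * log(1 + 1/alpha)
            - K1)


def prune_mask(log_alphas, threshold=3.0):
    """Keep (1) weights with log alpha below the threshold, prune (0) the rest."""
    return [1 if la < threshold else 0 for la in log_alphas]
```

During training the (negated) sum of `neg_kl_approx` over all weights would be added to the data loss; after training, `prune_mask` marks which weights survive.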

Survey of Dropout Methods for Deep Neural Networks

The history of dropout methods, their various applications, and current areas of research interest are summarized.

Adaptive Network Sparsification via Dependent Variational Beta-Bernoulli Dropout

Adaptive variational dropout, whose probabilities are drawn from a sparsity-inducing beta-Bernoulli prior, allows the resulting network to tolerate a larger degree of sparsity without losing its expressive power, by removing redundancies among features.

CODA: Constructivism Learning for Instance-Dependent Dropout Architecture Construction

This work proposes Constructivism learning for instance-dependent Dropout Architecture (CODA), which is inspired by a philosophical theory, constructivism learning, and designs a better dropout technique, Uniform Process Mixture Models, using the Bayesian nonparametric Uniform process.

Principal Component Networks: Parameter Reduction Early in Training

This paper shows how to find small networks that match the performance of their overparameterized counterparts after only a few training epochs; it uses PCA to find a high-variance basis for layer inputs and represents layer weights in terms of these directions.

Simple and Effective Stochastic Neural Networks

This paper proposes a simple and effective stochastic neural network architecture for discriminative learning that directly models activation uncertainty and encourages high activation variability; it produces state-of-the-art results on network compression by pruning, adversarial defense, learning with label noise, and model calibration.

Deep networks with probabilistic gates

This work proposes a per-batch loss function, describes strategies for handling probabilistic bypass during both training and inference, and explores several inference-time strategies, including the natural MAP approach.

Learning Compact Architectures for Deep Neural Networks

A method called 'Architecture-Learning' is described that takes a pre-trained network model and compresses it without using training data; the Architecture-Learning methodology is applied to sparsify neural networks, i.e., to remove weights and create sparse weight matrices.

Robust Learning of Parsimonious Deep Neural Networks

The simulations show that the proposed simultaneous learning and pruning algorithm achieves pruning levels on par with state-of-the-art methods for structured pruning while maintaining better test accuracy and, more importantly, doing so robustly with respect to network initialization and initial size.



Adaptive dropout for training deep neural networks

A method called 'standout' is described, in which a binary belief network is overlaid on a neural network and used to regularize its hidden units by selectively setting activities to zero; it achieves lower classification error rates than other feature learning methods, including standard dropout, denoising auto-encoders, and restricted Boltzmann machines.
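A minimal sketch of the standout idea for a single fully connected ReLU layer in plain Python. It assumes a simplified overlay network that reuses the layer's own pre-activation, scaled by hypothetical parameters `alpha` and `beta`, so the keep probability of unit j is sigmoid(alpha * pre_j + beta). At test time each activation is multiplied by its keep probability instead of being sampled.

```python
import math
import random


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def standout_layer(inputs, weights, alpha=1.0, beta=0.0, train=True, rng=random):
    """ReLU layer with activation-dependent (standout-style) dropout.

    weights[j] is the incoming weight vector of hidden unit j.
    """
    outputs = []
    for w_j in weights:
        pre = sum(w * x for w, x in zip(w_j, inputs))
        act = max(0.0, pre)
        # Assumed overlay network: keep probability driven by the same pre-activation.
        keep_prob = sigmoid(alpha * pre + beta)
        if train:
            mask = 1.0 if rng.random() < keep_prob else 0.0
            outputs.append(mask * act)
        else:
            outputs.append(keep_prob * act)  # expected activation at test time
    return outputs
```

Unlike standard dropout, units with strongly positive pre-activations are kept more often, which is the adaptive behaviour the summary describes.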

Learning the Architecture of Deep Neural Networks

This work introduces the problem of architecture-learning, i.e., learning the architecture of a neural network along with its weights, and introduces a new trainable parameter called tri-state ReLU, which helps eliminate unnecessary neurons.

Dropout: a simple way to prevent neural networks from overfitting

It is shown that dropout improves the performance of neural networks on supervised learning tasks in vision, speech recognition, document classification and computational biology, obtaining state-of-the-art results on many benchmark data sets.
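A minimal sketch of dropout on a list of activations, using the common "inverted" variant: surviving units are scaled by 1/(1 - p_drop) during training so that no rescaling is needed at test time. The original paper instead scales the weights at test time; both keep the expected activation unchanged. The function name and signature are illustrative.

```python
import random


def dropout(activations, p_drop=0.5, train=True, rng=random):
    """Inverted dropout over a list of activations."""
    if not 0.0 <= p_drop < 1.0:
        raise ValueError("p_drop must be in [0, 1)")
    if not train:
        return list(activations)  # identity at test time
    keep = 1.0 - p_drop
    # Each unit survives independently with probability `keep`; survivors are
    # scaled by 1/keep so the expected output equals the input.
    return [a / keep if rng.random() < keep else 0.0 for a in activations]
```

For example, `dropout([1.0, 2.0, 3.0], 0.5)` returns each entry either zeroed or doubled.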

Variational Dropout and the Local Reparameterization Trick

Variational dropout is proposed as a generalization of Gaussian dropout with a more flexibly parameterized posterior, often leading to better generalization in stochastic gradient variational Bayes.
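The local reparameterization trick named in the title can be sketched for one linear layer whose weights have a fully factorized Gaussian posterior: rather than sampling every weight, each pre-activation is sampled directly from the Gaussian it induces. For a Gaussian-dropout-style posterior one would set `sigma2[i][j] = alpha * mu[i][j] ** 2`. The list-of-lists layout and the function name are assumptions for illustration.

```python
import math
import random


def local_reparam_layer(x, mu, sigma2, rng=random):
    """Sample pre-activations of a linear layer with weights w_ij ~ N(mu_ij, sigma2_ij).

    Index i runs over inputs and j over outputs. Since b_j = sum_i x_i w_ij is
    Gaussian, sample it directly from N(sum_i x_i mu_ij, sum_i x_i^2 sigma2_ij).
    """
    n_out = len(mu[0])
    out = []
    for j in range(n_out):
        mean = sum(x[i] * mu[i][j] for i in range(len(x)))
        var = sum(x[i] ** 2 * sigma2[i][j] for i in range(len(x)))
        out.append(mean + math.sqrt(var) * rng.gauss(0.0, 1.0))
    return out
```

Sampling one noise value per output rather than per weight is what lowers the variance of the gradient estimator and makes the noise independent across examples in a minibatch.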

Regularization of Neural Networks using DropConnect

This work introduces DropConnect, a generalization of Dropout, for regularizing large fully-connected layers within neural networks, and derives a bound on the generalization performance of both Dropout and DropConnect.
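A minimal sketch of the training-time DropConnect forward pass for a linear layer: where dropout zeroes whole units, DropConnect zeroes individual weights, each independently with probability `p_drop`. The paper's inference procedure is not reproduced here, and the function name is illustrative.

```python
import random


def dropconnect_linear(x, weights, p_drop=0.5, rng=random):
    """Compute y_j = sum_i m_ji * w_ji * x_i with m_ji ~ Bernoulli(1 - p_drop)."""
    if not 0.0 <= p_drop < 1.0:
        raise ValueError("p_drop must be in [0, 1)")
    outputs = []
    for row in weights:  # row = incoming weights of output unit j
        total = 0.0
        for w, xi in zip(row, x):
            if rng.random() >= p_drop:  # keep this connection with prob 1 - p_drop
                total += w * xi
        outputs.append(total)
    return outputs
```

With `p_drop=0.0` the layer reduces to an ordinary matrix-vector product.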

Practical Variational Inference for Neural Networks

This paper introduces an easy-to-implement stochastic variational method (or equivalently, minimum description length loss function) that can be applied to most neural networks and revisits several common regularisers from a variational perspective.
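For a fully factorized Gaussian posterior and a fixed zero-mean Gaussian prior (both assumed here), the complexity term of such a variational loss has a closed form per weight: KL(N(mu, sigma^2) || N(0, s^2)) = log(s / sigma) + (sigma^2 + mu^2) / (2 s^2) - 1/2. A minimal sketch with hypothetical helper names:

```python
import math


def gaussian_kl(mu, sigma, prior_sigma=1.0):
    """KL( N(mu, sigma^2) || N(0, prior_sigma^2) ) for a single weight."""
    if sigma <= 0 or prior_sigma <= 0:
        raise ValueError("standard deviations must be positive")
    return (math.log(prior_sigma / sigma)
            + (sigma ** 2 + mu ** 2) / (2.0 * prior_sigma ** 2)
            - 0.5)


def complexity_cost(mus, sigmas, prior_sigma=1.0):
    """Sum of per-weight KL terms for a fully factorized posterior."""
    return sum(gaussian_kl(m, s, prior_sigma) for m, s in zip(mus, sigmas))
```

The total loss would then be the expected data cost plus `complexity_cost`, which is the minimum description length reading mentioned above.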

Predicting Parameters in Deep Learning

It is demonstrated that there is significant redundancy in the parameterization of several deep learning models and not only can the parameter values be predicted, but many of them need not be learned at all.

Estimating or Propagating Gradients Through Stochastic Neurons for Conditional Computation

This work considers a small-scale version of conditional computation, where sparse stochastic units form a distributed representation of gaters that can turn off, in combinatorially many ways, large chunks of the computation performed in the rest of the neural network.
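One of the estimators discussed in this line of work is the straight-through estimator. A minimal sketch for a single stochastic binary neuron, assuming h ~ Bernoulli(sigmoid(a)): the forward pass samples a hard 0/1 value, and the backward pass pretends the sampling step was differentiable. The two variants shown (plain identity, or identity times the sigmoid derivative) and the helper names are illustrative.

```python
import math
import random


def sigmoid(a: float) -> float:
    """Numerically stable logistic function."""
    if a >= 0:
        return 1.0 / (1.0 + math.exp(-a))
    z = math.exp(a)
    return z / (1.0 + z)


def stochastic_binary_forward(a, rng=random):
    """Sample h ~ Bernoulli(sigmoid(a)); returns 0.0 or 1.0."""
    return 1.0 if rng.random() < sigmoid(a) else 0.0


def straight_through_grad(grad_h, a, variant="identity"):
    """Backward pass: treat the sampling step as (nearly) the identity.

    'identity' copies the upstream gradient unchanged; 'sigmoid' also
    multiplies it by the derivative of sigmoid at a.
    """
    if variant == "identity":
        return grad_h
    if variant == "sigmoid":
        s = sigmoid(a)
        return grad_h * s * (1.0 - s)
    raise ValueError(f"unknown variant: {variant}")
```

The estimate is biased, but it lets gradients reach the parameters that produce the gating units.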

Weight Uncertainty in Neural Networks

This work introduces a new, efficient, principled and backpropagation-compatible algorithm for learning a probability distribution on the weights of a neural network, called Bayes by Backprop, and shows how the learnt uncertainty in the weights can be used to improve generalisation in non-linear regression problems.
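A minimal sketch of the reparameterization that makes this backpropagation-compatible, for one weight with posterior N(mu, softplus(rho)^2): sample w = mu + softplus(rho) * eps with eps ~ N(0, 1), then push dL/dw back to mu and rho by the chain rule. Only the data-term path is shown; the prior and posterior log-density terms of the full objective contribute additional gradients not included here. Helper names are hypothetical.

```python
import math
import random


def softplus(rho: float) -> float:
    """log(1 + exp(rho)), which keeps the standard deviation positive."""
    return math.log1p(math.exp(rho))


def sample_weight(mu, rho, rng=random):
    """Reparameterized sample; returns (w, eps) so eps can be reused in backprop."""
    eps = rng.gauss(0.0, 1.0)
    return mu + softplus(rho) * eps, eps


def grads_through_sample(dloss_dw, eps, rho):
    """Chain rule through w: dw/dmu = 1 and dw/drho = eps * sigmoid(rho)."""
    sig = 1.0 / (1.0 + math.exp(-rho))  # derivative of softplus
    return dloss_dw, dloss_dw * eps * sig
```

Because the randomness is isolated in `eps`, the same sampled noise can be reused on the backward pass, which is what lets standard backpropagation train the distribution parameters.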

Scalable Bayesian Optimization Using Deep Neural Networks

This work shows that performing adaptive basis function regression with a neural network as the parametric form performs competitively with state-of-the-art GP-based approaches, but scales linearly with the number of data rather than cubically, which allows for a previously intractable degree of parallelism.