Corpus ID: 11675927

Generalized Dropout

@article{Srinivas2016GeneralizedD,
  title={Generalized Dropout},
  author={Suraj Srinivas and R. Venkatesh Babu},
  journal={ArXiv},
  year={2016},
  volume={abs/1611.06791}
}
Deep Neural Networks often require good regularizers to generalize well. Dropout is one such regularizer that is widely used among Deep Learning practitioners. Recent work has shown that Dropout can also be viewed as performing Approximate Bayesian Inference over the network parameters. In this work, we generalize this notion and introduce a rich family of regularizers which we call Generalized Dropout. One set of methods in this family, called Dropout++, is a version of Dropout with trainable…
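The abstract is truncated above. To make the idea of dropout with trainable rates concrete, below is a minimal PyTorch-style sketch of a dropout layer whose per-unit retain probabilities are learned end-to-end; the module name, the sigmoid parameterization, and the straight-through gradient estimator are illustrative assumptions, not the exact Dropout++ formulation from the paper.

```python
import torch
import torch.nn as nn

class TrainableDropout(nn.Module):
    """Sketch of dropout with learned per-unit retain probabilities.

    Each unit i keeps its activation with probability sigmoid(theta_i).
    Bernoulli sampling is not differentiable, so a straight-through
    estimator passes gradients to theta. Illustrative assumption only,
    not the paper's exact Dropout++ formulation.
    """

    def __init__(self, num_units: int, init_logit: float = 2.0):
        super().__init__()
        # init_logit = 2.0 gives an initial retain probability of ~0.88 (assumption).
        self.theta = nn.Parameter(torch.full((num_units,), init_logit))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        p_keep = torch.sigmoid(self.theta)                 # per-unit retain probability
        if self.training:
            mask = (torch.rand_like(x) < p_keep).float()   # hard Bernoulli sample
            # Straight-through: forward uses the hard mask, backward sees p_keep.
            mask = mask + p_keep - p_keep.detach()
            return x * mask
        # At test time, scale by the expected mask (standard dropout averaging).
        return x * p_keep


# Usage: drop-in replacement for nn.Dropout after a hidden layer.
layer = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), TrainableDropout(256))
out = layer(torch.randn(32, 784))
```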
Citations

Variational Dropout Sparsifies Deep Neural Networks
TLDR: Variational Dropout is extended to the case when dropout rates are unbounded, a way to reduce the variance of the gradient estimator is proposed, and first experimental results with individual dropout rates per weight are reported.
Survey of Dropout Methods for Deep Neural Networks
TLDR: The history of dropout methods, their various applications, and current areas of research interest are summarized.
Adaptive Network Sparsification via Dependent Variational Beta-Bernoulli Dropout
TLDR: Adaptive variational dropout, whose probabilities are drawn from a sparsity-inducing beta-Bernoulli prior, allows the resulting network to tolerate a larger degree of sparsity without losing its expressive power by removing redundancies among features.
CODA: Constructivism Learning for Instance-Dependent Dropout Architecture Construction
TLDR: This work proposes Constructivism learning for instance-dependent Dropout Architecture (CODA), which is inspired by the philosophical theory of constructivism learning and designs an improved dropout technique, Uniform Process Mixture Models, using the Bayesian nonparametric uniform process.
Principal Component Networks: Parameter Reduction Early in Training
TLDR: This paper shows how to find small networks that exhibit the same performance as their overparameterized counterparts after only a few training epochs, using PCA to find a basis of high variance for layer inputs and representing layer weights using these directions.
Simple and Effective Stochastic Neural Networks
TLDR: This paper proposes a simple and effective stochastic neural network architecture for discriminative learning by directly modeling activation uncertainty and encouraging high activation variability, producing state-of-the-art results on network compression by pruning, adversarial defense, learning with label noise, and model calibration.
Stochastic batch size for adaptive regularization in deep network optimization
TLDR: The quantitative evaluation indicates that the proposed first-order stochastic optimization algorithm with adaptive regularization outperforms state-of-the-art optimization algorithms in generalization while being less sensitive to the selection of batch size, which often plays a critical role in optimization, thus achieving more robust regularization.
Deep networks with probabilistic gates
TLDR: This work proposes a per-batch loss function, describes strategies for handling probabilistic bypass during both training and inference, and explores several inference-time strategies, including the natural MAP approach.
Learning Compact Architectures for Deep Neural Networks
Deep Neural Networks (NNs) have recently emerged as the model of choice for a wide range of Machine Learning applications, ranging from computer vision to speech recognition to natural language…
Regularization in neural network optimization via trimmed stochastic gradient descent with noisy label
TLDR: A first-order optimization method (Label-Noised Trim-SGD) is proposed that combines label noise with example trimming in order to remove outliers and obtain a better regularization effect than the original methods.

References

Showing 1-10 of 21 references
Adaptive dropout for training deep neural networks
TLDR: A method called 'standout' is described in which a binary belief network is overlaid on a neural network and is used to regularize its hidden units by selectively setting activities to zero; it achieves lower classification error rates than other feature learning methods, including standard dropout, denoising auto-encoders, and restricted Boltzmann machines.
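As a rough illustration of the 'standout' idea summarized above, here is a hedged sketch in which each hidden unit's keep probability is a sigmoid of its own rescaled pre-activation, so the overlaid belief network effectively shares the layer's weights; the scaling constants and training heuristic are simplifications, not the paper's exact scheme.

```python
import torch
import torch.nn as nn

class StandoutLayer(nn.Module):
    """Sketch of adaptive dropout ('standout'): the keep probability of each
    hidden unit is a sigmoid of the layer's own (rescaled) pre-activation.
    The alpha/beta constants and the training heuristic are simplifications."""

    def __init__(self, in_features: int, out_features: int,
                 alpha: float = 1.0, beta: float = 0.0):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features)
        self.alpha, self.beta = alpha, beta

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        pre = self.linear(x)
        h = torch.relu(pre)
        # Overlaid belief network: reuse the pre-activation, rescaled by alpha/beta.
        p_keep = torch.sigmoid(self.alpha * pre + self.beta)
        if self.training:
            mask = (torch.rand_like(p_keep) < p_keep).float()  # stochastic gate
            return h * mask
        return h * p_keep  # expected mask at test time
```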
Learning the Architecture of Deep Neural Networks
TLDR: This work introduces the problem of architecture learning, i.e., learning the architecture of a neural network along with its weights, and proposes a new trainable parameter called tri-state ReLU, which helps in eliminating unnecessary neurons.
Dropout: a simple way to prevent neural networks from overfitting
TLDR: It is shown that dropout improves the performance of neural networks on supervised learning tasks in vision, speech recognition, document classification and computational biology, obtaining state-of-the-art results on many benchmark data sets.
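For reference, standard (inverted) dropout as summarized above fits in a few lines; the drop rate of 0.5 is the commonly used default rather than anything specific to this page.

```python
import torch

def inverted_dropout(x: torch.Tensor, p_drop: float = 0.5, training: bool = True) -> torch.Tensor:
    """Standard inverted dropout: zero units with probability p_drop during
    training and rescale by 1/(1 - p_drop), so no scaling is needed at test time."""
    if not training or p_drop == 0.0:
        return x
    mask = (torch.rand_like(x) > p_drop).float()
    return x * mask / (1.0 - p_drop)
```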
Variational Dropout and the Local Reparameterization Trick
TLDR: The variational dropout method is proposed, a generalization of Gaussian dropout with a more flexibly parameterized posterior, often leading to better generalization in stochastic gradient variational Bayes.
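A hedged sketch of the local reparameterization idea for Gaussian dropout: rather than sampling a multiplicative noise mask, the pre-activations are sampled directly from the Gaussian they induce, which reduces gradient variance. The fixed noise level alpha below is an assumption; variational dropout learns it.

```python
import torch
import torch.nn as nn

class LocalReparamGaussianDropout(nn.Module):
    """Sketch of Gaussian dropout with the local reparameterization trick:
    the noisy pre-activation b = (x * eps) @ W, eps ~ N(1, alpha), is sampled
    directly from its induced Gaussian N(x @ W, alpha * (x^2 @ W^2)).
    alpha is fixed here for simplicity; variational dropout learns it."""

    def __init__(self, in_features: int, out_features: int, alpha: float = 0.5):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(in_features, out_features) * 0.01)
        self.alpha = alpha

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        mean = x @ self.weight
        if not self.training:
            return mean
        var = self.alpha * (x.pow(2) @ self.weight.pow(2))
        return mean + (var + 1e-8).sqrt() * torch.randn_like(mean)
```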
Regularization of Neural Networks using DropConnect
TLDR: This work introduces DropConnect, a generalization of Dropout, for regularizing large fully-connected layers within neural networks, and derives a bound on the generalization performance of both Dropout and DropConnect.
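To contrast DropConnect with standard dropout, a minimal sketch that masks individual weights rather than activations is shown below; drawing one mask per forward pass and using mean weights at inference are simplifications of the paper's per-example sampling and moment-matched inference.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DropConnectLinear(nn.Module):
    """Sketch of DropConnect: randomly zero individual weights (not activations)
    during training. One mask is drawn per forward pass rather than per example,
    and inference simply uses the mean weights; both are simplifications."""

    def __init__(self, in_features: int, out_features: int, p_drop: float = 0.5):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features)
        self.p_drop = p_drop

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self.training:
            mask = (torch.rand_like(self.linear.weight) > self.p_drop).float()
            w = self.linear.weight * mask / (1.0 - self.p_drop)
            return F.linear(x, w, self.linear.bias)
        return self.linear(x)
```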
Practical Variational Inference for Neural Networks
  • A. Graves
  • NIPS 2011
TLDR: This paper introduces an easy-to-implement stochastic variational method (or equivalently, minimum description length loss function) that can be applied to most neural networks, and revisits several common regularisers from a variational perspective.
Bayesian Convolutional Neural Networks with Bernoulli Approximate Variational Inference
TLDR: This work presents an efficient Bayesian CNN, offering better robustness to over-fitting on small data than traditional approaches, and approximates the model's intractable posterior with Bernoulli variational distributions.
Predicting Parameters in Deep Learning
TLDR: It is demonstrated that there is significant redundancy in the parameterization of several deep learning models, and not only can the parameter values be predicted, but many of them need not be learned at all.
Estimating or Propagating Gradients Through Stochastic Neurons for Conditional Computation
TLDR: This work considers a small-scale version of conditional computation, where sparse stochastic units form a distributed representation of gaters that can turn off, in combinatorially many ways, large chunks of the computation performed in the rest of the neural network.
Weight Uncertainty in Neural Networks
TLDR: This work introduces a new, efficient, principled and backpropagation-compatible algorithm for learning a probability distribution on the weights of a neural network, called Bayes by Backprop, and shows how the learnt uncertainty in the weights can be used to improve generalisation in non-linear regression problems.
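A minimal sketch of the weight-sampling step in Bayes by Backprop: each weight gets a factorized Gaussian posterior with standard deviation softplus(rho), sampled via the reparameterization trick; the prior and the KL complexity term that the method adds to the loss are omitted here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BayesLinear(nn.Module):
    """Sketch of a Bayes-by-Backprop linear layer: weights have a factorized
    Gaussian posterior N(mu, softplus(rho)^2) and are sampled with the
    reparameterization trick. The prior and the KL penalty added to the loss
    in the full method are omitted in this illustration."""

    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        self.w_mu = nn.Parameter(torch.randn(out_features, in_features) * 0.01)
        self.w_rho = nn.Parameter(torch.full((out_features, in_features), -5.0))
        self.bias = nn.Parameter(torch.zeros(out_features))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        sigma = F.softplus(self.w_rho)
        w = self.w_mu + sigma * torch.randn_like(sigma)   # reparameterized sample
        return F.linear(x, w, self.bias)
```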