Corpus ID: 3861760

Flipout: Efficient Pseudo-Independent Weight Perturbations on Mini-Batches

@article{Wen2018FlipoutEP,
  title={Flipout: Efficient Pseudo-Independent Weight Perturbations on Mini-Batches},
  author={Yeming Wen and Paul Vicol and Jimmy Ba and Dustin Tran and Roger B. Grosse},
  journal={ArXiv},
  year={2018},
  volume={abs/1803.04386}
}
Stochastic neural net weights are used in a variety of contexts, including regularization, Bayesian neural nets, exploration in reinforcement learning, and evolution strategies. Unfortunately, due to the large number of weights, all the examples in a mini-batch typically share the same weight perturbation, thereby limiting the variance reduction effect of large mini-batches. We introduce flipout, an efficient method for decorrelating the gradients within a mini-batch by implicitly sampling…
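As a rough illustration of the idea in the abstract, the sketch below applies flipout to a single dense layer with a factorized Gaussian weight posterior: one perturbation sample is shared across the mini-batch, and independent random sign vectors per example flip it into pseudo-independent per-example perturbations. The function name and the NumPy formulation are ours, not the paper's code.

```python
import numpy as np

def flipout_dense(x, w_mean, w_std, rng):
    """Per-example pseudo-independent weight perturbations via flipout
    for a dense layer y = x @ W^T, with a Gaussian weight posterior
    N(w_mean, w_std^2).  Shapes: x (batch, d_in), w_* (d_out, d_in)."""
    batch, d_in = x.shape
    d_out = w_mean.shape[0]

    # One shared perturbation sample for the whole mini-batch.
    delta_w = w_std * rng.standard_normal(w_mean.shape)

    # Independent random sign vectors for each example.
    r = rng.choice([-1.0, 1.0], size=(batch, d_out))
    s = rng.choice([-1.0, 1.0], size=(batch, d_in))

    # Mean term plus a per-example rank-1 sign flip of the shared perturbation:
    # y_n = W_mean x_n + r_n * (delta_w @ (s_n * x_n)).
    return x @ w_mean.T + ((x * s) @ delta_w.T) * r

rng = np.random.default_rng(0)
y = flipout_dense(rng.standard_normal((8, 5)),
                  w_mean=rng.standard_normal((3, 5)),
                  w_std=0.1 * np.ones((3, 5)),
                  rng=rng)
print(y.shape)  # (8, 3)
```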
Citations

An Empirical Study of Large-Batch Stochastic Gradient Descent with Structured Covariance Noise
Empirical studies with standard deep learning architectures and datasets show that the proposed method of adding covariance noise to the gradients not only improves generalization in large-batch training, but does so without degrading optimization performance or lengthening training.
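The summary above is brief, so the following sketch only illustrates the general recipe of adding zero-mean noise with a chosen covariance to a mini-batch gradient; the diagonal covariance used here is an assumption for illustration, not the specific structure studied in the paper.

```python
import numpy as np

def noisy_sgd_step(w, grad, grad_sq_ema, lr=0.1, noise_scale=0.01, rng=None):
    """One SGD step with zero-mean Gaussian noise added to the gradient.
    The noise covariance here is diagonal, taken from a running mean of
    squared gradients (grad_sq_ema); the paper studies specific structured
    covariances, so treat this choice purely as an illustration."""
    rng = rng or np.random.default_rng()
    std = np.sqrt(grad_sq_ema)
    noise = noise_scale * std * rng.standard_normal(w.shape)
    return w - lr * (grad + noise)
```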
BatchEnsemble: An Alternative Approach to Efficient Ensemble and Lifelong Learning
BatchEnsemble is proposed, an ensemble method whose computational and memory costs are significantly lower than those of typical ensembles and which scales easily to lifelong learning on Split-ImageNet, which involves 100 sequential learning tasks.
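For context, BatchEnsemble's key construction is that each ensemble member owns only a rank-1 factor pair that modulates a single shared weight matrix. The sketch below is a hedged reading of that idea for one dense layer; variable names are illustrative.

```python
import numpy as np

def batch_ensemble_dense(x, member, w_shared, r, s):
    """Forward pass of one BatchEnsemble member for a dense layer.
    Each member i owns only rank-1 factors r[i] (d_out,) and s[i] (d_in,);
    the full matrix w_shared (d_out, d_in) is shared, so member weights are
    W_i = w_shared * outer(r[i], s[i])."""
    return ((x * s[member]) @ w_shared.T) * r[member]

rng = np.random.default_rng(0)
d_in, d_out, n_members = 5, 3, 4
w = rng.standard_normal((d_out, d_in))
r = rng.standard_normal((n_members, d_out))
s = rng.standard_normal((n_members, d_in))
x = rng.standard_normal((8, d_in))
outs = np.stack([batch_ensemble_dense(x, i, w, r, s) for i in range(n_members)])
print(outs.shape)  # (4, 8, 3): predictions from all ensemble members
```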
Efficient Low Rank Gaussian Variational Inference for Neural Networks
It is found that adding low-rank terms to a parametrized diagonal covariance does not improve predictive performance except on small networks, but low-rank terms added to a constant diagonal covariance improve performance on both small and large-scale network architectures.
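A "diagonal plus low-rank" Gaussian posterior can be sampled cheaply without ever forming the full covariance matrix; the sketch below shows one standard way to do so, with parameter names of our choosing.

```python
import numpy as np

def sample_low_rank_gaussian(mu, diag, U, rng):
    """Draw a weight sample from N(mu, diag(diag) + U @ U.T), the
    'diagonal plus low-rank' posterior family discussed above.
    Shapes: mu, diag (p,); U (p, k)."""
    eps_d = rng.standard_normal(mu.shape)     # drives the diagonal part
    eps_r = rng.standard_normal(U.shape[1])   # drives the rank-k part
    return mu + np.sqrt(diag) * eps_d + U @ eps_r
```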
Learning Sparse Networks Using Targeted Dropout
Targeted dropout is introduced, a method for training a neural network so that it is robust to subsequent pruning; it improves upon more complicated sparsifying regularisers while being simple to implement and easy to tune.
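As a rough reading of the summary, targeted dropout restricts stochastic dropping to the lowest-magnitude weights so the network learns to tolerate later pruning; the sketch below follows that reading, and its hyperparameter names are assumptions rather than the paper's notation.

```python
import numpy as np

def targeted_weight_dropout(w, targ_frac=0.5, drop_rate=0.5, rng=None):
    """Targeted dropout sketch: mark the `targ_frac` smallest-magnitude
    weights in each column as candidates, then drop each candidate with
    probability `drop_rate`, so the network becomes robust to pruning."""
    rng = rng or np.random.default_rng()
    k = int(targ_frac * w.shape[0])
    # Per-column magnitude threshold: the k-th smallest absolute value.
    thresh = np.sort(np.abs(w), axis=0)[k - 1] if k > 0 else -np.inf
    candidates = np.abs(w) <= thresh
    dropped = candidates & (rng.random(w.shape) < drop_rate)
    return np.where(dropped, 0.0, w)
```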
HWA: Hyperparameters Weight Averaging in Bayesian Neural Networks
Bayesian neural networks attempt to combine the strong predictive performance of neural networks with formal quantification of uncertainty of the predicted output in the Bayesian framework. …
AutoLRS: Automatic Learning-Rate Schedule by Bayesian Optimization on the Fly
This paper proposes an efficient method, AutoLRS, which automatically optimizes the learning-rate (LR) schedule for each training stage by modeling training dynamics, and demonstrates the advantages and generality of the method through extensive experiments training DNNs on tasks from diverse domains with different optimizers.
Simple, Distributed, and Accelerated Probabilistic Programming
A simple, low-level approach is presented for embedding probabilistic programming in a deep learning ecosystem; it achieves an optimal linear speedup from 1 to 256 TPUv2 chips.
Self-Tuning Networks: Bilevel Optimization of Hyperparameters using Structured Best-Response Functions
This work adapts regularization hyperparameters for neural networks by fitting compact approximations to the best-response function, which maps hyperparameters to optimal weights and biases, and outperforms competing hyperparameter optimization methods on large-scale deep learning problems.
Laplace Redux - Effortless Bayesian Deep Learning
This work reviews the range of variants of the Laplace approximation, including versions with minimal cost overhead; introduces laplace, an easy-to-use software library for PyTorch offering user-friendly access to all major flavors of the LA; and demonstrates that the LA is competitive with more popular alternatives in terms of performance, while excelling in computational cost.
BayesAdapter: Being Bayesian, Inexpensively and Robustly, via Bayesian Fine-tuning
This work develops a new framework, named BayesAdapter, to adapt pre-trained deterministic NNs to be BNNs via Bayesian fine-tuning with a plug-and-play instantiation of stochastic variational inference, and proposes exemplar reparameterization to reduce gradient variance and stabilize the fine-tuning.

References

Showing 1–10 of 37 references
Variational Dropout and the Local Reparameterization Trick
Variational dropout is proposed, a generalization of Gaussian dropout with a more flexibly parameterized posterior, often leading to better generalization in stochastic gradient variational Bayes.
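The local reparameterization trick itself is standard: rather than sampling a weight matrix, one samples the layer's pre-activations, whose mean and variance follow from the factorized Gaussian posterior, giving each example its own noise. A minimal sketch, assuming a dense layer:

```python
import numpy as np

def local_reparam_dense(x, w_mu, w_logvar, rng):
    """Local reparameterization trick for a dense layer with a fully
    factorized Gaussian posterior over weights: sample pre-activations
    y ~ N(x @ mu, x^2 @ sigma^2) instead of sampling W itself.
    Shapes: x (batch, d_in), w_mu and w_logvar (d_in, d_out)."""
    mean = x @ w_mu
    var = (x ** 2) @ np.exp(w_logvar)
    eps = rng.standard_normal(mean.shape)
    return mean + np.sqrt(var) * eps
```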
Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
Applied to a state-of-the-art image classification model, Batch Normalization achieves the same accuracy with 14 times fewer training steps, and beats the original model by a significant margin.
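For reference, training-mode batch normalization is per-feature standardization with a learned affine transform; the running statistics used at inference time are omitted from this minimal sketch.

```python
import numpy as np

def batch_norm_train(x, gamma, beta, eps=1e-5):
    """Training-mode batch normalization over a (batch, features) input:
    normalize each feature by the mini-batch mean and variance, then apply
    the learned scale `gamma` and shift `beta`."""
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mu) / np.sqrt(var + eps)
    return gamma * x_hat + beta
```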
Dropout: a simple way to prevent neural networks from overfitting
It is shown that dropout improves the performance of neural networks on supervised learning tasks in vision, speech recognition, document classification and computational biology, obtaining state-of-the-art results on many benchmark data sets.
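Standard (inverted) dropout can be stated in a few lines; this sketch is generic and not tied to the paper's experiments.

```python
import numpy as np

def dropout(x, rate=0.5, training=True, rng=None):
    """Inverted dropout: during training, zero each activation with
    probability `rate` and rescale the survivors by 1/(1-rate) so the
    expected activation is unchanged; at test time, pass x through."""
    if not training or rate == 0.0:
        return x
    rng = rng or np.random.default_rng()
    mask = rng.random(x.shape) >= rate
    return x * mask / (1.0 - rate)
```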
Neural Variational Inference and Learning in Belief Networks
This work proposes a fast non-iterative approximate inference method that uses a feedforward network to implement efficient exact sampling from the variational posterior, and shows that it outperforms the wake-sleep algorithm on MNIST and achieves state-of-the-art results on the Reuters RCV1 document dataset.
Regularizing and Optimizing LSTM Language Models
This paper proposes the weight-dropped LSTM, which uses DropConnect on hidden-to-hidden weights as a form of recurrent regularization, and introduces NT-ASGD, a variant of the averaged stochastic gradient method wherein the averaging trigger is determined by a non-monotonic condition rather than being tuned by the user.
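DropConnect on the hidden-to-hidden matrices amounts to masking individual recurrent weights once per forward pass and reusing that mask at every time step; the sketch below shows only the masking step, and the 1/(1-rate) rescaling is a common convention rather than something the summary specifies.

```python
import numpy as np

def drop_connect(w, rate=0.5, rng=None):
    """DropConnect: zero individual weights (not activations) with
    probability `rate`.  In a weight-dropped LSTM this would be applied
    to the hidden-to-hidden matrices before running the recurrence, so
    the same mask is in effect at every time step."""
    rng = rng or np.random.default_rng()
    mask = rng.random(w.shape) >= rate
    return w * mask / (1.0 - rate)
```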
Training Recurrent Networks by Evolino
It is shown that Evolino-based LSTM can solve tasks that Echo State nets cannot, and achieves higher accuracy in certain continuous function generation tasks than conventional gradient-descent RNNs, including gradient-based LSTM.
Auto-Encoding Variational Bayes
A stochastic variational inference and learning algorithm is introduced that scales to large datasets and, under some mild differentiability conditions, even works in the intractable case.
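The algorithm rests on the reparameterization trick, which rewrites a Gaussian sample as a differentiable function of its parameters and parameter-free noise, so low-variance pathwise gradients can flow through the sampling step. A minimal sketch:

```python
import numpy as np

def reparameterize(mu, logvar, rng):
    """Reparameterization trick: express z ~ N(mu, sigma^2) as
    z = mu + sigma * eps with eps ~ N(0, I), making the sample a
    deterministic, differentiable function of (mu, logvar)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps
```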
Reducing Reparameterization Gradient Variance
This work views the noisy gradient as a random variable and forms an inexpensive approximation of the generating procedure for the gradient sample, making it a useful control variate for variance reduction.
Bayesian Compression for Deep Learning
This work argues that the most principled and effective way to attack the problem of compression and computational efficiency in deep learning is to adopt a Bayesian point of view, in which sparsity-inducing priors are used to prune large parts of the network.
A Theoretically Grounded Application of Dropout in Recurrent Neural Networks
This work applies a new variational-inference-based dropout technique in LSTM and GRU models, which outperforms existing techniques and, to the best of the authors' knowledge, improves on the single-model state of the art in language modelling on the Penn Treebank.
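This variational RNN dropout reuses the same dropout masks at every time step of a sequence rather than resampling them per step; the sketch below only constructs such per-sequence masks and leaves the RNN cell itself out.

```python
import numpy as np

def variational_rnn_dropout_masks(batch, d_in, d_hidden, rate, rng):
    """Variational (Bayesian) RNN dropout: sample one input mask and one
    recurrent mask per sequence and reuse them at every time step.
    The returned masks would multiply x_t and h_{t-1} inside the
    recurrence; masks are scaled so expectations are preserved."""
    keep = 1.0 - rate
    in_mask = (rng.random((batch, d_in)) < keep) / keep
    rec_mask = (rng.random((batch, d_hidden)) < keep) / keep
    return in_mask, rec_mask
```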