• Corpus ID: 160705

Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning

Yarin Gal, Zoubin Ghahramani
Deep learning tools have gained tremendous attention in applied machine learning. [...] We show a considerable improvement in predictive log-likelihood and RMSE compared to existing state-of-the-art methods, and finish by using dropout's uncertainty in deep reinforcement learning.
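To make "dropout's uncertainty" concrete, here is a minimal NumPy sketch of Monte Carlo dropout at prediction time. It is not the authors' code: the function name `mc_dropout_predict`, the ReLU hidden layers, the inverted-dropout scaling, and the default values are all assumptions chosen for illustration. The returned variance is only the spread across stochastic forward passes, and it leaves out any observation-noise term a full predictive variance would include.

```python
import numpy as np

def mc_dropout_predict(x, weights, biases, p_drop=0.5, n_samples=100, seed=0):
    """Hypothetical helper: run n_samples stochastic forward passes with
    dropout left on, then return the per-output mean and variance."""
    rng = np.random.default_rng(seed)
    outputs = []
    for _ in range(n_samples):
        h = x
        for i, (W, b) in enumerate(zip(weights, biases)):
            # Fresh Bernoulli mask on this layer's inputs; dividing by the
            # keep probability keeps the expected activation unchanged.
            mask = rng.random(h.shape) >= p_drop
            h = (h * mask / (1.0 - p_drop)) @ W + b
            if i < len(weights) - 1:
                h = np.maximum(h, 0.0)  # ReLU on hidden layers only (assumed)
        outputs.append(h)
    outputs = np.stack(outputs)  # shape: (n_samples, batch, out_dim)
    return outputs.mean(axis=0), outputs.var(axis=0)
```

Calling the function with a list of weight matrices and bias vectors returns two arrays of shape `(batch, out_dim)`. Inputs whose variance is large are those on which the sampled sub-networks disagree most.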

Dropout as a Bayesian Approximation: Insights and Applications
It is shown that a multilayer perceptron (MLP) with arbitrary depth and non-linearities, with dropout applied after every weight layer, is mathematically equivalent to an approximation to a well-known Bayesian model.
Bayesian Uncertainty Estimation for Batch Normalized Deep Networks
It is shown that training a deep network using batch normalization is equivalent to approximate inference in Bayesian models, and it is demonstrated how this finding allows us to make useful estimates of the model uncertainty.
Variational Inference to Measure Model Uncertainty in Deep Neural Networks
A novel approach for training deep neural networks in a Bayesian way that uses variational inference to approximate the intractable posterior distribution on the basis of a normal prior. It can be used to calculate credible intervals for the prediction and to optimize the network architecture for a given training data set.
Improving Bayesian Inference in Deep Neural Networks with Variational Structured Dropout
This work focuses on the restrictive factorized structure of the dropout posterior, which is too inflexible to capture rich correlations among the weight parameters of the true posterior, and proposes a novel method called Variational Structured Dropout (VSD) to overcome this limitation.
Novel Uncertainty Framework for Deep Learning Ensembles
A novel statistical-mechanics-based framework for dropout is proposed and used to propose a new generic algorithm that focuses on estimates of the variance of the loss as measured by the ensemble of thinned networks.
Measuring the Uncertainty of Predictions in Deep Neural Networks with Variational Inference
A novel approach for training deep neural networks in a Bayesian way that quantifies the uncertainty in model parameters while adding only a few extra parameters to be optimized. It can be used to calculate credible intervals for the network prediction and to optimize the network architecture for the dataset at hand.
Uncertainty Quantification for Sparse Deep Learning
This paper provides semi-parametric Bernstein-von Mises theorems for linear and quadratic functionals, which guarantee that the implied Bayesian credible regions have valid frequentist coverage, and gives new theoretical justifications for (Bayesian) deep learning with ReLU activation functions.
Learning a Hierarchy of Neural Connections for Modeling Uncertainty
Treating the generative process of unlabeled data as a confounder is suggested, thereby conditioning the prior of the discriminative neural network on the parameters of the generative process. This approach is ultimately translated into a compact hierarchy of sub-networks, a new deep architecture.
Bayesian Evidential Deep Learning with PAC Regularization
A novel method for closed-form predictive distribution modeling with neural nets that combines a vacuous PAC bound that comprises the marginal likelihood of the predictor and a complexity penalty and improves model fit and uncertainty quantification.
Density Regression and Uncertainty Quantification with Bayesian Deep Noise Neural Networks
This work proposes the Bayesian Deep Noise Neural Network (B-DeepNoise), which generalizes standard Bayesian DNNs by extending the random noise variable from the output layer to all hidden layers, and applies it to predict general intelligence from neuroimaging features in the Adolescent Brain Cognitive Development project.


Ensemble learning in Bayesian neural networks
This chapter shows how the ensemble learning approach can be extended to full-covariance Gaussian distributions while remaining computationally tractable, and extends the framework to deal with hyperparameters, leading to a simple re-estimation procedure.
Probabilistic Backpropagation for Scalable Learning of Bayesian Neural Networks
This work presents a novel scalable method for learning Bayesian neural networks, called probabilistic backpropagation (PBP), which works by computing a forward propagation of probabilities through the network and then doing a backward computation of gradients.
Stochastic Backpropagation and Approximate Inference in Deep Generative Models
We marry ideas from deep neural networks and approximate Bayesian inference to derive a generalised class of deep, directed generative models, endowed with a new algorithm for scalable inference and
Practical Bayesian Optimization of Machine Learning Algorithms
This work describes new algorithms that take into account the variable cost of learning algorithm experiments and that can leverage the presence of multiple cores for parallel experimentation and shows that these proposed algorithms improve on previous automatic procedures and can reach or surpass human expert-level optimization for many algorithms.
Dropout: a simple way to prevent neural networks from overfitting
It is shown that dropout improves the performance of neural networks on supervised learning tasks in vision, speech recognition, document classification and computational biology, obtaining state-of-the-art results on many benchmark data sets.
Auto-Encoding Variational Bayes
A stochastic variational inference and learning algorithm that scales to large datasets and, under some mild differentiability conditions, even works in the intractable case is introduced.
A Practical Bayesian Framework for Backpropagation Networks
  • D. MacKay • Computer Science • Neural Computation • 1992
A quantitative and practical Bayesian framework is described for learning of mappings in feedforward networks that automatically embodies "Occam's razor," penalizing overflexible and overcomplex models.
Deep Gaussian Processes
Deep Gaussian process (GP) models are introduced, and model selection by the variational bound shows that a five-layer hierarchy is justified even when modelling a digit data set containing only 150 examples.
Human-level control through deep reinforcement learning
This work bridges the divide between high-dimensional sensory inputs and actions, resulting in the first artificial agent that is capable of learning to excel at a diverse array of challenging tasks.
Gaussian Processes for Machine Learning (Adaptive Computation and Machine Learning)
The treatment is comprehensive and self-contained, targeted at researchers and students in machine learning and applied statistics, and includes detailed algorithms for supervised-learning problems in both regression and classification.