Corpus ID: 26983440

On Modern Deep Learning and Variational Inference

  • Y. Gal, Zoubin Ghahramani
Bayesian modelling and variational inference are rooted in Bayesian statistics, and easily benefit from the vast literature in the field. In contrast, deep learning lacks a solid mathematical grounding. Instead, empirical developments in deep learning are often justified by metaphors, evading the unexplained principles at play. It is perhaps astonishing then that most modern deep learning models can be cast as performing approximate variational inference in a Bayesian setting. This…

A comparison of frequentist methods and Bayesian approximations in the implementation of Convolutional Neural Networks in an Active Learning setting
  • Mario Humberto Becerra Contreras
In this work, an approximate Bayesian approach for Deep Learning is compared with a conventional approach, all within a context of Active Learning. The conventional approach is based on the…
ShapeOdds: Variational Bayesian Learning of Generative Shape Models
  • S. Elhabian, R. Whitaker
  • Mathematics, Computer Science
  • 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
  • 2017
A variational approach for learning a latent variable model; the proposed model is shown to generate realistic samples, generalize to unseen examples, and handle missing regions and/or background clutter, while comparing favorably with recent neural-network-based approaches.
Variational Bayesian Parameter Estimation Techniques for the General Linear Model
The conceptual and formal underpinnings of VB, VML, ReML, and ML are revisited, and a detailed account of their mathematical relationships and implementational details is provided.
Explainable Artificial Intelligence via Bayesian Teaching
Modern machine learning methods are increasingly powerful and opaque. This opaqueness is a concern across a variety of domains in which algorithms are making important decisions that should be…
Addressing model uncertainty in probabilistic forecasting using Monte Carlo dropout
In recent years, deep learning models have been developed to address probabilistic forecasting tasks, assuming an implicit stochastic process that relates past observed values to uncertain future…
Image classification with a MSF dropout
A multi-scale fusion (MSF) dropout method built on standard dropout is proposed; prediction accuracy is significantly improved compared with the other two kinds of dropout, which verifies the effectiveness of the multi-scale fusion method.
Dropout distillation
This work introduces a novel approach, coined "dropout distillation", that trains a predictor to better approximate the intractable, but preferable, averaging process while keeping its computational cost under control.
Reliable Deep Grade Prediction with Uncertainty Estimation
Two types of Bayesian deep learning models for grade prediction under a course-specific framework are presented, based on the assumption that prior courses provide students with knowledge for future courses, so that grades in prior courses can be used to predict grades in a future course.
Master Thesis: Data-Driven Machine Learning of Turbulence Models
In this project the student will implement her/his own version of a learning machine, leveraging the underlying laws of physics to then extract high-dimensional data patterns from Computational Fluid…
Convergent Block Coordinate Descent for Training Tikhonov Regularized Deep Neural Networks
By lifting the ReLU function into a higher dimensional space, this work develops a smooth multi-convex formulation for training feed-forward deep neural networks (DNNs) and proves that this BCD algorithm will converge globally to a stationary point with R-linear convergence rate of order one.


Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning
A new theoretical framework is developed casting dropout training in deep neural networks (NNs) as approximate Bayesian inference in deep Gaussian processes, which mitigates the problem of representing uncertainty in deep learning without sacrificing either computational complexity or test accuracy.
Dropout as a Bayesian Approximation : Insights and Applications
Deep learning techniques are used more and more often, but they lack the ability to reason about uncertainty over the features. Features extracted from a dataset are given as point estimates, and do…
Bayesian Convolutional Neural Networks with Bernoulli Approximate Variational Inference
This work presents an efficient Bayesian CNN, offering better robustness to over-fitting on small data than traditional approaches, and approximates the model's intractable posterior with Bernoulli variational distributions.
Ensemble learning in Bayesian neural networks
Bayesian treatments of learning in neural networks are typically based either on a local Gaussian approximation to a mode of the posterior weight distribution, or on Markov chain Monte Carlo…
Practical Variational Inference for Neural Networks
  • A. Graves
  • Computer Science, Mathematics
  • NIPS
  • 2011
This paper introduces an easy-to-implement stochastic variational method (or equivalently, minimum description length loss function) that can be applied to most neural networks and revisits several common regularisers from a variational perspective.
Gaussian Processes for Machine Learning (Adaptive Computation and Machine Learning)
The treatment is comprehensive and self-contained, targeted at researchers and students in machine learning and applied statistics, and includes detailed algorithms for supervised-learning problems in both regression and classification.
Learning Multiple Layers of Features from Tiny Images
It is shown how to train a multi-layer generative model that learns to extract meaningful features which resemble those found in the human visual cortex, using a novel parallelization algorithm to distribute the work among multiple machines connected on a network.
Dropout: a simple way to prevent neural networks from overfitting
It is shown that dropout improves the performance of neural networks on supervised learning tasks in vision, speech recognition, document classification and computational biology, obtaining state-of-the-art results on many benchmark data sets.
Weight Uncertainty in Neural Networks
This work introduces a new, efficient, principled and backpropagation-compatible algorithm for learning a probability distribution on the weights of a neural network, called Bayes by Backprop, and shows how the learnt uncertainty in the weights can be used to improve generalisation in non-linear regression problems.
ImageNet classification with deep convolutional neural networks
A large, deep convolutional neural network was trained to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes and employed a recently developed regularization method called "dropout" that proved to be very effective.