• Corpus ID: 8645175

Probabilistic Backpropagation for Scalable Learning of Bayesian Neural Networks

  title={Probabilistic Backpropagation for Scalable Learning of Bayesian Neural Networks},
  author={Jos{\'e} Miguel Hern{\'a}ndez-Lobato and Ryan P. Adams},
Large multilayer neural networks trained with backpropagation have recently achieved state-of-the-art results in a wide range of problems. [] Key Method Similar to classical backpropagation, PBP works by computing a forward propagation of probabilities through the network and then doing a backward computation of gradients. A series of experiments on ten real-world datasets show that PBP is significantly faster than other techniques, while offering competitive predictive abilities. Our experiments also show…

Figures and Tables from this paper

Bayesian Recurrent Neural Networks
This work shows that a simple adaptation of truncated backpropagation through time can yield good quality uncertainty estimates and superior regularisation at only a small extra computational cost during training, and demonstrates how a novel kind of posterior approximation yields further improvements to the performance of Bayesian RNNs.
Analytically Tractable Inference in Neural Networks – An Alternative to Backpropagation
TAGI enables unprecedented features such as the propagation of uncertainty from the input of a network up to its output, and it allows inferring the value of hidden states, inputs, as well as latent variables.
Natural-Parameter Networks: A Class of Probabilistic Neural Networks
A class of probabilistic neural networks, dubbed natural-parameter networks (NPN), is proposed as a novel and lightweight Bayesian treatment of NN, which allows the usage of arbitrary exponential-family distributions to model the weights and neurons.
Practical Considerations for Probabilistic Backpropagation
Probabilistic Backpropagation (PBP) was developed to address the scalability issue of Bayesian neural networks, and facilitates tractable Bayesian learning of networks with large structures and large
Bayesian Uncertainty Estimation for Batch Normalized Deep Networks
It is shown that training a deep network using batch normalization is equivalent to approximate inference in Bayesian models, and it is demonstrated how this finding allows us to make useful estimates of the model uncertainty.
Training Deep Gaussian Processes using Stochastic Expectation Propagation and Probabilistic Backpropagation
A novel and efficient extension of probabilistic backpropagation, a state-of-the-art method for training Bayesian neural networks, that can be used to train DGPs and is able to automatically discover useful input warping, expansion or compression, and it is therefore is a flexible form of Bayesian kernel design.
This work introduces a novel deterministic method to approximate moments in neural networks, eliminating gradient variance and introduces a hierarchical prior for parameters and a novel Empirical Bayes procedure for automatically selecting prior variances, and demonstrates good predictive performance over alternative approaches.
Uncertainty Estimation of Deep Neural Networks
The ensemble Kalman filter, two proposed training schemes for training both fully-connected and Long Short-term Memory (LSTM) networks, and experiment are described.
Kalman Bayesian Neural Networks for Closed-form Online Learning
This paper proposes a novel approach for BNN learning via closed-form Bayesian inference, whereby the calculation of the predictive distribution of the output and the update of the weight distribution are treated as Bayesian filtering and smoothing problems, where the weights are modeled as Gaussian random variables.


Expectation Backpropagation: Parameter-Free Training of Multilayer Neural Networks with Continuous or Discrete Weights
It is shown how an EP based approach can also be used to train deterministic MNNs, and an analytical approximation to the Bayes update of this posterior is found, as well as the resulting Bayes estimates of the weights and outputs.
A Practical Bayesian Framework for Backpropagation Networks
  • D. Mackay
  • Computer Science
    Neural Computation
  • 1992
A quantitative and practical Bayesian framework is described for learning of mappings in feedforward networks that automatically embodies "Occam's razor," penalizing overflexible and overcomplex models.
Dropout: a simple way to prevent neural networks from overfitting
It is shown that dropout improves the performance of neural networks on supervised learning tasks in vision, speech recognition, document classification and computational biology, obtaining state-of-the-art results on many benchmark data sets.
Bayesian Learning for Neural Networks
Bayesian Learning for Neural Networks shows that Bayesian methods allow complex neural network models to be used without fear of the "overfitting" that can occur with traditional neural network learning methods.
Expectation propagation for neural networks with sparsity-promoting priors
A novel approach for nonlinear regression using a two-layer neural network (NN) model structure with sparsity-favoring hierarchical priors on the network weights is proposed and a factorized posterior approximation is derived.
Practical Variational Inference for Neural Networks
This paper introduces an easy-to-implement stochastic variational method (or equivalently, minimum description length loss function) that can be applied to most neural networks and revisits several common regularisers from a variational perspective.
Practical Bayesian Optimization of Machine Learning Algorithms
This work describes new algorithms that take into account the variable cost of learning algorithm experiments and that can leverage the presence of multiple cores for parallel experimentation and shows that these proposed algorithms improve on previous automatic procedures and can reach or surpass human expert-level optimization for many algorithms.
Keeping the neural networks simple by minimizing the description length of the weights
A method of computing the derivatives of the expected squared error and of the amount of information in the noisy weights in a network that contains a layer of non-linear hidden units without time-consuming Monte Carlo simulations is described.
Bayesian Methods for Adaptive Models
The Bayesian framework for model comparison and regularisation is demonstrated by studying interpolation and classification problems modelled with both linear and non–linear models and it is shown that the careful incorporation of error bar information into a classifier’s predictions yields improved performance.
Large-Scale Machine Learning with Stochastic Gradient Descent
A more precise analysis uncovers qualitatively different tradeoffs for the case of small-scale and large-scale learning problems.