Corpus ID: 218674252

Global inducing point variational posteriors for Bayesian neural networks and deep Gaussian processes

@inproceedings{ober2021global,
  title={Global inducing point variational posteriors for Bayesian neural networks and deep Gaussian processes},
  author={Sebastian W. Ober and Laurence Aitchison},
  booktitle={International Conference on Machine Learning},
}
We derive the optimal approximate posterior over the top-layer weights in a Bayesian neural network for regression, and show that it exhibits strong dependencies on the lower-layer weights. We adapt this result to develop a correlated approximate posterior over the weights at all layers in a Bayesian neural network. We extend this approach to deep Gaussian processes, unifying inference in the two model classes. Our approximate posterior uses learned "global" inducing points, which are defined… 
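With fixed lower-layer weights, the optimal posterior over the top-layer weights of a regression BNN reduces to a Bayesian linear regression posterior on the last-layer features, which is why it depends on the layers below. A minimal NumPy sketch of that conditional posterior (function names, variances, and the toy data are illustrative, not the paper's code):

```python
import numpy as np

def top_layer_posterior(phi, y, noise_var=0.1, prior_var=1.0):
    """Posterior N(mu, Sigma) over top-layer weights w, given lower-layer
    features phi = f(X) and targets y, under y ~ N(phi @ w, noise_var * I)
    and prior w ~ N(0, prior_var * I)."""
    d = phi.shape[1]
    # Posterior precision combines likelihood curvature and the prior.
    prec = phi.T @ phi / noise_var + np.eye(d) / prior_var
    Sigma = np.linalg.inv(prec)
    mu = Sigma @ phi.T @ y / noise_var
    return mu, Sigma

rng = np.random.default_rng(0)
phi = rng.normal(size=(50, 3))           # stands in for lower-layer features
w_true = np.array([1.0, -2.0, 0.5])
y = phi @ w_true + 0.1 * rng.normal(size=50)
mu, Sigma = top_layer_posterior(phi, y, noise_var=0.01)
```

Because `mu` and `Sigma` are functions of `phi`, any change to the lower layers changes the top-layer posterior, which motivates the correlated across-layer approximation in the paper.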


Variational Laplace for Bayesian neural networks

We develop variational Laplace for Bayesian neural networks (BNNs), which exploits a local approximation of the curvature of the likelihood to estimate the ELBO without the need for stochastic sampling of the neural-network weights.

Bayesian Neural Network Priors Revisited

Gradient Regularization as Approximate Variational Inference

Variational Laplace for Bayesian neural networks (BNNs), which exploits a local approximation of the curvature of the likelihood to estimate the ELBO without the need for stochastic sampling of the neural-network weights, gave better test performance and expected calibration errors than maximum a posteriori inference and standard sampling-based variational inference.

The Promises and Pitfalls of Deep Kernel Learning

It is shown that the overfitting from overparameterized deep kernel learning, in which the model is “somewhat Bayesian”, can in certain scenarios be worse than that from not being Bayesian at all, and that a fully Bayesian treatment of deep kernel learning can rectify this overfitting and obtain the desired performance improvements.

A statistical theory of cold posteriors in deep neural networks

A generative model describing curation is developed which gives a principled Bayesian account of cold posteriors, because the likelihood under this new generative model closely matches the tempered likelihoods used in past work.

Sparse Uncertainty Representation in Deep Learning with Inducing Weights

This work augments each weight matrix with a small inducing weight matrix, projecting the uncertainty quantification into a lower dimensional space, and extends Matheron’s conditional Gaussian sampling rule to enable fast weight sampling, which enables the inference method to maintain reasonable run-time as compared with ensembles.
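Matheron's rule, which the summary above extends, turns conditional Gaussian sampling into a cheap correction of a joint sample: draw (x̃, ỹ) from the joint, then return x̃ + Σ_xy Σ_yy⁻¹ (y_obs − ỹ). A generic sketch (variable names are illustrative, not the paper's API):

```python
import numpy as np

def matheron_conditional_sample(mu_x, mu_y, Sxx, Sxy, Syy, y_obs, rng):
    """Sample x | y = y_obs for jointly Gaussian (x, y) via Matheron's rule:
    draw one joint sample, then shift it by the observed residual."""
    # Assemble the joint distribution and draw a single sample from it.
    mu = np.concatenate([mu_x, mu_y])
    S = np.block([[Sxx, Sxy], [Sxy.T, Syy]])
    joint = rng.multivariate_normal(mu, S)
    x_tilde, y_tilde = joint[:len(mu_x)], joint[len(mu_x):]
    # Matheron correction: x_tilde + Sxy @ Syy^{-1} @ (y_obs - y_tilde)
    return x_tilde + Sxy @ np.linalg.solve(Syy, y_obs - y_tilde)
```

The appeal is that no decomposition of the conditional covariance is ever formed; only solves against Σ_yy are needed, which is what makes the weight-sampling step fast when y lives in a small inducing space.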

Priors in Bayesian Deep Learning: A Review

An overview of different priors that have been proposed for (deep) Gaussian processes, variational autoencoders and Bayesian neural networks is presented and different methods of learning priors for these models from data are outlined.

Deep kernel processes

A tractable deep kernel process, the deep inverse Wishart process, is defined, and a doubly-stochastic inducing-point variational inference scheme is given that operates on the Gram matrices, not on the features, as in DGPs.

The Limitations of Large Width in Neural Networks: A Deep Gaussian Process Perspective

This paper analyzes two existing classes of models, deep GPs and neural networks, focusing on how width affects performance metrics, and offers useful guidance for deep GP and neural network architectures.

Finite Versus Infinite Neural Networks: an Empirical Study

Improved best practices for using NNGP and NT kernels for prediction are developed, including a novel ensembling technique that achieves state-of-the-art results on CIFAR-10 classification for kernels corresponding to each architecture class the authors consider.



Structured and Efficient Variational Deep Learning with Matrix Gaussian Posteriors

A variational Bayesian neural network where the parameters are governed via a probability distribution on random matrices is introduced and "pseudo-data" (Snelson & Ghahramani, 2005) is incorporated in this model, which allows for more efficient posterior sampling while maintaining the properties of the original model.

Liberty or Depth: Deep Bayesian Neural Nets Do Not Need Complex Weight Posterior Approximations

The results suggest that using mean-field variational inference in a deeper model is both a practical and theoretically justified alternative to structured approximations.

Deep Neural Networks as Gaussian Processes

The exact equivalence between infinitely wide deep networks and GPs is derived and it is found that test performance increases as finite-width trained networks are made wider and more similar to a GP, and thus that GP predictions typically outperform those of finite-width networks.
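The equivalence rests on a layer-wise kernel recursion; for ReLU networks the expectation has a closed arc-cosine form. A sketch of the resulting NNGP kernel for a pair of inputs (the weight/bias variances are illustrative choices, not tuned values):

```python
import numpy as np

def nngp_relu_kernel(x1, x2, depth=3, sw2=2.0, sb2=0.0):
    """NNGP kernel between inputs x1, x2 for a fully connected ReLU network
    with `depth` hidden layers, weight variance sw2, bias variance sb2."""
    d = len(x1)
    k11 = sb2 + sw2 * (x1 @ x1) / d
    k22 = sb2 + sw2 * (x2 @ x2) / d
    k12 = sb2 + sw2 * (x1 @ x2) / d
    for _ in range(depth):
        c = np.clip(k12 / np.sqrt(k11 * k22), -1.0, 1.0)
        theta = np.arccos(c)
        # Closed form for E[relu(u) relu(v)], (u, v) ~ N(0, [[k11,k12],[k12,k22]])
        ev = np.sqrt(k11 * k22) * (np.sin(theta)
                                   + (np.pi - theta) * np.cos(theta)) / (2 * np.pi)
        k12 = sb2 + sw2 * ev
        # On the diagonal, E[relu(u)^2] = k11 / 2.
        k11 = sb2 + sw2 * k11 / 2
        k22 = sb2 + sw2 * k22 / 2
    return k12
```

With `sw2 = 2` (He-style scaling) and zero bias variance, the diagonal of the kernel is preserved across layers, which is the usual choice for keeping signal variance stable with depth.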

A Comprehensive guide to Bayesian Convolutional Neural Network with Variational Inference

This paper predicts how certain the model prediction is based on the epistemic and aleatoric uncertainties and empirically shows how the uncertainty can decrease, allowing the decisions made by the network to become more deterministic as the training accuracy increases.

Noisy Natural Gradient as Variational Inference

It is shown that natural gradient ascent with adaptive weight noise implicitly fits a variational posterior to maximize the evidence lower bound (ELBO), which allows us to train full-covariance, fully factorized, or matrix-variate Gaussian variational posteriors using noisy versions of natural gradient, Adam, and K-FAC, respectively, making it possible to scale up to modern-size ConvNets.

On the Expressiveness of Approximate Inference in Bayesian Neural Networks

It is found empirically that pathologies of a similar form as in the single-hidden layer case can persist when performing variational inference in deeper networks, and a universality result is proved showing that there exist approximate posteriors in the above classes which provide flexible uncertainty estimates.

Deep Gaussian Processes

Deep Gaussian process (GP) models are introduced and model selection by the variational bound shows that a five layer hierarchy is justified even when modelling a digit data set containing only 150 examples.
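The compositional prior behind deep GPs can be illustrated by drawing a function sample at a finite set of points and feeding it through another GP layer. A toy 1-D sketch with an RBF kernel and a small jitter for numerical stability (all choices illustrative):

```python
import numpy as np

def rbf(x1, x2, ls=1.0):
    """Squared-exponential kernel matrix between 1-D input arrays."""
    diff = x1[:, None] - x2[None, :]
    return np.exp(-0.5 * (diff / ls) ** 2)

def deep_gp_sample(x, layers=2, rng=None, jitter=1e-6):
    """Draw one sample from a composition of zero-mean GPs evaluated at x:
    h_{l+1} ~ N(0, k(h_l, h_l)), starting from h_0 = x."""
    if rng is None:
        rng = np.random.default_rng()
    h = x
    for _ in range(layers):
        K = rbf(h, h) + jitter * np.eye(len(h))  # jitter keeps K PSD
        h = rng.multivariate_normal(np.zeros(len(h)), K)
    return h
```

Each layer warps the previous layer's output, so even two layers already produce non-stationary functions, which is what the variational bound in the paper is used to select over.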

Sparse Orthogonal Variational Inference for Gaussian Processes

A new interpretation of sparse variational approximations for Gaussian processes using inducing points is introduced, which can lead to more scalable algorithms than previous methods; state-of-the-art results on CIFAR-10 among purely GP-based models are reported.

Compositional uncertainty in deep Gaussian processes

It is argued that the commonly used variational inference scheme for deep Gaussian processes is suboptimal, not taking advantage of the potential of the model to discover the compositional structure in the data, and alternative variational inference schemes allowing for dependencies across different layers are examined.

How Good is the Bayes Posterior in Deep Neural Networks Really?

This work demonstrates through careful MCMC sampling that the posterior predictive induced by the Bayes posterior yields systematically worse predictions compared to simpler methods including point estimates obtained from SGD and argues that it is timely to focus on understanding the origin of the improved performance of cold posteriors.