# Global inducing point variational posteriors for Bayesian neural networks and deep Gaussian processes

```bibtex
@inproceedings{Ober2020GlobalIP,
  title     = {Global inducing point variational posteriors for Bayesian neural networks and deep Gaussian processes},
  author    = {Sebastian W. Ober and Laurence Aitchison},
  booktitle = {International Conference on Machine Learning},
  year      = {2020}
}
```

We derive the optimal approximate posterior over the top-layer weights in a Bayesian neural network for regression, and show that it exhibits strong dependencies on the lower-layer weights. We adapt this result to develop a correlated approximate posterior over the weights at all layers in a Bayesian neural network. We extend this approach to deep Gaussian processes, unifying inference in the two model classes. Our approximate posterior uses learned "global" inducing points, which are defined…
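The first claim, that the optimal top-layer posterior depends on the lower-layer weights, can be illustrated in the simplest setting: Bayesian linear regression on fixed lower-layer features. A minimal NumPy sketch (all sizes and values hypothetical, standard conjugate-Gaussian formulas), showing that both the posterior mean and covariance are functions of the lower-layer features Φ:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: phi holds the (fixed) lower-layer features of N inputs.
N, D = 50, 3
phi = rng.standard_normal((N, D))          # lower-layer features Phi(X)
w_true = rng.standard_normal(D)
sigma2, s2 = 0.1, 1.0                      # noise and prior variances
y = phi @ w_true + np.sqrt(sigma2) * rng.standard_normal(N)

# Optimal Gaussian posterior over the top-layer weights w, conditioned on
# the lower-layer features: standard Bayesian linear regression.  Note
# that both Sigma and mu depend on phi, i.e. on the lower layers.
Sigma = np.linalg.inv(phi.T @ phi / sigma2 + np.eye(D) / s2)
mu = Sigma @ phi.T @ y / sigma2
```

Changing the lower-layer weights changes Φ and hence changes this posterior, which is the dependency the paper exploits.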

## 35 Citations

### Variational Laplace for Bayesian neural networks

- Computer Science, arXiv
- 2021

We develop variational Laplace for Bayesian neural networks (BNNs) which exploits a local approximation of the curvature of the likelihood to estimate the ELBO without the need for stochastic…
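The core trick here, estimating the expected log-likelihood term of the ELBO from local curvature rather than from weight samples, can be sketched in one dimension. This is a toy illustration under assumed quantities (a quadratic toy log-likelihood, a standard normal prior), not the paper's implementation:

```python
import numpy as np

# 1-D sketch: toy log-likelihood L(w) = -0.5 * (w - 2)^2,
# variational posterior q(w) = N(mu, sig2), prior p(w) = N(0, 1).
def logp(w):
    return -0.5 * (w - 2.0) ** 2

mu, sig2 = 1.5, 0.25

# Second-order Taylor expansion around mu approximates E_q[L(w)]
# without sampling w:  E_q[L] ~= L(mu) + 0.5 * sig2 * L''(mu).
h = 1e-4
curvature = (logp(mu + h) - 2 * logp(mu) + logp(mu - h)) / h ** 2
expected_logp = logp(mu) + 0.5 * sig2 * curvature

# Closed-form KL(q || p) between two univariate Gaussians.
kl = 0.5 * (sig2 + mu ** 2 - 1.0 - np.log(sig2))

elbo = expected_logp - kl
```

For a quadratic log-likelihood the Taylor expansion is exact, which is why curvature-based estimates can avoid the variance of sampled-weight ELBO estimates.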

### Gradient Regularization as Approximate Variational Inference

- Computer Science, Entropy
- 2021

Variational Laplace for Bayesian neural networks (BNNs), which exploits a local approximation of the curvature of the likelihood to estimate the ELBO without the need for stochastic sampling of the neural-network weights, gave better test performance and expected calibration errors than maximum a posteriori inference and standard sampling-based variational inference.

### The Promises and Pitfalls of Deep Kernel Learning

- Computer Science, UAI
- 2021

It is shown that the overfitting from overparameterized deep kernel learning, in which the model is "somewhat Bayesian", can in certain scenarios be worse than that from not being Bayesian at all, and that a fully Bayesian treatment of deep kernel learning can rectify this overfitting and obtain the desired performance improvements.

### A statistical theory of cold posteriors in deep neural networks

- Computer Science, ICLR
- 2021

A generative model describing curation is developed which gives a principled Bayesian account of cold posteriors, because the likelihood under this new generative model closely matches the tempered likelihoods used in past work.

### Sparse Uncertainty Representation in Deep Learning with Inducing Weights

- Computer Science, NeurIPS
- 2021

This work augments each weight matrix with a small inducing weight matrix, projecting the uncertainty quantification into a lower dimensional space, and extends Matheron’s conditional Gaussian sampling rule to enable fast weight sampling, which enables the inference method to maintain reasonable run-time as compared with ensembles.
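Matheron's conditional sampling rule, which this work extends, is simple to state for a joint Gaussian: draw the pair jointly, then correct with the observed value. A bivariate sketch with a hypothetical covariance:

```python
import numpy as np

rng = np.random.default_rng(0)

# Joint Gaussian over (x, y) with a hypothetical 2x2 covariance.
mean = np.array([0.0, 0.0])
cov = np.array([[1.0, 0.8],
                [0.8, 1.0]])
y_obs = 1.0  # observed value of y

# Matheron's rule: to sample x | y = y_obs, draw (x, y) from the joint
# and correct the sample with the observation:
#   x_cond = x + Cov(x, y) Cov(y, y)^-1 (y_obs - y)
samples = rng.multivariate_normal(mean, cov, size=100_000)
x, y = samples[:, 0], samples[:, 1]
x_cond = x + cov[0, 1] / cov[1, 1] * (y_obs - y)

# Analytic conditional for comparison: N(0.8 * y_obs, 1 - 0.8**2).
```

The appeal for inducing-weight methods is that the correction only needs a solve in the (small) dimension of the conditioned variable, not the full weight dimension.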

### Priors in Bayesian Deep Learning: A Review

- Computer Science, International Statistical Review
- 2022

An overview of different priors that have been proposed for (deep) Gaussian processes, variational autoencoders and Bayesian neural networks is presented and different methods of learning priors for these models from data are outlined.

### Deep kernel processes

- Computer Science, ICML
- 2021

A tractable deep kernel process, the deep inverse Wishart process, is defined, and a doubly-stochastic inducing-point variational inference scheme is given that operates on the Gram matrices, not on the features, as in DGPs.

### The Limitations of Large Width in Neural Networks: A Deep Gaussian Process Perspective

- Computer Science, NeurIPS
- 2021

This paper analyzes two existing classes of models, deep GPs and neural networks, focusing on how width affects performance metrics, and offers useful guidance for deep GP and neural network architectures.

### Finite Versus Infinite Neural Networks: an Empirical Study

- Computer Science, NeurIPS
- 2020

Improved best practices for using NNGP and NT kernels for prediction are developed, including a novel ensembling technique that achieves state-of-the-art results on CIFAR-10 classification for kernels corresponding to each architecture class the authors consider.

## References

Showing 1–10 of 79 references.

### Structured and Efficient Variational Deep Learning with Matrix Gaussian Posteriors

- Computer Science, ICML
- 2016

A variational Bayesian neural network where the parameters are governed via a probability distribution on random matrices is introduced and "pseudo-data" (Snelson & Ghahramani, 2005) is incorporated in this model, which allows for more efficient posterior sampling while maintaining the properties of the original model.
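The matrix-variate Gaussian at the heart of this posterior factorizes the weight covariance over rows and columns (a Kronecker structure), which makes sampling cheap. A sketch with hypothetical row and column covariances:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical matrix-variate Gaussian MN(M, U, V) over a 3x2 weight
# matrix: U is the row covariance, V the column covariance.
M = np.zeros((3, 2))
U = np.array([[1.0, 0.5, 0.0],
              [0.5, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
V = np.array([[2.0, 0.3],
              [0.3, 1.0]])

A = np.linalg.cholesky(U)
B = np.linalg.cholesky(V)

# W = M + A E B^T with E iid standard normal has Kronecker-structured
# covariance: Cov(W_ij, W_kl) = U_ik * V_jl.
E = rng.standard_normal((50_000, 3, 2))
W = M + A @ E @ B.T
```

Only the two small factors U and V are learned, instead of a full covariance over all six entries, which is what makes this posterior "structured and efficient".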

### Liberty or Depth: Deep Bayesian Neural Nets Do Not Need Complex Weight Posterior Approximations

- Computer Science, NeurIPS
- 2020

The results suggest that using mean-field variational inference in a deeper model is both a practical and theoretically justified alternative to structured approximations.

### Deep Neural Networks as Gaussian Processes

- Computer Science, ICLR
- 2018

The exact equivalence between infinitely wide deep networks and GPs is derived, and it is found that test performance increases as finite-width trained networks are made wider and more similar to a GP, and thus that GP predictions typically outperform those of finite-width networks.
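The NNGP kernel underlying this equivalence is computed layer by layer; for ReLU activations the recursion has a closed form (the arc-cosine kernel of Cho & Saul). A sketch for a fully connected ReLU network, with hypothetical weight/bias variances as parameters:

```python
import numpy as np

def nngp_relu(x1, x2, depth=3, sw2=1.0, sb2=0.0):
    """NNGP kernel k(x1, x2) after `depth` hidden ReLU layers, with
    weight variance sw2/fan-in and bias variance sb2."""
    d = len(x1)
    k12 = sb2 + sw2 * x1 @ x2 / d
    k11 = sb2 + sw2 * x1 @ x1 / d
    k22 = sb2 + sw2 * x2 @ x2 / d
    for _ in range(depth):
        c = np.clip(k12 / np.sqrt(k11 * k22), -1.0, 1.0)
        theta = np.arccos(c)
        # E[relu(u) relu(v)] for (u, v) Gaussian with the current covariances
        k12 = sb2 + sw2 / (2 * np.pi) * np.sqrt(k11 * k22) * (
            np.sin(theta) + (np.pi - theta) * np.cos(theta))
        k11 = sb2 + sw2 * k11 / 2   # E[relu(u)^2] = k11 / 2
        k22 = sb2 + sw2 * k22 / 2
    return k12
```

Exact GP prediction with this kernel is what the finite-width networks above are compared against.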

### A Comprehensive guide to Bayesian Convolutional Neural Network with Variational Inference

- Computer Science, arXiv
- 2019

This paper quantifies how certain the model's predictions are based on the epistemic and aleatoric uncertainties, and empirically shows how the uncertainty can decrease, allowing the decisions made by the network to become more deterministic as the training accuracy increases.

### Noisy Natural Gradient as Variational Inference

- Computer Science, ICML
- 2018

It is shown that natural gradient ascent with adaptive weight noise implicitly fits a variational posterior to maximize the evidence lower bound (ELBO), which allows us to train full-covariance, fully factorized, or matrix-variate Gaussian variational posteriors using noisy versions of natural gradient, Adam, and K-FAC, respectively, making it possible to scale up to modern-size ConvNets.

### On the Expressiveness of Approximate Inference in Bayesian Neural Networks

- Computer Science, NeurIPS
- 2020

It is found empirically that pathologies of a similar form as in the single-hidden layer case can persist when performing variational inference in deeper networks, and a universality result is proved showing that there exist approximate posteriors in the above classes which provide flexible uncertainty estimates.

### Deep Gaussian Processes

- Computer Science, AISTATS
- 2013

Deep Gaussian process (GP) models are introduced, and model selection by the variational bound shows that a five-layer hierarchy is justified even when modelling a digit data set containing only 150 examples.

### Sparse Orthogonal Variational Inference for Gaussian Processes

- Computer Science, AISTATS
- 2020

A new interpretation of sparse variational approximations for Gaussian processes using inducing points is introduced, which can lead to more scalable algorithms than previous methods and report state-of-the-art results on CIFAR-10 among purely GP-based models.
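The sparse variational approximations this work reinterprets reduce a GP to a small set of inducing points with a learned Gaussian posterior over their outputs. A minimal predictive sketch using the standard (Titsias-style) equations, with toy inducing inputs and variational parameters chosen for illustration:

```python
import numpy as np

def rbf(a, b, ls=1.0):
    # squared-exponential kernel between row vectors in a and b
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ls ** 2)

# Hypothetical inducing inputs Z with variational posterior
# q(u) = N(m, S) over the inducing outputs u = f(Z).
Z = np.linspace(-2, 2, 5)[:, None]
m = np.sin(Z[:, 0])                       # toy variational mean
S = 0.01 * np.eye(5)                      # toy variational covariance

X = np.array([[0.0], [1.0]])              # test inputs
Kmm = rbf(Z, Z) + 1e-8 * np.eye(5)        # jitter for stability
Kxm = rbf(X, Z)
Kxx = rbf(X, X)

A = Kxm @ np.linalg.inv(Kmm)
# Sparse variational predictive equations:
#   mean = Kxm Kmm^-1 m
#   cov  = Kxx - Kxm Kmm^-1 Kmx + A S A^T
mean = A @ m
cov = Kxx - A @ Kxm.T + A @ S @ A.T
```

Everything scales with the number of inducing points rather than the number of data points, which is the source of the scalability that the orthogonal decomposition then improves further.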

### Compositional uncertainty in deep Gaussian processes

- Computer Science, UAI
- 2020

It is argued that such an inference scheme is suboptimal, not taking advantage of the potential of the model to discover the compositional structure in the data, and examines alternative variational inference schemes allowing for dependencies across different layers.

### How Good is the Bayes Posterior in Deep Neural Networks Really?

- Computer Science, ICML
- 2020

This work demonstrates through careful MCMC sampling that the posterior predictive induced by the Bayes posterior yields systematically worse predictions than simpler methods, including point estimates obtained from SGD, and argues that it is timely to focus on understanding the origin of the improved performance of cold posteriors.