# What Are Bayesian Neural Network Posteriors Really Like?

@inproceedings{Izmailov2021WhatAB, title={What Are Bayesian Neural Network Posteriors Really Like?}, author={Pavel Izmailov and Sharad Vikram and Matthew D. Hoffman and Andrew Gordon Wilson}, booktitle={ICML}, year={2021} }

The posterior over Bayesian neural network (BNN) parameters is extremely high-dimensional and non-convex. For computational reasons, researchers approximate this posterior using inexpensive mini-batch methods such as mean-field variational inference or stochastic-gradient Markov chain Monte Carlo (SGMCMC). To investigate foundational questions in Bayesian deep learning, we instead use full-batch Hamiltonian Monte Carlo (HMC) on modern architectures. We show that (1) BNNs can achieve significant…

## 91 Citations

Posterior Refinement Improves Sample Efficiency in Bayesian Neural Networks

- Computer Science
- 2022

It is experimentally shown that the key to good MC-approximated predictive distributions is the quality of the approximate posterior itself, and that the resulting posterior approximation is competitive with even the gold-standard full-batch Hamiltonian Monte Carlo.

Wide Mean-Field Bayesian Neural Networks Ignore the Data

- Computer Science, AISTATS
- 2022

This work shows that mean-field variational inference entirely fails to model the data when the network width is large and the activation function is odd, and that the optimal approximate posterior need not tend to the prior if the activation function is not odd.

Collapsed Variational Bounds for Bayesian Neural Networks

- Computer Science
- 2021

The new bounds significantly improve the performance of Gaussian mean-field VI applied to BNNs on a variety of data sets, and it is found that the tighter ELBOs can be good optimization targets for learning the hyperparameters of hierarchical priors.

Bayesian Deep Learning via Subnetwork Inference

- Computer Science, ICML
- 2021

This work shows that it suffices to perform inference over a small subset of model weights in order to obtain accurate predictive posteriors, and proposes a subnetwork selection strategy that aims to maximally preserve the model’s predictive uncertainty.

Wide Bayesian neural networks have a simple weight posterior: theory and accelerated sampling

- Computer Science, ICML
- 2022

A Markov chain Monte Carlo posterior sampling algorithm is developed which mixes faster the wider the BNN, with up to 50x higher effective sample size observed relative to no reparametrisation for both fully-connected and residual networks.

Bayesian Deep Learning and a Probabilistic Perspective of Generalization

- Computer Science, NeurIPS
- 2020

It is shown that deep ensembles provide an effective mechanism for approximate Bayesian marginalization, and a related approach is proposed that further improves the predictive distribution by marginalizing within basins of attraction, without significant overhead.

Informative Bayesian Neural Network Priors for Weak Signals

- Computer Science, Bayesian Analysis
- 2021

A new joint prior is proposed over the local feature-specific scale parameters that encodes knowledge about feature sparsity, along with a Stein gradient optimization to tune the hyperparameters in such a way that the distribution induced on the model’s proportion of variance explained matches the prior distribution.

Natural Posterior Network: Deep Bayesian Uncertainty for Exponential Family Distributions

- Computer Science
- 2021

The Natural Posterior Network is proposed for fast and high-quality uncertainty estimation for any task where the target distribution belongs to the exponential family, and it leverages Normalizing Flows to fit a single density on a learned low-dimensional and task-dependent latent space.

Meta-Learning to Perform Bayesian Inference in a single Forward Propagation

- Computer Science
- 2021

It is demonstrated that PFNs can near-perfectly mimic Gaussian processes and also enable efficient Bayesian inference for intractable problems, with over 200-fold speedups in multiple setups compared to current methods.

Dangers of Bayesian Model Averaging under Covariate Shift

- Computer Science, NeurIPS
- 2021

It is shown how a Bayesian model average can in fact be problematic under covariate shift, particularly in cases where linear dependencies in the input features cause a lack of posterior contraction.

## References

Exact posterior distributions of wide Bayesian neural networks

- Computer Science, ArXiv
- 2020

This work provides the missing theoretical proof that the exact BNN posterior converges (weakly) to the one induced by the GP limit of the prior and shows how to generate exact samples from a finite BNN on a small dataset via rejection sampling.

On the Expressiveness of Approximate Inference in Bayesian Neural Networks

- Computer Science, NeurIPS
- 2020

It is found empirically that pathologies of a similar form as in the single-hidden layer case can persist when performing variational inference in deeper networks, and a universality result is proved showing that there exist approximate posteriors in the above classes which provide flexible uncertainty estimates.

Expressive yet Tractable Bayesian Deep Learning via Subnetwork Inference

- Computer Science, ArXiv
- 2020

This paper develops a practical and scalable Bayesian deep learning method that first trains a point estimate, and then infers a full covariance Gaussian posterior approximation over a subnetwork, and proposes a subnetwork selection procedure which aims to optimally preserve posterior uncertainty.

How Good is the Bayes Posterior in Deep Neural Networks Really?

- Computer Science, ICML
- 2020

This work demonstrates through careful MCMC sampling that the posterior predictive induced by the Bayes posterior yields systematically worse predictions compared to simpler methods including point estimates obtained from SGD and argues that it is timely to focus on understanding the origin of the improved performance of cold posteriors.

Efficient and Scalable Bayesian Neural Nets with Rank-1 Factors

- Computer Science, ICML
- 2020

A rank-1 parameterization of BNNs is proposed, where each weight matrix involves only a distribution on a rank-1 subspace, and the use of mixture approximate posteriors to capture multiple modes is revisited.

Liberty or Depth: Deep Bayesian Neural Nets Do Not Need Complex Weight Posterior Approximations

- Computer Science, NeurIPS
- 2020

The results suggest that using mean-field variational inference in a deeper model is both a practical and theoretically justified alternative to structured approximations.

Bayesian Deep Learning and a Probabilistic Perspective of Generalization

- Computer Science, NeurIPS
- 2020

It is shown that deep ensembles provide an effective mechanism for approximate Bayesian marginalization, and a related approach is proposed that further improves the predictive distribution by marginalizing within basins of attraction, without significant overhead.

Deterministic Variational Inference for Robust Bayesian Neural Networks

- Computer Science, ICLR
- 2019

This work introduces a novel deterministic method to approximate moments in neural networks, eliminating gradient variance. It also introduces a hierarchical prior for parameters and a novel Empirical Bayes procedure for automatically selecting prior variances, and demonstrates good predictive performance over alternative approaches.

Noisy Natural Gradient as Variational Inference

- Computer Science, ICML
- 2018

It is shown that natural gradient ascent with adaptive weight noise implicitly fits a variational posterior to maximize the evidence lower bound (ELBO), which allows us to train full-covariance, fully factorized, or matrix-variate Gaussian variational posteriors using noisy versions of natural gradient, Adam, and K-FAC, respectively, making it possible to scale up to modern-size ConvNets.

A statistical theory of cold posteriors in deep neural networks

- Computer Science, ICLR
- 2021

A generative model describing curation is developed which gives a principled Bayesian account of cold posteriors, because the likelihood under this new generative model closely matches the tempered likelihoods used in past work.