# Bayesian Deep Learning and a Probabilistic Perspective of Generalization

```bibtex
@article{Wilson2020BayesianDL,
  title   = {Bayesian Deep Learning and a Probabilistic Perspective of Generalization},
  author  = {Andrew Gordon Wilson and Pavel Izmailov},
  journal = {ArXiv},
  year    = {2020},
  volume  = {abs/2002.08791}
}
```

The key distinguishing property of a Bayesian approach is marginalization, rather than using a single setting of weights. Bayesian marginalization can particularly improve the accuracy and calibration of modern deep neural networks, which are typically underspecified by the data, and can represent many compelling but different solutions. We show that deep ensembles provide an effective mechanism for approximate Bayesian marginalization, and propose a related approach that further improves the…
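
To make the central idea concrete, here is a minimal sketch of approximate Bayesian marginalization with a deep ensemble: the predictive distribution is formed by averaging the predictive distributions of independently trained networks, rather than relying on a single weight setting. The toy data, architecture, and training loop below are illustrative stand-ins, not the paper's experimental setup.

```python
import torch
import torch.nn as nn

# Toy data and a small MLP stand in for the large image models used in the paper.
torch.manual_seed(0)
X, y = torch.randn(256, 10), torch.randint(0, 3, (256,))

def make_model():
    return nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Linear(64, 3))

def train(model, epochs=50):
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss_fn(model(X), y).backward()
        opt.step()
    return model

# Deep ensemble: independently initialized and trained networks act as crude
# samples from different basins of the posterior over weights.
ensemble = [train(make_model()) for _ in range(5)]

# Approximate Bayesian model average: average the *predictive distributions*
# p(y | x, w_k), not the weights or the logits of a single model.
with torch.no_grad():
    probs = torch.stack([torch.softmax(m(X), dim=-1) for m in ensemble])
    bma_predictive = probs.mean(dim=0)  # p(y | x, D) ≈ (1/K) Σ_k p(y | x, w_k)
```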


## 265 Citations

### Wide Mean-Field Bayesian Neural Networks Ignore the Data

- Computer Science, AISTATS
- 2022

This work shows that mean-field variational inference entirely fails to model the data when the network width is large and the activation function is odd, and shows that the optimal approximate posterior need not tend to the prior if the activation function is not odd.

### Bayesian Model Selection, the Marginal Likelihood, and Generalization

- Computer Science, ICML
- 2022

It is shown that the marginal likelihood can be negatively correlated with generalization, with implications for neural architecture search, and that it can lead to both underfitting and overfitting in hyperparameter learning.

### Greedy Bayesian Posterior Approximation with Deep Ensembles

- Computer Science, ArXiv
- 2021

This work proposes a novel and principled method to tackle the problem of greedy ensemble construction by minimizing an f-divergence between the true posterior and a kernel density estimator (KDE) in function space.

### Evaluating Approximate Inference in Bayesian Deep Learning

- Computer Science, NeurIPS
- 2021

This competition evaluates the fidelity of approximate Bayesian inference procedures in deep learning, using as a reference Hamiltonian Monte Carlo (HMC) samples obtained by parallelizing computations over hundreds of tensor processing unit (TPU) devices.

### Deep Ensemble as a Gaussian Process Approximate Posterior

- Computer Science, ArXiv
- 2022

This work relates deep ensembles (DE) to Bayesian inference to obtain reliable Bayesian uncertainty, and provides strategies to make training efficient; the method adds only marginal training cost over standard DE while achieving better uncertainty quantification than DE and its variants across diverse scenarios.

### Bayesian Deep Learning via Subnetwork Inference

- Computer Science, ICML
- 2021

This work shows that it suffices to perform inference over a small subset of model weights in order to obtain accurate predictive posteriors, and proposes a subnetwork selection strategy that aims to maximally preserve the model’s predictive uncertainty.
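
A minimal sketch of the idea described above (not the authors' exact selection strategy): keep most weights fixed at their MAP values and place an approximate Gaussian posterior over a small subnetwork only. The per-weight variances and the top-k selection rule below are hypothetical stand-ins for illustration.

```python
import torch

# Subnetwork inference sketch: MAP everywhere, approximate posterior on a subset.
torch.manual_seed(0)
w_map = torch.randn(10_000)            # flattened MAP weights of a trained network
diag_var = torch.rand(10_000) * 1e-2   # assumed diagonal posterior variances

# Select the subnetwork, e.g. the weights with the largest marginal variance
# (a simple proxy for "weights whose uncertainty matters most").
k = 100
subnet_idx = torch.topk(diag_var, k).indices

def sample_weights():
    """Draw one posterior sample: MAP values everywhere, Gaussian noise on the subnetwork."""
    w = w_map.clone()
    w[subnet_idx] += torch.sqrt(diag_var[subnet_idx]) * torch.randn(k)
    return w

samples = torch.stack([sample_weights() for _ in range(20)])
# In the full method, each sample would be loaded back into the network and
# the resulting predictive distributions averaged, as with an ensemble.
```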

### Laplace Redux - Effortless Bayesian Deep Learning

- Computer Science, NeurIPS
- 2021

This work reviews the range of variants of the Laplace approximation (LA), introduces an easy-to-use software library for PyTorch offering user-friendly access to all major versions of the LA, and demonstrates that the LA is competitive with more popular alternatives in terms of performance, while excelling in terms of computational cost.

### What Are Bayesian Neural Network Posteriors Really Like?

- Computer Science, ICML
- 2021

It is shown that BNNs can achieve significant performance gains over standard training and deep ensembles, and a single long HMC chain can provide a comparable representation of the posterior to multiple shorter chains, and posterior tempering is not needed for near-optimal performance.

### Contrasting random and learned features in deep Bayesian linear regression

- Computer Science, Physical Review E
- 2022

Comparing deep random feature models to deep networks in which all layers are trained provides a detailed characterization of the interplay between width, depth, data density, and prior mismatch, and begins to elucidate how architectural details affect generalization performance in this simple class of deep regression models.

### Structured Dropout Variational Inference for Bayesian Neural Networks

- Computer Science, NeurIPS
- 2021

This work focuses on the inflexibility of the factorized structure in the Dropout posterior and proposes an improved method called Variational Structured Dropout (VSD), which employs an orthogonal transformation to learn a structured representation of the variational noise and consequently induces statistical dependencies in the approximate posterior.

## References


### The Case for Bayesian Deep Learning

- Computer Science, ArXiv
- 2020

The key distinguishing property of a Bayesian approach is marginalization instead of optimization, not the prior or Bayes rule; the prior over functions induced by a neural network is argued to reflect the inductive biases that help it generalize.

### Deep Ensembles: A Loss Landscape Perspective

- Computer Science, ArXiv
- 2019

This work develops the concept of the diversity–accuracy plane, shows that the decorrelation power of random initializations is unmatched by popular subspace sampling methods, and presents experimental results validating the hypothesis that deep ensembles work well under dataset shift.

### What Are Bayesian Neural Network Posteriors Really Like?

- Computer Science, ICML
- 2021

It is shown that BNNs can achieve significant performance gains over standard training and deep ensembles, and a single long HMC chain can provide a comparable representation of the posterior to multiple shorter chains, and posterior tempering is not needed for near-optimal performance.

### Uncertainty in Neural Networks: Approximately Bayesian Ensembling

- Computer Science, AISTATS
- 2020

This work proposes a modification to the usual process of ensembling NNs that, it is argued, does result in approximate Bayesian inference: regularising parameters about values drawn from a distribution which can be set equal to the prior.
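
A minimal sketch of the anchored regularization described above: each ensemble member is trained with an L2 penalty pulling its weights towards its own anchor drawn from the prior, rather than towards zero. The prior scale, data, architecture, and regularization strength below are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Anchored ensembling sketch: regularize each member towards a prior sample.
torch.manual_seed(0)
X, y = torch.randn(128, 5), torch.randn(128, 1)
prior_std = 1.0

def train_anchored_member(lam=1.0 / prior_std**2, steps=200):
    model = nn.Sequential(nn.Linear(5, 32), nn.ReLU(), nn.Linear(32, 1))
    # Draw a fresh anchor from the (assumed Gaussian) prior for this member.
    anchors = [prior_std * torch.randn_like(p) for p in model.parameters()]
    opt = torch.optim.Adam(model.parameters(), lr=1e-2)
    for _ in range(steps):
        opt.zero_grad()
        mse = ((model(X) - y) ** 2).mean()
        # Penalty about the anchor values rather than about zero.
        reg = sum(((p - a) ** 2).sum() for p, a in zip(model.parameters(), anchors))
        (mse + lam * reg / len(X)).backward()
        opt.step()
    return model

ensemble = [train_anchored_member() for _ in range(5)]
preds = torch.stack([m(X).detach() for m in ensemble])
mean, std = preds.mean(0), preds.std(0)  # predictive mean and a simple uncertainty estimate
```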

### A Bayesian Perspective on Generalization and Stochastic Gradient Descent

- Computer Science, ICLR
- 2018

It is proposed that the noise introduced by small mini-batches drives the parameters towards minima whose evidence is large, and it is demonstrated that, when one holds the learning rate fixed, there is an optimum batch size which maximizes the test set accuracy.
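
The scaling argument referred to above can be summarized, with notation assumed here rather than copied from the paper, by the scale of the gradient noise injected by mini-batching and the batch size that maximizes it:

```latex
% Sketch of the scaling argument (assumed notation):
% \epsilon is the learning rate, N the training-set size, B the batch size.
g \;=\; \epsilon\!\left(\frac{N}{B} - 1\right) \;\approx\; \frac{\epsilon N}{B}
\quad (B \ll N),
\qquad\text{so the optimum batch size scales as } B_{\mathrm{opt}} \propto \epsilon N .
```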

### On Priors for Bayesian Neural Networks

- Computer Science
- 2018

This dissertation aims to help the reader navigate the landscape of neural network priors: it surveys the existing work on priors for neural networks, isolates key themes such as the move towards heavy-tailed priors, and describes how to give Bayesian neural networks an adaptive width by placing stick-breaking priors on their latent representation.

### How Good is the Bayes Posterior in Deep Neural Networks Really?

- Computer Science, ICML
- 2020

This work demonstrates through careful MCMC sampling that the posterior predictive induced by the Bayes posterior yields systematically worse predictions compared to simpler methods, including point estimates obtained from SGD, and argues that it is timely to focus on understanding the origin of the improved performance of cold posteriors.
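
For context, the "cold posterior" referred to above is a tempered posterior sharpened with a temperature below one; the notation here is a standard formulation and may differ slightly from the paper's:

```latex
% Tempered ("cold") posterior: predictions are observed to improve for T < 1,
% while T = 1 recovers the exact Bayes posterior.
p_T(\theta \mid \mathcal{D}) \;\propto\; \exp\!\left(-\frac{U(\theta)}{T}\right),
\qquad U(\theta) \;=\; -\log p(\mathcal{D} \mid \theta) \;-\; \log p(\theta).
```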

### A Simple Baseline for Bayesian Uncertainty in Deep Learning

- Computer Science, NeurIPS
- 2019

It is demonstrated that SWAG performs well on a wide variety of tasks, including out-of-sample detection, calibration, and transfer learning, in comparison to many popular alternatives including MC dropout, KFAC Laplace, SGLD, and temperature scaling.
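
Here is a minimal sketch of the diagonal variant of SWAG (SWA-Gaussian): collect weight iterates along an SGD trajectory, fit a Gaussian from their first and second moments, then sample weights for Bayesian model averaging. Real SWAG also adds a low-rank covariance term, omitted here; the snapshot data below is synthetic.

```python
import torch

def diag_swag(weight_snapshots):
    """Fit a diagonal Gaussian to flattened weight iterates from SGD."""
    W = torch.stack(weight_snapshots)          # (num_snapshots, num_params)
    mean = W.mean(dim=0)                       # SWA solution
    var = W.pow(2).mean(dim=0) - mean.pow(2)   # second moment minus squared mean
    return mean, var.clamp(min=1e-8)           # clamp guards against numerical negatives

def sample_swag(mean, var):
    """Draw one weight sample from the fitted Gaussian posterior approximation."""
    return mean + torch.sqrt(var) * torch.randn_like(mean)

# Illustrative snapshots; in practice these come from a real training run with
# a (typically constant) learning rate after convergence.
snapshots = [torch.randn(1000) * 0.1 + 1.0 for _ in range(25)]
mean, var = diag_swag(snapshots)
weight_samples = [sample_swag(mean, var) for _ in range(10)]
# Each sample would be loaded into the network and the predictions averaged.
```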

### Simple and Scalable Predictive Uncertainty Estimation using Deep Ensembles

- Computer Science, NIPS
- 2017

This work proposes an alternative to Bayesian NNs that is simple to implement, readily parallelizable, requires very little hyperparameter tuning, and yields high quality predictive uncertainty estimates.

### Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning

- Computer Science, ICML
- 2016

A new theoretical framework is developed casting dropout training in deep neural networks (NNs) as approximate Bayesian inference in deep Gaussian processes, which mitigates the problem of representing uncertainty in deep learning without sacrificing either computational complexity or test accuracy.
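
A minimal sketch of MC dropout as used for uncertainty estimation: keep dropout active at prediction time and average the stochastic forward passes. The architecture, dropout rate, and number of passes below are illustrative choices, not prescriptions from the paper.

```python
import torch
import torch.nn as nn

# MC dropout sketch: stochastic forward passes approximate the posterior predictive.
model = nn.Sequential(
    nn.Linear(10, 64), nn.ReLU(), nn.Dropout(p=0.1),
    nn.Linear(64, 3),
)
x = torch.randn(32, 10)

model.train()  # keep dropout sampling ON at prediction time
with torch.no_grad():
    probs = torch.stack(
        [torch.softmax(model(x), dim=-1) for _ in range(50)]  # 50 stochastic passes
    )
predictive_mean = probs.mean(dim=0)  # approximate posterior predictive
predictive_std = probs.std(dim=0)    # a simple per-class uncertainty measure
```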