Corpus ID: 211205110

Bayesian Deep Learning and a Probabilistic Perspective of Generalization

@article{Wilson2020BayesianDL,
  title={Bayesian Deep Learning and a Probabilistic Perspective of Generalization},
  author={Andrew Gordon Wilson and Pavel Izmailov},
  journal={ArXiv},
  year={2020},
  volume={abs/2002.08791}
}
The key distinguishing property of a Bayesian approach is marginalization, rather than using a single setting of weights. Bayesian marginalization can particularly improve the accuracy and calibration of modern deep neural networks, which are typically underspecified by the data, and can represent many compelling but different solutions. We show that deep ensembles provide an effective mechanism for approximate Bayesian marginalization, and propose a related approach that further improves the… 
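For concreteness, the marginalization the abstract refers to is the Bayesian model average over network weights, and a deep ensemble of M independently trained networks can be read as a simple Monte Carlo estimate of that integral (a sketch of the standard formulation, not the paper's specific procedure):

p(y \mid x, \mathcal{D}) = \int p(y \mid x, w)\, p(w \mid \mathcal{D})\, dw \approx \frac{1}{M} \sum_{m=1}^{M} p(y \mid x, w_m)

A minimal Python sketch of this ensemble average, assuming a hypothetical list models of trained classifiers that each expose a predict_proba(x) method (the names are illustrative, not taken from the paper):

import numpy as np

def ensemble_predictive(models, x):
    # Each independently trained model stands in for one (approximate) posterior
    # sample w_m, so averaging their predictive distributions gives a crude
    # Monte Carlo estimate of the Bayesian model average p(y | x, D).
    probs = np.stack([m.predict_proba(x) for m in models])  # (M, ..., n_classes)
    return probs.mean(axis=0)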

Wide Mean-Field Bayesian Neural Networks Ignore the Data

TLDR
This work shows that mean-field variational inference entirely fails to model the data when the network width is large and the activation function is odd, and that the optimal approximate posterior need not tend to the prior if the activation function is not odd.

Bayesian Model Selection, the Marginal Likelihood, and Generalization

TLDR
It is shown how the marginal likelihood can be negatively correlated with generalization, with implications for neural architecture search, and how it can lead to both underfitting and overfitting in hyperparameter learning.

Greedy Bayesian Posterior Approximation with Deep Ensembles

TLDR
A novel and principled method is proposed to tackle the problem of greedy ensemble construction by minimizing an f-divergence between the true posterior and a kernel density estimator (KDE) in a function space.

Evaluating Approximate Inference in Bayesian Deep Learning

TLDR
This competition evaluates the fidelity of approximate Bayesian inference procedures in deep learning, using as a reference Hamiltonian Monte Carlo (HMC) samples obtained by parallelizing computations over hundreds of tensor processing unit (TPU) devices.

Deep Ensemble as a Gaussian Process Approximate Posterior

TLDR
This work relates deep ensembles (DE) to Bayesian inference to obtain reliable Bayesian uncertainty, and provides strategies to make training efficient; the approach adds only marginal training cost over standard DE, but achieves better uncertainty quantification than DE and its variants across diverse scenarios.

Bayesian Deep Learning via Subnetwork Inference

TLDR
This work shows that it suffices to perform inference over a small subset of model weights in order to obtain accurate predictive posteriors, and proposes a subnetwork selection strategy that aims to maximally preserve the model’s predictive uncertainty.

Laplace Redux - Effortless Bayesian Deep Learning

TLDR
This work reviews the range of variants of the Laplace approximation (LA), introduces an easy-to-use software library for PyTorch offering user-friendly access to all major versions of the LA, and demonstrates that the LA is competitive with more popular alternatives in terms of performance, while excelling in terms of computational cost.

What Are Bayesian Neural Network Posteriors Really Like?

TLDR
It is shown that BNNs can achieve significant performance gains over standard training and deep ensembles, that a single long HMC chain can provide a representation of the posterior comparable to multiple shorter chains, and that posterior tempering is not needed for near-optimal performance.

Contrasting random and learned features in deep Bayesian linear regression

TLDR
Comparing deep random feature models to deep networks in which all layers are trained provides a detailed characterization of the interplay between width, depth, data density, and prior mismatch and begins to elucidate how architectural details affect generalization performance in this simple class of deep regression models.

Structured Dropout Variational Inference for Bayesian Neural Networks

TLDR
This work addresses the inflexibility of the factorized structure of the Dropout posterior and proposes an improved method called Variational Structured Dropout (VSD), which employs an orthogonal transformation to learn a structured representation of the variational noise and consequently induces statistical dependencies in the approximate posterior.
...

References

Showing 1-10 of 103 references

The Case for Bayesian Deep Learning

TLDR
The key distinguishing property of a Bayesian approach is marginalization instead of optimization, not the prior or Bayes rule; the prior over functions reflects the inductive biases of neural networks that help them generalize.

Deep Ensembles: A Loss Landscape Perspective

TLDR
Developing the concept of the diversity-accuracy plane, this work shows that the decorrelation power of random initializations is unmatched by popular subspace sampling methods, and its experimental results validate the hypothesis that deep ensembles work well under dataset shift.

What Are Bayesian Neural Network Posteriors Really Like?

TLDR
It is shown that BNNs can achieve significant performance gains over standard training and deep ensembles, that a single long HMC chain can provide a representation of the posterior comparable to multiple shorter chains, and that posterior tempering is not needed for near-optimal performance.

Uncertainty in Neural Networks: Approximately Bayesian Ensembling

TLDR
This work proposes one modification to the usual process of ensembling NNs which, it is argued, does result in approximate Bayesian inference: regularising parameters about values drawn from a distribution that can be set equal to the prior.

A Bayesian Perspective on Generalization and Stochastic Gradient Descent

TLDR
It is proposed that the noise introduced by small mini-batches drives the parameters towards minima whose evidence is large, and it is demonstrated that, when one holds the learning rate fixed, there is an optimum batch size which maximizes the test set accuracy.

On Priors for Bayesian Neural Networks

TLDR
This dissertation aims to help the reader navigate the landscape of neural network priors: it surveys the existing work on priors for neural networks, isolating key themes such as the move towards heavy-tailed priors, and describes how to give Bayesian neural networks an adaptive width by placing stick-breaking priors on their latent representations.

How Good is the Bayes Posterior in Deep Neural Networks Really?

TLDR
This work demonstrates through careful MCMC sampling that the posterior predictive induced by the Bayes posterior yields systematically worse predictions than simpler methods, including point estimates obtained from SGD, and argues that it is timely to focus on understanding the origin of the improved performance of cold posteriors.

A Simple Baseline for Bayesian Uncertainty in Deep Learning

TLDR
It is demonstrated that SWAG performs well on a wide variety of tasks, including out-of-sample detection, calibration, and transfer learning, in comparison to many popular alternatives including MC dropout, KFAC Laplace, SGLD, and temperature scaling.

Simple and Scalable Predictive Uncertainty Estimation using Deep Ensembles

TLDR
This work proposes an alternative to Bayesian NNs that is simple to implement, readily parallelizable, requires very little hyperparameter tuning, and yields high quality predictive uncertainty estimates.

Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning

TLDR
A new theoretical framework is developed casting dropout training in deep neural networks (NNs) as approximate Bayesian inference in deep Gaussian processes, which mitigates the problem of representing uncertainty in deep learning without sacrificing either computational complexity or test accuracy.
...