Posterior Refinement Improves Sample Efficiency in Bayesian Neural Networks

@article{Kristiadi2022PosteriorRI,
  title={Posterior Refinement Improves Sample Efficiency in Bayesian Neural Networks},
  author={Agustinus Kristiadi and Runa Eschenhagen and Philipp Hennig},
  journal={arXiv preprint arXiv:2205.10041},
  year={2022}
}
Monte Carlo (MC) integration is the de facto method for approximating the predictive distribution of Bayesian neural networks (BNNs). However, even with many MC samples, Gaussian-based BNNs can still yield poor predictive performance due to errors in the posterior approximation. Meanwhile, alternatives to MC integration tend to be more expensive or biased. In this work, we experimentally show that the key to good MC-approximated predictive distributions is the quality of the approximate posterior…
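As context for the abstract's point about MC integration, below is a minimal sketch of the standard MC estimate of a BNN's predictive distribution under a Gaussian weight posterior. The single linear layer, the diagonal Gaussian q(w), and the names w_mean, w_std, and n_samples are illustrative assumptions for the sketch, not the paper's actual model or method.

# Minimal sketch: MC integration of a BNN predictive distribution under a
# hypothetical diagonal Gaussian posterior over the weights of one linear layer.
import torch
import torch.nn.functional as F

torch.manual_seed(0)

n_features, n_classes, n_samples = 4, 3, 100

# Assumed approximate posterior q(w) = N(w_mean, diag(w_std^2)); placeholder values.
w_mean = torch.randn(n_classes, n_features)
w_std = 0.1 * torch.ones(n_classes, n_features)

x = torch.randn(8, n_features)  # a batch of test inputs

# MC estimate: p(y|x) ~= (1/S) * sum_s softmax(x @ w_s.T), with w_s ~ q(w)
probs = torch.zeros(x.shape[0], n_classes)
for _ in range(n_samples):
    w_s = w_mean + w_std * torch.randn_like(w_mean)  # sample weights from q(w)
    probs += F.softmax(x @ w_s.T, dim=-1)
probs /= n_samples

print(probs.sum(dim=-1))  # each row is a valid categorical distribution (sums to 1)

Note that even as n_samples grows, this estimate only converges to the predictive distribution implied by q(w); if q(w) approximates the true posterior poorly, more samples cannot compensate, which is the gap that posterior refinement is meant to close.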
