On Uncertainty, Tempering, and Data Augmentation in Bayesian Classification

@article{Kapoor2022OnUT,
  title={On Uncertainty, Tempering, and Data Augmentation in Bayesian Classification},
  author={Sanyam Kapoor and Wesley J. Maddox and Pavel Izmailov and Andrew Gordon Wilson},
  journal={ArXiv},
  year={2022},
  volume={abs/2203.16481}
}
Aleatoric uncertainty captures the inherent randomness of the data, such as measurement noise. In Bayesian regression, we often use a Gaussian observation model, where we control the level of aleatoric uncertainty with a noise variance parameter. By contrast, for Bayesian classification we use a categorical distribution with no mechanism to represent our beliefs about aleatoric uncertainty. Our work shows that explicitly accounting for aleatoric uncertainty significantly improves the performance… 
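As a minimal sketch of the contrast the abstract draws (the notation f_w, \sigma^2, and T below is illustrative, not taken from the paper):

Regression likelihood:     p(y \mid x, w) = \mathcal{N}(y;\, f_w(x), \sigma^2), where the noise variance \sigma^2 explicitly sets the level of aleatoric uncertainty.
Classification likelihood: p(y = k \mid x, w) = \mathrm{softmax}(f_w(x))_k, which offers no analogous parameter for expressing beliefs about label noise.
Tempered posterior:        p_T(w \mid \mathcal{D}) \propto p(\mathcal{D} \mid w)^{1/T}\, p(w), where a temperature T < 1 ("cold") sharpens the likelihood and is one common way to compensate.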

Theoretical characterization of uncertainty in high-dimensional linear classification

This manuscript characterises uncertainty when learning from a limited number of samples of high-dimensional Gaussian input data with labels generated by a probit model, and provides a closed-form formula for the joint statistics between the logistic classifier, the uncertainty of the statistically optimal Bayesian classifier, and the ground-truth probit uncertainty.

Uncertainty Calibration in Bayesian Neural Networks via Distance-Aware Priors

DAP calibration is introduced, a method for correcting the overconfidence of Bayesian deep learning models outside of the training domain; it is agnostic to the posterior inference method and can be performed as a post-processing step.

How Much Data Are Augmentations Worth? An Investigation into Scaling Laws, Invariance, and Implicit Regularization

Despite the clear performance benefits of data augmentations, little is known about why they are so effective. In this paper, we disentangle several key mechanisms through which data augmentations…

References

Showing 1-10 of 60 references

Cold Posteriors and Aleatoric Uncertainty

It is argued that commonly used priors in Bayesian neural networks can significantly overestimate the aleatoric uncertainty in the labels on many classification datasets.

Bayesian Deep Learning and a Probabilistic Perspective of Generalization

It is shown that deep ensembles provide an effective mechanism for approximate Bayesian marginalization, and a related approach is proposed that further improves the predictive distribution by marginalizing within basins of attraction, without significant overhead.

What Uncertainties Do We Need in Bayesian Deep Learning for Computer Vision?

A Bayesian deep learning framework combining input-dependent aleatoric uncertainty with epistemic uncertainty is presented, which makes the loss more robust to noisy data and yields new state-of-the-art results on segmentation and depth regression benchmarks.

Bayesian Inference for Large Scale Image Classification

ATMC, an adaptive noise MCMC algorithm that estimates and samples from the posterior of a neural network, is introduced and shown to be intrinsically robust to overfitting on the training data and to provide a better-calibrated measure of uncertainty than the optimization baseline.

Dangers of Bayesian Model Averaging under Covariate Shift

It is shown how a Bayesian model average can in fact be problematic under covariate shift, particularly in cases where linear dependencies in the input features cause a lack of posterior contraction.

Data augmentation in Bayesian neural networks and the cold posterior effect

It is suggested that the cold posterior effect cannot be dismissed as an artifact of data augmentation using incorrect likelihoods, and multi-sample bounds tighter than those used previously are derived.

Efficient and Scalable Bayesian Neural Nets with Rank-1 Factors

A rank-1 parameterization of BNNs is proposed, where each weight matrix involves only a distribution on a rank-1 subspace, and the use of mixture approximate posteriors to capture multiple modes is revisited.

A statistical theory of cold posteriors in deep neural networks

A generative model describing curation is developed, which gives a principled Bayesian account of cold posteriors, because the likelihood under this new generative model closely matches the tempered likelihoods used in past work.

Laplace Redux - Effortless Bayesian Deep Learning

This work reviews the range of variants of the Laplace approximation (LA), introduces an easy-to-use software library for PyTorch offering user-friendly access to all major versions of the LA, and demonstrates that the LA is competitive with more popular alternatives in terms of performance, while excelling in terms of computational cost.

How Good is the Bayes Posterior in Deep Neural Networks Really?

This work demonstrates through careful MCMC sampling that the posterior predictive induced by the Bayes posterior yields systematically worse predictions than simpler methods, including point estimates obtained from SGD, and argues that it is timely to focus on understanding the origin of the improved performance of cold posteriors.
...