• Corpus ID: 211258943

Being Bayesian, Even Just a Bit, Fixes Overconfidence in ReLU Networks

@inproceedings{Kristiadi2020BeingBE,
  title={Being Bayesian, Even Just a Bit, Fixes Overconfidence in ReLU Networks},
  author={Agustinus Kristiadi and Matthias Hein and Philipp Hennig},
  booktitle={International Conference on Machine Learning},
  year={2020}
}
The point estimates of ReLU classification networks---arguably the most widely used neural network architecture---have been shown to yield arbitrarily high confidence far away from the training data. This architecture, in conjunction with a maximum a posteriori estimation scheme, is thus not calibrated nor robust. Approximate Bayesian inference has been empirically demonstrated to improve predictive uncertainty in neural networks, although the theoretical analysis of such Bayesian… 
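The overconfidence effect described in the abstract can be illustrated with a minimal sketch. It assumes a toy bias-free two-layer ReLU network with hand-picked weights (`W1`, `W2`, and the input `x` are hypothetical and not taken from the paper). Such a network is positively homogeneous, so scaling the input scales the logits, and the maximum softmax probability approaches 1 as the input moves away from the origin.

```python
import numpy as np

# Hypothetical hand-picked weights for a toy network:
# 2 inputs -> 3 ReLU hidden units -> 3 classes, no biases (an assumption
# made here so that the scaling behaviour is exact).
W1 = np.array([[1.0, -1.0],
               [0.5,  2.0],
               [-1.0, 0.5]])
W2 = np.eye(3)


def logits(x):
    """Forward pass of the bias-free ReLU network."""
    return W2 @ np.maximum(W1 @ x, 0.0)


def confidence(x):
    """Maximum softmax probability, computed in a numerically stable way."""
    z = logits(x)
    p = np.exp(z - z.max())
    return float((p / p.sum()).max())


x = np.array([1.0, 0.5])
scales = [1.0, 2.0, 5.0, 10.0]
confidences = [confidence(a * x) for a in scales]

# Moving the same input further from the origin only raises the confidence.
for a, c in zip(scales, confidences):
    print(f"scale={a:5.1f}  max softmax probability={c:.5f}")
```

This shows only the point-estimate behaviour; it does not implement the Bayesian treatment the paper proposes.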

Learnable Uncertainty under Laplace Approximations

Uncertainty units for Laplace-approximated networks are introduced: Hidden units with zero weights that can be added to any pre-trained, point-estimated network, making the Laplace approximation competitive with more expensive alternative uncertainty-quantification frameworks.

RegMixup: Mixup as a Regularizer Can Surprisingly Improve Accuracy and Out Distribution Robustness

This work shows that the well-celebrated Mixup becomes more effective when it is used as an additional regularizer to the standard cross-entropy loss rather than as the sole learning objective, and that doing so improves the quality of Mixup's predictive uncertainty estimation in most cases.

Model Architecture Adaption for Bayesian Neural Networks

A novel network architecture search (NAS) that optimizes BNNs for both accuracy and uncertainty while having a reduced inference latency and using only a fraction of the runtime compared to many popular BNN baselines is shown.

Mixtures of Laplace Approximations for Improved Post-Hoc Uncertainty in Deep Learning

This work proposes to predict with a Gaussian mixture model posterior that consists of a weighted sum of Laplace approximations of independently trained deep neural networks and can be used post hoc with any set of pre-trained networks and only requires a small computational and memory overhead compared to regular ensembles.

An Infinite-Feature Extension for Bayesian ReLU Nets That Fixes Their Asymptotic Overconfidence

The resulting model is asymptotically maximally uncertain far away from the data while the BNNs’ predictive power is unaffected near the data and can be applied post-hoc to any pre-trained ReLU BNN at a low cost.

Fixing Asymptotic Uncertainty of Bayesian Neural Networks with Infinite ReLU Features

It is proved that the added uncertainty of an infinite number of ReLU features over the input domain yields cubic predictive variance growth, and thus the ideal uniform confidence in multi-class classification far from the training data.

Do Bayesian Neural Networks Need To Be Fully Stochastic?

It is proved that expressive predictive distributions require only small amounts of stochasticity, and partially stochastic networks with only n stochastic biases are universal probabilistic predictors for n-dimensional predictive problems.

Accelerated Linearized Laplace Approximation for Bayesian Deep Learning

Inspired by the connections between LLA and neural tangent kernels (NTKs), a Nyström approximation to NTKs is developed to accelerate LLA; the resulting method enjoys reassuring theoretical guarantees.

Concept Embeddings for Fuzzy Logic Verification of Deep Neural Networks in Perception Tasks

This work presents a simple, yet effective, approach to verify whether a trained convolutional neural network (CNN) respects specified symbolic background knowledge, and shows that this approach benefits from fuzziness and calibrating the concept outputs.

References

Why ReLU Networks Yield High-Confidence Predictions Far Away From the Training Data and How to Mitigate the Problem

A new robust optimization technique similar to adversarial training is proposed which enforces low confidence predictions far away from the training data while maintaining high confidence predictions and test error on the original classification task compared to standard training.

A Scalable Laplace Approximation for Neural Networks

This work uses recent insights from second-order optimisation for neural networks to construct a Kronecker factored Laplace approximation to the posterior over the weights of a trained network, enabling practitioners to estimate the uncertainty of models currently used in production without having to retrain them.

On Calibration of Modern Neural Networks

It is discovered that modern neural networks, unlike those from a decade ago, are poorly calibrated, and on most datasets, temperature scaling -- a single-parameter variant of Platt Scaling -- is surprisingly effective at calibrating predictions.
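As a rough sketch of the temperature scaling mentioned above: logits are divided by a single scalar temperature chosen to minimise the negative log-likelihood on held-out data. The validation logits and labels below are a made-up toy set, and the grid search in the hypothetical helper `fit_temperature` stands in for a proper optimiser for simplicity.

```python
import numpy as np


def log_softmax(z):
    """Row-wise log-softmax, shifted by the row maximum for stability."""
    z = z - z.max(axis=1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=1, keepdims=True))


def nll(logits, labels, temperature):
    """Mean negative log-likelihood of the labels at a given temperature."""
    lp = log_softmax(logits / temperature)
    return float(-lp[np.arange(len(labels)), labels].mean())


def fit_temperature(logits, labels, grid=np.linspace(0.5, 10.0, 191)):
    """Pick the temperature on a grid that minimises validation NLL.

    Grid search is an assumption made for brevity, not a prescribed method.
    """
    losses = [nll(logits, labels, t) for t in grid]
    return float(grid[int(np.argmin(losses))])


# Toy validation set (made up): the model is about 98% confident in class 0
# on every example, but only 3 of the 4 labels are class 0.
val_logits = np.array([[4.0, 0.0]] * 4)
val_labels = np.array([0, 0, 0, 1])

T = fit_temperature(val_logits, val_labels)
print("fitted temperature:", T)
print("NLL before:", nll(val_logits, val_labels, 1.0))
print("NLL after: ", nll(val_logits, val_labels, T))
```

Dividing by a positive temperature leaves the arg-max class unchanged, so accuracy is unaffected while the confidence is pulled toward the observed 75% hit rate.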

Stochastic Variational Deep Kernel Learning

An efficient form of stochastic variational inference is derived which leverages local kernel interpolation, inducing points, and structure exploiting algebra within this framework to enable classification, multi-task learning, additive covariance structures, and stochastic gradient training.

Deep Kernel Learning

We introduce scalable deep kernels, which combine the structural properties of deep learning architectures with the non-parametric flexibility of kernel methods. Specifically, we transform the inputs

A Practical Bayesian Framework for Backpropagation Networks

  • D. MacKay
  • Computer Science
    Neural Computation
  • 1992
A quantitative and practical Bayesian framework is described for learning of mappings in feedforward networks that automatically embodies "Occam's razor," penalizing overflexible and overcomplex models.

Scalable Bayesian Optimization Using Deep Neural Networks

This work shows that performing adaptive basis function regression with a neural network as the parametric form performs competitively with state-of-the-art GP-based approaches, but scales linearly with the number of data rather than cubically, which allows for a previously intractable degree of parallelism.

The Evidence Framework Applied to Classification Networks

  • D. MacKay
  • Computer Science
    Neural Computation
  • 1992
It is demonstrated that the Bayesian framework for model comparison described for regression models in MacKay (1992a,b) can also be applied to classification problems and an information-based data selection criterion is derived and demonstrated within this framework.

Uncertainty Estimation with Infinitesimal Jackknife, Its Distribution and Mean-Field Approximation

Uncertainty quantification is an important research area in machine learning. Many approaches have been developed to improve the representation of uncertainty in deep models to avoid overconfident

On Last-Layer Algorithms for Classification: Decoupling Representation from Uncertainty Estimation

The experiments suggest there is limited value in adding multiple uncertainty layers to deep classifiers, and it is observed that these simple methods strongly outperform a vanilla point-estimate SGD in some complex benchmarks like ImageNet.