Corpus ID: 28671436

On Calibration of Modern Neural Networks

Chuan Guo, Geoff Pleiss, Yu Sun, Kilian Q. Weinberger
Confidence calibration -- the problem of predicting probability estimates representative of the true correctness likelihood -- is important for classification models in many applications. Key result: our analysis and experiments not only offer insights into neural network learning, but also provide a simple and straightforward recipe for practical settings: on most datasets, temperature scaling -- a single-parameter variant of Platt Scaling -- is surprisingly effective at calibrating predictions.
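Temperature scaling divides the logits by a single scalar T (fit on a held-out validation set by minimizing negative log-likelihood) before the softmax; T > 1 softens overconfident predictions without changing the predicted class. A minimal sketch: the grid search here stands in for the paper's gradient-based NLL fit, and `fit_temperature` is an illustrative name, not the authors' code.

```python
import numpy as np

def softmax(z, T=1.0):
    # Temperature-scaled softmax: divide logits by scalar T, then normalize.
    z = np.asarray(z, dtype=float) / T
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def fit_temperature(logits, labels, T_grid=np.linspace(0.5, 10.0, 96)):
    # Choose the T that minimizes negative log-likelihood on held-out data.
    # A one-dimensional grid search is enough for this single-parameter fit.
    best_T, best_nll = 1.0, np.inf
    for T in T_grid:
        probs = softmax(logits, T)
        nll = -np.log(probs[np.arange(len(labels)), labels]).mean()
        if nll < best_nll:
            best_T, best_nll = T, nll
    return best_T
```

Because T is a single scalar applied uniformly, the argmax (and hence accuracy) is unchanged; only the confidence of each prediction moves.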


Revisiting the Calibration of Modern Neural Networks

It is shown that the most recent models, notably those not using convolutions, are among the best calibrated, and that architecture is a major determinant of calibration properties.

Trainable Calibration Measures For Neural Networks From Kernel Mean Embeddings

MMCE is presented, a RKHS kernel based measure of calibration that is efficiently trainable alongside the negative likelihood loss without careful hyperparameter tuning, and whose finite sample estimates are consistent and enjoy fast convergence rates.

Non-Parametric Calibration for Classification

A method is proposed that adjusts the confidence estimates of a general classifier such that they approach the probability of classifying correctly and can be applied to any classifier that outputs confidence estimates and is not limited to neural networks.

Calibrated Prediction Intervals for Neural Network Regressors

In experiments using different regression tasks from the audio and computer vision domains, it is found that both the proposed methods are indeed capable of producing calibrated prediction intervals for neural network regressors with any desired confidence level, a finding that is consistent across all datasets and neural network architectures.

Confidence Calibration for Convolutional Neural Networks Using Structured Dropout

This paper uses the SVHN, CIFAR-10 and CIFAR-100 datasets to empirically compare model diversity and confidence errors obtained using various dropout techniques, and shows the merit of structured dropout in a Bayesian active learning application.

Calibrated Reliable Regression using Maximum Mean Discrepancy

Experiments show that the proposed calibrated regression method, which minimizes a kernel embedding measure based on the maximum mean discrepancy, produces well-calibrated and sharp prediction intervals, outperforming related state-of-the-art methods.

Learning Confidence for Out-of-Distribution Detection in Neural Networks

This work proposes a method of learning confidence estimates for neural networks that is simple to implement and produces intuitively interpretable outputs, and addresses the problem of calibrating out-of-distribution detectors.

Soft Calibration Objectives for Neural Networks

Overall, experiments across losses and datasets demonstrate that using calibration-sensitive procedures yields better uncertainty estimates under dataset shift than the standard practice of using a cross-entropy loss and post-hoc recalibration methods.

Calibrating Deep Neural Network Classifiers on Out-of-Distribution Datasets

A new post-hoc confidence calibration method, called CCAC (Confidence Calibration with an Auxiliary Class), is proposed for DNN classifiers on OOD datasets. The key novelty is an auxiliary class in the calibration model that separates misclassified samples from correctly classified ones, effectively mitigating the target DNN's tendency to be confidently wrong.

Calibrated Structured Prediction

The notion of calibration is extended so as to handle various subtleties pertaining to the structured setting, and a simple recalibration method is provided that trains a binary classifier to predict probabilities of interest.

Simple and Scalable Predictive Uncertainty Estimation using Deep Ensembles

This work proposes an alternative to Bayesian NNs that is simple to implement, readily parallelizable, requires very little hyperparameter tuning, and yields high quality predictive uncertainty estimates.
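The deep-ensembles recipe is essentially: train M networks from different random initializations and average their softmax outputs. A minimal sketch of the averaging step only (`ensemble_predict` is an illustrative name; training the members is elided):

```python
import numpy as np

def ensemble_predict(member_logits):
    # Average softmax probabilities over independently trained members.
    # Disagreement among members flattens the averaged distribution,
    # which is how the ensemble expresses uncertainty.
    probs = []
    for z in member_logits:
        z = np.asarray(z, dtype=float)
        z = z - z.max(axis=-1, keepdims=True)  # numerical stability
        e = np.exp(z)
        probs.append(e / e.sum(axis=-1, keepdims=True))
    return np.mean(probs, axis=0)
```

When two members confidently disagree, the averaged prediction lands near uniform, i.e. low confidence on inputs the ensemble is unsure about.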

Predicting good probabilities with supervised learning

We examine the relationship between the predictions made by different learning algorithms and true posterior probabilities. We show that maximum margin methods such as boosted trees and boosted stumps push the predicted probabilities away from 0 and 1, yielding a characteristic sigmoid-shaped distortion.

Obtaining Well Calibrated Probabilities Using Bayesian Binning

A new non-parametric calibration method called Bayesian Binning into Quantiles (BBQ) is presented which addresses key limitations of existing calibration methods and can be readily combined with many existing classification algorithms.
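Binning confidences is also how miscalibration is usually quantified: the expected calibration error (ECE) averages, over confidence bins, the gap between accuracy and mean confidence, weighted by bin population. A minimal sketch with equal-width bins (`expected_calibration_error` is an illustrative name; 15 bins is a common choice, not a requirement):

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=15):
    # ECE: sum over equal-width confidence bins of
    #   (fraction of samples in bin) * |accuracy(bin) - mean confidence(bin)|.
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(correct[mask].mean() - confidences[mask].mean())
            ece += mask.mean() * gap
    return ece
```

A perfectly calibrated model (70% confidence, 70% accuracy) scores 0; an overconfident one scores roughly the confidence-accuracy gap.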

Understanding deep learning requires rethinking generalization

These experiments establish that state-of-the-art convolutional networks for image classification trained with stochastic gradient methods easily fit a random labeling of the training data, and confirm that simple depth two neural networks already have perfect finite sample expressivity.

A Baseline for Detecting Misclassified and Out-of-Distribution Examples in Neural Networks

A simple baseline that utilizes probabilities from softmax distributions is presented; its effectiveness is demonstrated across computer vision, natural language processing, and automatic speech recognition tasks, and it is shown that the baseline can sometimes be surpassed.

Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift

Applied to a state-of-the-art image classification model, Batch Normalization achieves the same accuracy with 14 times fewer training steps, and beats the original model by a significant margin.

Reliable Confidence Estimation via Online Learning

Techniques that assess a classification algorithm's uncertainty via calibrated probabilities and which are guaranteed to be reliable on out-of-distribution input are proposed and validated on two real-world problems: question answering and medical diagnosis from genomic data.

Transforming Neural-Net Output Levels to Probability Distributions

A method is presented for computing the first two moments of the probability distribution over the outputs consistent with the input and the training data; the results shed new light on and generalize the well-known "softmax" scheme.

Learning Scalable Deep Kernels with Recurrent Structure

The resulting model, GP-LSTM, fully encapsulates the inductive biases of long short-term memory (LSTM) recurrent networks, while retaining the non-parametric probabilistic advantages of Gaussian processes.