# Metrics of calibration for probabilistic predictions

@article{Ibarra2022MetricsOC, title={Metrics of calibration for probabilistic predictions}, author={Imanol Arrieta Ibarra and Paman Gujral and Jonathan Tannen and Mark Tygert and Cherie Xu}, journal={ArXiv}, year={2022}, volume={abs/2205.09680} }

Many predictions are probabilistic in nature; for example, a prediction could be for precipitation tomorrow, but with only a 30% chance. Given such probabilistic predictions together with the actual outcomes, “reliability diagrams” (also known as “calibration plots”) help detect and diagnose statistically significant discrepancies — so-called “miscalibration” — between the predictions and the outcomes. The canonical reliability diagrams are based on histogramming the observed and expected values…
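The histogram-based approach the abstract refers to can be sketched in a few lines: bin the predicted probabilities, then compare each bin's mean prediction against its mean outcome. The snippet below is a minimal illustration of that canonical binned construction (and the associated expected calibration error), assuming binary outcomes in {0, 1} and an arbitrary choice of 10 equal-width bins; it is not the paper's own (cumulative) method.

```python
import numpy as np

def reliability_bins(probs, outcomes, n_bins=10):
    """Histogram-based reliability diagram: bin predicted probabilities
    into equal-width bins and return, per nonempty bin, the mean
    prediction (expected frequency), the mean outcome (observed
    frequency), and the bin count."""
    probs = np.asarray(probs, dtype=float)
    outcomes = np.asarray(outcomes, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Map each probability to a bin index in 0..n_bins-1.
    idx = np.clip(np.digitize(probs, edges[1:-1]), 0, n_bins - 1)
    rows = []
    for b in range(n_bins):
        mask = idx == b
        if mask.any():
            rows.append((probs[mask].mean(), outcomes[mask].mean(), int(mask.sum())))
    return rows

def ece(probs, outcomes, n_bins=10):
    """Expected calibration error: count-weighted mean absolute gap
    between predicted and observed frequencies across bins."""
    rows = reliability_bins(probs, outcomes, n_bins)
    n = sum(c for _, _, c in rows)
    return sum(c * abs(p - o) for p, o, c in rows) / n
```

The resulting ECE depends on the (arbitrary) bin count, which is precisely the sensitivity that motivates the cumulative alternatives discussed below.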

## 2 Citations

### A Unifying Theory of Distance from Calibration

- Computer Science, ArXiv
- 2022

Fundamental lower and upper bounds on measuring distance to calibration are established, and theoretical justification is provided for preferring certain metrics (such as Laplace kernel calibration) in practice.

## References


### Measuring Calibration in Deep Learning

- Computer Science, CVPR Workshops
- 2019

A comprehensive empirical study of choices in calibration measures including measuring all probabilities rather than just the maximum prediction, thresholding probability values, class conditionality, number of bins, bins that are adaptive to the datapoint density, and the norm used to compare accuracies to confidences.

### T-Cal: An optimal test for the calibration of predictive models

- Computer Science, ArXiv
- 2022

This work considers detecting miscalibration of predictive models using a finite validation dataset as a hypothesis testing problem, and proposes T-Cal, a minimax optimal test for calibration based on a debiased plug-in estimator of the ℓ2-Expected Calibration Error (ECE).

### Some Remarks on the Reliability of Categorical Probability Forecasts

- Environmental Science
- 2008

Studies on forecast evaluation often rely on estimating limiting observed frequencies conditioned on specific forecast probabilities (the reliability diagram or calibration function). Obviously,…

### Verified Uncertainty Calibration

- Computer Science, NeurIPS
- 2019

The scaling-binning calibrator is introduced, which first fits a parametric function to reduce variance and then bins the function values to actually ensure calibration, and estimates a model's calibration error more accurately using an estimator from the meteorological community.

### Mitigating bias in calibration error estimation

- Computer Science, AISTATS
- 2022

A simple alternative calibration error metric, ECE_sweep, is proposed, in which the number of bins is chosen to be as large as possible while preserving monotonicity in the calibration function; this produces a less biased estimator of calibration error.

### Calibration of Neural Networks using Splines

- Computer Science, ICLR
- 2021

This work introduces a binning-free calibration measure inspired by the classical Kolmogorov-Smirnov (KS) statistical test in which the main idea is to compare the respective cumulative probability distributions.
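A binning-free statistic in this Kolmogorov-Smirnov spirit can be computed directly: sort by predicted probability, accumulate the differences between predictions and outcomes, and take the maximum absolute gap. The sketch below illustrates that general construction for binary outcomes; it is a simplified illustration, not the spline-based method of the cited paper.

```python
import numpy as np

def ks_calibration_error(probs, outcomes):
    """KS-style binning-free calibration statistic: sort by predicted
    probability, form the running sum of (prediction - outcome), and
    report the maximum absolute gap, normalized by the sample size."""
    order = np.argsort(probs)
    p = np.asarray(probs, dtype=float)[order]
    y = np.asarray(outcomes, dtype=float)[order]
    # Maximum discrepancy between cumulative predicted and observed
    # frequencies of positives.
    return np.max(np.abs(np.cumsum(p - y))) / len(p)
```

Because no bins are involved, the statistic has no tuning parameter analogous to the bin count of histogram-based ECE.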

### A graphical method of cumulative differences between two subpopulations

- Computer Science, J. Big Data
- 2021

Cumulative methods are developed for the common case in which no score of any member of the subpopulations being compared is exactly equal to the score of any other member of either subpopulation.

### Cumulative deviation of a subpopulation from the full population

- Environmental Science, J. Big Data
- 2021

Cumulative deviation of the subpopulation from the full population, as proposed in this paper, sidesteps problematic coarse binning and encodes subpopulation deviation directly as the slopes of secant lines for the graphs.

### Deep Residual Learning for Image Recognition

- Computer Science, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- 2016

This work presents a residual learning framework to ease the training of networks that are substantially deeper than those used previously, and provides comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth.