Corpus ID: 235458200

Meta-Calibration: Meta-Learning of Model Calibration Using Differentiable Expected Calibration Error

Ondrej Bohdal, Yongxin Yang, Timothy M. Hospedales
Calibration of neural networks is a topical problem that is becoming increasingly important for their real-world use. The problem is especially noticeable with modern neural networks, whose confidence often differs significantly from the confidence they should have. Various strategies have been proposed with success, yet there remains room for improvement. We propose a novel approach that introduces a differentiable metric for expected calibration…
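The standard expected calibration error (ECE) assigns each prediction to a hard confidence bin, so it is not differentiable in the model outputs. The idea of a differentiable variant can be illustrated by replacing hard bin membership with soft weights. A minimal NumPy sketch, where the soft-binning scheme and its sharpness parameter are illustrative assumptions rather than the paper's exact formulation:

```python
import numpy as np

def ece(confidences, correct, n_bins=15):
    """Standard (hard-binned) expected calibration error.

    confidences: max softmax probability per example, shape (N,)
    correct: 1.0 if the prediction was right, else 0.0, shape (N,)
    """
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    total = len(confidences)
    err = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(confidences[mask].mean() - correct[mask].mean())
            err += mask.sum() / total * gap
    return err

def soft_ece(confidences, correct, n_bins=15, sharpness=100.0):
    """Soft-binned ECE: replace hard bin membership with normalized
    weights based on distance to bin centres, so the measure becomes
    differentiable in the confidences (illustrative scheme only).
    """
    centres = (np.arange(n_bins) + 0.5) / n_bins
    # weight of each example in each bin, shape (N, n_bins)
    logits = -sharpness * (confidences[:, None] - centres[None, :]) ** 2
    w = np.exp(logits - logits.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)
    bin_mass = w.sum(axis=0)  # soft example counts per bin
    bin_conf = (w * confidences[:, None]).sum(0) / np.maximum(bin_mass, 1e-12)
    bin_acc = (w * correct[:, None]).sum(0) / np.maximum(bin_mass, 1e-12)
    return float((bin_mass / len(confidences) * np.abs(bin_conf - bin_acc)).sum())
```

A perfectly calibrated batch (confidence 0.9, 90% accuracy) scores near zero under both measures, while an overconfident batch is penalized by roughly the confidence–accuracy gap.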



On Calibration of Modern Neural Networks
It is discovered that modern neural networks, unlike those from a decade ago, are poorly calibrated, and that on most datasets temperature scaling (a single-parameter variant of Platt scaling) is surprisingly effective at calibrating predictions.
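Temperature scaling divides all logits by a single scalar T fitted on held-out data, leaving the predicted class unchanged while softening (T > 1) or sharpening (T < 1) the probabilities. A minimal sketch; the original method fits T by gradient descent on the validation NLL, whereas a simple grid search is used here for clarity:

```python
import numpy as np

def nll(logits, labels, T):
    """Mean negative log-likelihood of labels under softmax(logits / T)."""
    z = logits / T
    z = z - z.max(axis=1, keepdims=True)  # numerical stability
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

def fit_temperature(val_logits, val_labels, grid=np.linspace(0.1, 5.0, 491)):
    """Pick the single temperature that minimizes validation NLL.
    (Grid search stands in for the gradient-based fit of the paper.)
    """
    return min(grid, key=lambda T: nll(val_logits, val_labels, T))
```

On logits that are already calibrated the fitted temperature stays near 1; on overconfident logits the search pushes T well above 1 to soften the probabilities.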
Trainable Calibration Measures For Neural Networks From Kernel Mean Embeddings
MMCE is presented: an RKHS-kernel-based measure of calibration that is efficiently trainable alongside the negative log-likelihood loss without careful hyperparameter tuning, and whose finite-sample estimates are consistent and enjoy fast convergence rates.
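The trainable measure pairs the gap between confidence and correctness with a kernel over confidences, giving a penalty that is differentiable in the model outputs. A minimal sketch of a squared-MMCE-style estimate; the Laplacian kernel and the bandwidth of 0.4 are common choices and should be treated as assumptions here:

```python
import numpy as np

def mmce(confidences, correct, bandwidth=0.4):
    """Kernel-based calibration penalty: pairs of calibration residuals
    (c_i - 1[correct_i]) weighted by a Laplacian kernel over confidences.
    Differentiable in the confidences, unlike binned ECE.
    """
    diff = confidences - correct  # calibration residual per example
    k = np.exp(-np.abs(confidences[:, None] - confidences[None, :]) / bandwidth)
    m = len(confidences)
    return float(diff @ k @ diff / (m * m))
```

The penalty vanishes when confidence matches correctness exactly and grows with systematic over- or under-confidence.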
Measuring Calibration in Deep Learning
A comprehensive empirical study of choices in calibration measures, including measuring all probabilities rather than just the maximum prediction, thresholding probability values, class conditionality, the number of bins, bins that adapt to the datapoint density, and the norm used to compare accuracies to confidences.
Scalable Gradient-Based Tuning of Continuous Regularization Hyperparameters
An approach for tuning continuous regularization hyperparameters by gradient is explored, and in experiments on MNIST, SVHN and CIFAR-10 the resulting regularization levels are found to lie within the optimal regions.
Optimizing Millions of Hyperparameters by Implicit Differentiation
An algorithm for inexpensive gradient-based hyperparameter optimization is proposed that combines the implicit function theorem (IFT) with efficient inverse-Hessian approximations, and it is used to train modern network architectures with millions of weights and millions of hyperparameters.
MetaReg: Towards Domain Generalization using Meta-Regularization
Experimental validation on computer vision and natural language datasets indicates that encoding domain generalization as a novel regularization function, learned within a Learning to Learn (meta-learning) framework, yields regularizers that achieve good cross-domain generalization.
Beyond temperature scaling: Obtaining well-calibrated multiclass probabilities with Dirichlet calibration
A natively multiclass calibration method is proposed that is applicable to classifiers from any model class; it is derived from Dirichlet distributions and generalises the beta calibration method from binary classification.
Obtaining Well Calibrated Probabilities Using Bayesian Binning
A new non-parametric calibration method called Bayesian Binning into Quantiles (BBQ) is presented, which addresses key limitations of existing calibration methods and can be readily combined with many existing classification algorithms.
Adam: A Method for Stochastic Optimization
This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results in the online convex optimization framework.
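Adam maintains exponential moving averages of the gradient and its square, corrects their initialization bias, and scales each update by the ratio of the two. A sketch of a single update step with the paper's default hyperparameters:

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update. m and v are the running first and second moment
    estimates; t is the (1-indexed) step count used for bias correction."""
    m = beta1 * m + (1 - beta1) * grad          # first moment (mean of grads)
    v = beta2 * v + (1 - beta2) * grad ** 2     # second moment (uncentered var)
    m_hat = m / (1 - beta1 ** t)                # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)                # bias-corrected second moment
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v
```

Because the bias correction cancels the moving-average decay at t = 1, the very first step moves the parameter by almost exactly the learning rate in the direction opposing the gradient.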
When Does Label Smoothing Help?
It is shown empirically that, in addition to improving generalization, label smoothing improves model calibration, which can significantly improve beam search; however, if a teacher network is trained with label smoothing, knowledge distillation into a student network is much less effective.
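The mechanism behind label smoothing is simple: the one-hot target is mixed with a uniform distribution over classes, so the model is never pushed toward fully saturated probabilities. A minimal sketch of the standard formulation:

```python
import numpy as np

def smooth_labels(labels, n_classes, eps=0.1):
    """Replace one-hot targets with (1 - eps) on the true class plus
    eps / n_classes spread uniformly over all classes."""
    onehot = np.eye(n_classes)[labels]
    return (1 - eps) * onehot + eps / n_classes
```

With eps = 0.1 and 4 classes, the true class target becomes 0.925 and every other class 0.025, and each row still sums to 1.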