Corpus ID: 211069074

Calibrate and Prune: Improving Reliability of Lottery Tickets Through Prediction Calibration

@article{Venkatesh2020CalibrateAP,
  title={Calibrate and Prune: Improving Reliability of Lottery Tickets Through Prediction Calibration},
  author={Bindya Venkatesh and Jayaraman J. Thiagarajan and Kowshik Thopalli and Prasanna Sattigeri},
  journal={ArXiv},
  year={2020},
  volume={abs/2002.03875}
}
The hypothesis that sub-network initializations (lottery) exist within the initializations of over-parameterized networks, which, when trained in isolation, produce highly generalizable models, has led to crucial insights into network initialization and has enabled efficient inferencing. Supervised models with uncalibrated confidences tend to be overconfident even when making wrong predictions. In this paper, for the first time, we study how explicit confidence calibration in the over…
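The excerpt does not state which calibration strategy the paper adopts before or during ticket search, so the following is only a minimal sketch of the general idea: train the over-parameterized network with an explicit calibration term in the loss, then search for sparse tickets. The confidence-penalty regularizer and the weight beta below are stand-ins, not the paper's recipe (pruning and rewinding are sketched separately under the lottery ticket references further down).

import torch
import torch.nn.functional as F

def calibrated_loss(logits, targets, beta=0.1):
    """Cross-entropy plus a confidence penalty (negative predictive entropy).
    Penalizing low entropy discourages over-confident predictions; beta is a
    hypothetical trade-off weight, not taken from the paper."""
    ce = F.cross_entropy(logits, targets)
    probs = F.softmax(logits, dim=-1)
    entropy = -(probs * probs.clamp_min(1e-12).log()).sum(dim=-1).mean()
    return ce - beta * entropy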

Citations

Learning to Balance Specificity and Invariance for In and Out of Domain Generalization
TLDR
This work introduces Domain-specific Masks for Generalization, a model for improving both in-domain and out-of-domain generalization performance, and encourages the masks to learn a balance of domain-invariant and domain-specific features, thus enabling a model which can benefit from the predictive power of specialized features while retaining the universal applicability of domain-invariant features.
A Winning Hand: Compressing Deep Networks Can Improve Out-Of-Distribution Robustness
TLDR
This work is able to create extremely compact CARDs that, compared to their larger counterparts, have similar test accuracy and matching (or better) robustness—simply by pruning and (optionally) quantizing.
Physarum Powered Differentiable Linear Programming Layers and Applications
TLDR
This work proposes an efficient and differentiable solver for general linear programming problems which can be used in a plug-and-play manner within deep neural networks as a layer, and which can serve as a layer whenever a learning procedure needs a fast approximate solution to an LP within a larger network.
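The Physarum dynamics themselves are not reproduced here. As a stand-in that conveys how an approximate LP solution can act as a differentiable layer, the sketch below solves one special case exactly: maximizing c^T x over the probability simplex with an entropy regularizer, whose solution is a softmax. The class name and the temperature tau are illustrative.

import torch
import torch.nn as nn

class SoftSimplexLPLayer(nn.Module):
    """Differentiable approximate solver for max_{x in simplex} c^T x.
    Adding tau * H(x) to the objective makes the argmax equal to softmax(c / tau)."""
    def __init__(self, tau=0.1):
        super().__init__()
        self.tau = tau  # smaller tau -> closer to the exact LP vertex solution

    def forward(self, costs):
        return torch.softmax(costs / self.tau, dim=-1)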
Gradient Flow in Sparse Neural Networks and How Lottery Tickets Win
TLDR
It is shown that sparse NNs have poor gradient flow at initialization and a modified initialization for unstructured connectivity is proposed, and it is found that DST methods significantly improve gradient flow during training over traditional sparse training methods.
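The summary mentions a modified initialization for unstructured sparsity but not its exact form; one plausible reading, sketched below, rescales each unit's weights by its unmasked (sparse) fan-in instead of the dense fan-in so that activation and gradient magnitudes stay healthy at initialization.

import torch

def sparse_kaiming_init(weight, mask):
    """weight, mask: (out_features, in_features) tensors; mask entries are 0/1.
    He-style init where each output unit uses its own surviving fan-in."""
    with torch.no_grad():
        fan_in = mask.sum(dim=1).clamp_min(1)          # per-unit sparse fan-in
        std = torch.sqrt(2.0 / fan_in).unsqueeze(1)    # Kaiming scaling per row
        weight.copy_(torch.randn_like(weight) * std * mask)
    return weight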
NAPS: Non-adversarial polynomial synthesis
Pattern Recognition Applications and Methods: 9th International Conference, ICPRAM 2020, Valletta, Malta, February 22–24, 2020, Revised Selected Papers
TLDR
The aim here is to design a traffic sign recognition framework that can be used for multiple countries and can classify even hard-to-distinguish classes by exploiting the category hierarchy of traffic signs.

References

Showing 1-10 of 40 references
Evaluating Lottery Tickets Under Distributional Shifts
TLDR
The experiments show that sparse subnetworks obtained through lottery ticket training do not simply overfit to particular domains, but rather reflect an inductive bias of deep neural networks that can be exploited in multiple domains.
Sparse Transfer Learning via Winning Lottery Tickets
TLDR
It is shown that sparse sub-networks with approximately 90-95% of weights removed achieve (and often exceed) the accuracy of the original dense network in several realistic settings.
The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks
TLDR
This work finds that dense, randomly-initialized, feed-forward networks contain subnetworks ("winning tickets") that, when trained in isolation, reach test accuracy comparable to the original network in a similar number of iterations, and articulates the "lottery ticket hypothesis".
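A minimal sketch of one round of the procedure the summary describes: globally prune the smallest-magnitude surviving weights, then rewind the survivors to their values at initialization. Function and variable names are illustrative, and the pruning fraction is a typical choice rather than one prescribed here.

import torch

def prune_round(model, init_state, prune_frac=0.2):
    """One iterative-magnitude-pruning round with rewinding.
    init_state: copy.deepcopy(model.state_dict()) captured before training."""
    weights = torch.cat([p.detach().abs().flatten()
                         for n, p in model.named_parameters() if "weight" in n])
    alive = weights[weights > 0]                       # previously pruned weights stay pruned
    threshold = torch.quantile(alive, prune_frac)      # cut the smallest prune_frac of survivors
    with torch.no_grad():
        for name, p in model.named_parameters():
            if "weight" not in name:
                continue
            mask = (p.abs() > threshold).float()
            p.copy_(init_state[name] * mask)           # rewind survivors, zero the rest
    return model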
Deconstructing Lottery Tickets: Zeros, Signs, and the Supermask
TLDR
This paper studies the three critical components of the Lottery Ticket algorithm, showing that each may be varied significantly without impacting the overall results, and shows why setting weights to zero is important, how signs are all you need to make the reinitialized network train, and why masking behaves like training.
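A minimal sketch of the "supermask" idea from the summary: keep randomly initialized weights frozen and learn only a binary mask over them. The original work samples the mask stochastically; the thresholded straight-through variant below is a simplification, and all names and scales are illustrative.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SupermaskLinear(nn.Module):
    def __init__(self, in_features, out_features):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.05,
                                   requires_grad=False)       # weights stay frozen
        self.scores = nn.Parameter(torch.randn(out_features, in_features) * 0.01)

    def forward(self, x):
        soft = torch.sigmoid(self.scores)
        hard = (soft > 0.5).float()
        mask = hard + soft - soft.detach()   # hard mask forward, soft gradient backward
        return F.linear(x, self.weight * mask)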
One ticket to win them all: generalizing lottery ticket initializations across datasets and optimizers
TLDR
It is found that, within the natural images domain, winning ticket initializations generalized across a variety of datasets, including Fashion MNIST, SVHN, CIFAR-10/100, ImageNet, and Places365, often achieving performance close to that of winning tickets generated on the same dataset.
To prune, or not to prune: exploring the efficacy of pruning for model compression
TLDR
Across a broad range of neural network architectures, large-sparse models are found to consistently outperform small-dense models and achieve up to 10x reduction in number of non-zero parameters with minimal loss in accuracy.
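The summary reports only results, but this reference is commonly associated with gradual magnitude pruning in which the target sparsity ramps from an initial to a final value along a cubic curve during training. The sketch below assumes that schedule; parameter names and defaults are illustrative.

def sparsity_at_step(t, s_initial=0.0, s_final=0.9, t_start=0, prune_steps=10000):
    """Target sparsity at training step t under a cubic ramp-up schedule."""
    if t < t_start:
        return s_initial
    if t >= t_start + prune_steps:
        return s_final
    progress = (t - t_start) / prune_steps
    return s_final + (s_initial - s_final) * (1.0 - progress) ** 3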
On Calibration of Modern Neural Networks
TLDR
It is discovered that modern neural networks, unlike those from a decade ago, are poorly calibrated, and on most datasets, temperature scaling -- a single-parameter variant of Platt Scaling -- is surprisingly effective at calibrating predictions.
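Temperature scaling is simple enough to sketch directly: a single scalar T > 0 is fit on held-out logits by minimizing negative log-likelihood, and calibrated probabilities are softmax(logits / T). The optimizer choice and iteration count below are arbitrary.

import torch
import torch.nn.functional as F

def fit_temperature(val_logits, val_labels, iters=200, lr=0.01):
    """val_logits: (N, C) uncalibrated logits; val_labels: (N,) class indices."""
    log_t = torch.zeros(1, requires_grad=True)         # optimize log T so T stays positive
    opt = torch.optim.Adam([log_t], lr=lr)
    for _ in range(iters):
        opt.zero_grad()
        loss = F.cross_entropy(val_logits / log_t.exp(), val_labels)
        loss.backward()
        opt.step()
    return log_t.exp().item()                          # divide test logits by this T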
Pseudo-Labeling and Confirmation Bias in Deep Semi-Supervised Learning
TLDR
This work shows that naive pseudo-labeling overfits to incorrect pseudo-labels due to the so-called confirmation bias, and demonstrates that mixup augmentation and setting a minimum number of labeled samples per mini-batch are effective regularization techniques for reducing it.
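Of the two regularizers the summary credits, mixup is the more self-contained; a minimal sketch follows (the alpha value is a common default, not taken from this reference).

import numpy as np
import torch
import torch.nn.functional as F

def mixup_loss(model, x, y, alpha=0.4):
    """Train on convex combinations of example pairs and of their labels' losses."""
    lam = float(np.random.beta(alpha, alpha))
    perm = torch.randperm(x.size(0))
    x_mix = lam * x + (1.0 - lam) * x[perm]
    logits = model(x_mix)
    return lam * F.cross_entropy(logits, y) + (1.0 - lam) * F.cross_entropy(logits, y[perm])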
Measuring Calibration in Deep Learning
TLDR
This work presents a comprehensive empirical study of choices in calibration measures, including measuring all probabilities rather than just the maximum prediction, thresholding probability values, class conditionality, the number of bins, bins that are adaptive to the datapoint density, and the norm used to compare accuracies to confidences.
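One of the choices compared in this study, equal-mass (adaptive) binning for expected calibration error, is sketched below; the bin count is illustrative.

import numpy as np

def adaptive_ece(confidences, correct, n_bins=15):
    """confidences: predicted max-probabilities; correct: 0/1 indicators (numpy arrays)."""
    order = np.argsort(confidences)
    conf, corr = confidences[order], correct[order]
    bins = np.array_split(np.arange(len(conf)), n_bins)   # equal-mass bins over sorted data
    ece = 0.0
    for idx in bins:
        if len(idx) == 0:
            continue
        ece += (len(idx) / len(conf)) * abs(conf[idx].mean() - corr[idx].mean())
    return ece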
Learning for Single-Shot Confidence Calibration in Deep Neural Networks Through Stochastic Inferences
TLDR
A novel variance-weighted confidence-integrated loss function is designed, composed of two cross-entropy loss terms with respect to the ground-truth and uniform distributions, which are balanced by the variance of stochastic prediction scores.
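A sketch following only the description in the summary: cross-entropy to the ground truth plus cross-entropy to the uniform distribution, balanced per example by the variance of stochastic (e.g., dropout) predictions. The exact weighting function is an assumption, not the paper's formula.

import torch
import torch.nn.functional as F

def variance_weighted_loss(stoch_logits, targets):
    """stoch_logits: (T, B, C) logits from T stochastic forward passes; targets: (B,)."""
    probs = F.softmax(stoch_logits, dim=-1)                    # (T, B, C)
    mean_log_p = probs.mean(dim=0).clamp_min(1e-12).log()      # averaged predictive distribution
    var = probs.var(dim=0).mean(dim=-1)                        # per-example prediction variance
    alpha = (var / (var.max() + 1e-12)).detach()               # assumed normalization to [0, 1]
    ce_true = F.nll_loss(mean_log_p, targets, reduction="none")
    ce_uniform = -mean_log_p.mean(dim=-1)                      # cross-entropy of uniform target
    return ((1.0 - alpha) * ce_true + alpha * ce_uniform).mean()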