Corpus ID: 239998136

Diversity Matters When Learning From Ensembles

@article{Nam2021DiversityMW,
  title={Diversity Matters When Learning From Ensembles},
  author={Giung Nam and Jongmin Yoon and Yoonho Lee and Juho Lee},
  journal={ArXiv},
  year={2021},
  volume={abs/2110.14149}
}
Deep ensembles excel in large-scale image classification tasks, both in prediction accuracy and in calibration. Although they are simple to train, the computation and memory cost of deep ensembles limits their practicability. While some recent works propose to distill an ensemble into a single model to reduce such costs, there is still a performance gap between the ensemble and the distilled model. We propose a simple approach for reducing this gap, i.e., making the distilled performance… 
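As a rough, generic sketch of ensemble distillation (not the specific method proposed in this paper), the single student model can be trained to match the averaged predictive distribution of the ensemble. The temperature T and the KL objective below are common choices assumed here for illustration:

```python
import torch
import torch.nn.functional as F

def ensemble_distillation_loss(student_logits, teacher_logits_list, T=1.0):
    """KL divergence between the ensemble's averaged predictive and the student.

    student_logits: (B, C) logits of the single distilled model.
    teacher_logits_list: list of (B, C) logits, one per ensemble member.
    T: softmax temperature (an assumed hyperparameter).
    """
    # The usual distillation target: the mean of the members' softened probabilities.
    teacher_probs = torch.stack(
        [F.softmax(z / T, dim=-1) for z in teacher_logits_list]
    ).mean(dim=0)
    student_log_probs = F.log_softmax(student_logits / T, dim=-1)
    # KL(teacher || student), scaled by T^2 as in standard knowledge distillation.
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean") * T * T
```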

References

Showing 1–10 of 26 references
Ensemble Distribution Distillation
TLDR
This work proposes Ensemble Distribution Distillation (EnD²), a class of models in which a single neural network explicitly models a distribution over output distributions, enabling a single model to retain both the improved classification performance of ensemble distillation and information about the diversity of the ensemble, which is useful for uncertainty estimation.
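A loose sketch of the EnD² idea: the student outputs Dirichlet concentration parameters and maximizes the likelihood of each ensemble member's categorical prediction under that Dirichlet, so diversity across members is preserved rather than averaged away. The softplus parameterization and the clamping below are assumptions, not the paper's exact formulation:

```python
import torch.nn.functional as F
from torch.distributions import Dirichlet

def distribution_distillation_loss(student_outputs, member_probs):
    """Negative log-likelihood of ensemble members' predictions under the
    student's Dirichlet over the probability simplex.

    student_outputs: (B, C) raw network outputs.
    member_probs: (M, B, C) softmax outputs of the M ensemble members.
    """
    alpha = F.softplus(student_outputs) + 1.0   # positive concentration parameters
    dirichlet = Dirichlet(alpha)                # batch of B Dirichlets over C classes
    targets = member_probs.clamp_min(1e-6)      # keep targets inside the open simplex
    targets = targets / targets.sum(dim=-1, keepdim=True)
    return -dirichlet.log_prob(targets).mean()  # broadcasts over the M members
```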
Knowledge Distillation Thrives on Data Augmentation
TLDR
This paper shows that the KD loss benefits from extended training iterations while the cross-entropy loss does not, and proposes to enhance KD via stronger data augmentation schemes (e.g., mixup, CutMix), which may inspire more advanced KD algorithms.
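A minimal sketch of the kind of augmentation-driven distillation summarized above: pairs of inputs are mixed with mixup, and both teacher and student are evaluated on the same mixed batch, so the soft targets already reflect the augmentation. The value of alpha and the Beta sampling are assumed details, not the paper's exact recipe:

```python
import torch

def mixup_for_distillation(x, alpha=0.2):
    """Return a mixup-augmented batch; the teacher's soft targets are then
    computed on the same mixed inputs, so no label mixing is needed for KD."""
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(x.size(0))
    mixed = lam * x + (1.0 - lam) * x[perm]
    return mixed, perm, lam
```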
Distilling the Knowledge in a Neural Network
TLDR
This work shows that the acoustic model of a heavily used commercial system can be significantly improved by distilling the knowledge in an ensemble of models into a single model, and introduces a new type of ensemble composed of one or more full models and many specialist models which learn to distinguish fine-grained classes that the full models confuse.
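The standard formulation from this paper combines a temperature-softened soft-target term with the usual hard-label cross-entropy. A minimal sketch; T and alpha below are typical values, not ones prescribed by this summary:

```python
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    """Weighted sum of the soft-target KL (at temperature T) and hard-label CE."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * T * T                                  # T^2 keeps gradient scales comparable
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard
```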
BatchEnsemble: An Alternative Approach to Efficient Ensemble and Lifelong Learning
TLDR
This work proposes BatchEnsemble, an ensemble method whose computational and memory costs are significantly lower than those of typical ensembles and which easily scales up to lifelong learning on Split-ImageNet, a benchmark of 100 sequential learning tasks.
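A sketch of the rank-1 trick behind BatchEnsemble: one shared weight matrix is modulated per member by input and output scaling vectors, so the extra parameters grow only linearly in the layer widths. The linear layer below is an illustrative case, not the paper's full recipe:

```python
import torch
import torch.nn as nn

class BatchEnsembleLinear(nn.Module):
    """Shared weight W with per-member rank-1 modulation: W_i = W * (s_i r_i^T)."""

    def __init__(self, in_features, out_features, n_members):
        super().__init__()
        self.shared = nn.Linear(in_features, out_features, bias=False)
        self.r = nn.Parameter(torch.ones(n_members, in_features))   # input scalers
        self.s = nn.Parameter(torch.ones(n_members, out_features))  # output scalers

    def forward(self, x, member):
        # Scaling the input by r_i and the output by s_i is equivalent to
        # multiplying W elementwise by the rank-1 matrix s_i r_i^T.
        return self.shared(x * self.r[member]) * self.s[member]
```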
Distilling Ensembles Improves Uncertainty Estimates
We seek to bridge the performance gap between batch ensembles (ensembles of deep networks with shared parameters) and deep ensembles on tasks which require not only predictions, but also uncertainty estimates.
Bayesian Deep Learning and a Probabilistic Perspective of Generalization
TLDR
It is shown that deep ensembles provide an effective mechanism for approximate Bayesian marginalization, and a related approach is proposed that further improves the predictive distribution by marginalizing within basins of attraction, without significant overhead.
Simple and Scalable Predictive Uncertainty Estimation using Deep Ensembles
TLDR
This work proposes an alternative to Bayesian NNs that is simple to implement, readily parallelizable, requires very little hyperparameter tuning, and yields high quality predictive uncertainty estimates.
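The deep-ensemble predictive distribution is simply the average of the members' class probabilities; a minimal sketch, assuming `models` is a list of independently trained classifiers over the same label space:

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def deep_ensemble_predict(models, x):
    """Average the softmax outputs of independently trained networks."""
    probs = torch.stack([F.softmax(model(x), dim=-1) for model in models])
    return probs.mean(dim=0)   # (B, C) ensemble predictive distribution
```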
Deep Residual Learning for Image Recognition
TLDR
This work presents a residual learning framework to ease the training of networks that are substantially deeper than those used previously, and provides comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth.
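A minimal residual block illustrating the identity-shortcut idea described above; channel counts and strides are simplified, so this is not the exact block used in the paper:

```python
import torch.nn as nn

class ResidualBlock(nn.Module):
    """out = relu(BN(conv(relu(BN(conv(x))))) + x), with an identity shortcut."""

    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + x)   # the block learns a residual on top of the identity
```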
Training independent subnetworks for robust prediction
TLDR
This work shows that, using a multi-input multi-output (MIMO) configuration, one can utilize a single model's capacity to train multiple subnetworks that independently learn the task at hand, improving model robustness without increasing compute.
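A sketch of the MIMO configuration summarized above: M inputs are concatenated along the channel axis, pass through one shared backbone, and M heads each predict for their own input. The backbone, its feature dimension, and M=3 are assumptions for illustration:

```python
import torch
import torch.nn as nn

class MIMOHead(nn.Module):
    """Multi-input multi-output wrapper: one backbone, M independent output heads."""

    def __init__(self, backbone, feat_dim, num_classes, m=3):
        super().__init__()
        self.backbone = backbone   # assumed to accept 3 * m input channels
        self.heads = nn.ModuleList(nn.Linear(feat_dim, num_classes) for _ in range(m))

    def forward(self, inputs):
        # inputs: list of m image batches, each of shape (B, 3, H, W).
        features = self.backbone(torch.cat(inputs, dim=1))
        return [head(features) for head in self.heads]   # one prediction per subnetwork
```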
Wide Residual Networks
TLDR
This paper conducts a detailed experimental study on the architecture of ResNet blocks and proposes a novel architecture in which the depth of residual networks is decreased and their width increased; the resulting network structures, called wide residual networks (WRNs), are far superior to their commonly used thin and very deep counterparts.
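The widening factor k mentioned above simply multiplies each stage's channel width while the depth stays modest. A small sketch of CIFAR-style WRN stage widths; the base widths 16/32/64 follow the usual convention and are assumed here:

```python
def wrn_stage_widths(k, base=(16, 32, 64)):
    """Channel widths for a CIFAR-style WRN: a fixed stem plus k-times-wider stages."""
    return [base[0]] + [c * k for c in base]

# Example: WRN-28-10 uses k = 10 -> [16, 160, 320, 640]
print(wrn_stage_widths(10))
```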