• Corpus ID: 208637294

# Deep Ensembles: A Loss Landscape Perspective

@article{Fort2019DeepEA,
title={Deep Ensembles: A Loss Landscape Perspective},
author={Stanislav Fort and Huiyi Hu and Balaji Lakshminarayanan},
journal={ArXiv},
year={2019},
volume={abs/1912.02757}
}
• Published 25 September 2019
• Computer Science
• ArXiv
Deep ensembles have been empirically shown to be a promising approach for improving accuracy, uncertainty and out-of-distribution robustness of deep learning models. While deep ensembles were theoretically motivated by the bootstrap, non-bootstrap ensembles trained with just random initialization also perform well in practice, which suggests that there could be other explanations for why deep ensembles work well. Bayesian neural networks, which learn distributions over the parameters of the…
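The core recipe the abstract describes — training several copies of a network from independent random initializations and averaging their predicted probabilities — can be sketched in a few lines. The tiny linear "members" below are illustrative stand-ins for real deep networks, not the paper's code:

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def ensemble_predict(members, x):
    """Average the softmax outputs of independently trained ensemble members."""
    probs = [softmax(x @ w) for w in members]
    return np.mean(probs, axis=0)

rng = np.random.default_rng(0)
# Stand-in for K networks trained from K different random initializations.
members = [rng.normal(size=(4, 3)) for _ in range(5)]
x = rng.normal(size=(2, 4))
p = ensemble_predict(members, x)   # averaged class probabilities, shape (2, 3)
```

The key point of the paper's perspective is that the members need no bootstrap resampling: different random seeds alone land them in different modes of the loss landscape, making their averaged predictions diverse.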
## 234 Citations

RBUE: A ReLU-Based Uncertainty Estimation Method of Deep Neural Networks
• Computer Science
• 2021
This work introduces a ReLU-Based Uncertainty Estimation (RBUE) method, which adds randomness to the activation function module of a DNN to estimate uncertainty and demonstrates that the method has competitive performance but is more favorable in training time and memory requirements.
Interpolating Compressed Parameter Subspaces
• Computer Science, Mathematics
• 2022
The utility of CPS is demonstrated for single and multiple test-time distribution settings, with improved mappings between the two spaces, higher accuracy, improved robustness across perturbation types, reduced catastrophic forgetting on Split-CIFAR10/100, strong capacity for multi-task solutions and unseen/distant tasks, and storage-efficient inference (ensembling, hypernetworks).
Practical uncertainty quantification for brain tumor segmentation
This work introduces a novel multi-headed Variational U-Net that combines the global exploration capabilities of deep ensembles with the out-of-distribution robustness of Variational Inference, ensuring superior uncertainty quantification within reasonable compute requirements.
Multiple Importance Sampling ELBO and Deep Ensembles of Variational Approximations
• Computer Science
AISTATS
• 2022
This work proposes the multiple importance sampling ELBO (MISELBO), a versatile yet simple framework that unveils connections between VI and recent advances in the importance sampling literature, paving the way for further methodological advances.
Improving robustness and calibration in ensembles with diversity regularization
• Computer Science, Environmental Science
ArXiv
• 2022
This work systematically evaluates the viability of explicitly regularizing ensemble diversity to improve calibration on in-distribution data as well as under dataset shift and demonstrates that diversity regularization is highly beneficial in architectures, where weights are partially shared between the individual members.
Diversity and Generalization in Neural Network Ensembles
• Computer Science, Environmental Science
AISTATS
• 2022
This work combines and expands previously published results in a theoretically sound framework that describes the relationship between diversity and ensemble performance for a wide range of ensemble methods, and empirically validates this theoretical analysis with neural network ensembles.
On Efficient Uncertainty Estimation for Resource-Constrained Mobile Applications
• Computer Science
ArXiv
• 2021
This work, building upon Monte Carlo Dropout (MCDO) models using the Axolotl framework, diversifies sampled subnetworks, leverages dropout patterns, and uses a branching technique to improve predictive performance while maintaining fast computations.
Automated Cleanup of the ImageNet Dataset by Model Consensus, Explainability and Confident Learning
The convolutional neural networks (CNNs) trained on ILSVRC12 ImageNet have been the backbone of various applications, serving as a generic classifier, a feature extractor, or a base model for transfer learning.
Accurate and Reliable Forecasting using Stochastic Differential Equations
• Computer Science
ArXiv
• 2021
SDE-HNN is a new heteroscedastic neural network equipped with stochastic differential equations (SDE) to characterize the interaction between the predictive mean and variance of HNNs for accurate and reliable regression, and significantly outperforms the state-of-the-art baselines in terms of both predictive performance and uncertainty quantification.
LiBRe: A Practical Bayesian Approach to Adversarial Detection
• Computer Science
2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
• 2021
This work builds a few-layer deep ensemble variational method and adopts a pre-training & fine-tuning workflow to boost the effectiveness and efficiency of LiBRe, providing a novel insight into realising adversarial detection-oriented uncertainty quantification without inefficiently crafting adversarial examples during training.

## References

Showing 10 of 40 references.
Identity Mappings in Deep Residual Networks
The propagation formulations behind the residual building blocks suggest that the forward and backward signals can be directly propagated from one block to any other block, when using identity mappings as the skip connections and after-addition activation.
Deep Residual Learning for Image Recognition
• Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun
• Computer Science
2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
• 2016
This work presents a residual learning framework to ease the training of networks that are substantially deeper than those used previously, and provides comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth.
Learning Multiple Layers of Features from Tiny Images
It is shown how to train a multi-layer generative model that learns to extract meaningful features which resemble those found in the human visual cortex, using a novel parallelization algorithm to distribute the work among multiple machines connected on a network.
Large Scale Structure of Neural Network Loss Landscapes
• Computer Science
NeurIPS
• 2019
This work proposes and experimentally verifies a unified phenomenological model of the loss landscape as a set of high-dimensional wedges that together form a large-scale, interconnected structure towards which optimization is drawn.
Can You Trust Your Model's Uncertainty? Evaluating Predictive Uncertainty Under Dataset Shift
• Computer Science
NeurIPS
• 2019
A large-scale benchmark of existing state-of-the-art methods on classification problems and the effect of dataset shift on accuracy and calibration is presented, finding that traditional post-hoc calibration does indeed fall short, as do several other previous methods.
Evaluating Scalable Bayesian Deep Learning Methods for Robust Computer Vision
• Computer Science
2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)
• 2020
This work proposes a comprehensive evaluation framework for scalable epistemic uncertainty estimation methods in deep learning and applies this framework to provide the first properly extensive and conclusive comparison of the two current state-of-the-art scalable methods: ensembling and MC-dropout.
Cyclical Stochastic Gradient MCMC for Bayesian Deep Learning
• Computer Science
ICLR
• 2020
This work develops Cyclical Stochastic Gradient MCMC (SG-MCMC), a cyclical stepsize schedule, where larger steps discover new modes, and smaller steps characterize each mode, and proves non-asymptotic convergence of the proposed algorithm.
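The cyclical stepsize schedule this entry refers to can be sketched compactly; the cosine form below follows the paper's description, but the exact function and parameter names are an illustrative reconstruction:

```python
import math

def cyclical_stepsize(k, K, M, alpha0):
    """Stepsize at iteration k of K total iterations, split into M cycles.

    Each cycle starts at alpha0 (large steps that can jump to new modes)
    and decays along a half cosine towards ~0 (small steps that explore
    the current mode), then resets.
    """
    cycle_len = math.ceil(K / M)
    t = (k % cycle_len) / cycle_len   # position within the current cycle
    return alpha0 / 2 * (math.cos(math.pi * t) + 1)
```

Each cycle restart plays the role of a fresh warm initialization, which is what lets a single SG-MCMC chain visit several modes rather than characterizing only one.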
A Simple Baseline for Bayesian Uncertainty in Deep Learning
• Computer Science
NeurIPS
• 2019
It is demonstrated that SWAG performs well on a wide variety of tasks, including out of sample detection, calibration, and transfer learning, in comparison to many popular alternatives including MC dropout, KFAC Laplace, SGLD, and temperature scaling.
The Goldilocks zone: Towards better understanding of neural network loss landscapes
• Computer Science
AAAI
• 2019
It is demonstrated that initializing a neural network at a number of points and selecting for high measures of local convexity such as $\mathrm{Tr}(H) / ||H||$, the number of positive eigenvalues of $H$, or low initial loss leads to statistically significantly faster training on MNIST.
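As a concrete reading of the convexity measure above, $\mathrm{Tr}(H)/||H||$ can be computed on a toy symmetric "Hessian"; using the Frobenius norm for $||H||$ is an assumption here:

```python
import numpy as np

# Toy symmetric Hessian with mostly positive curvature.
H = np.array([[2.0, 0.5],
              [0.5, 1.0]])

trace = np.trace(H)                              # sum of eigenvalues
frob = np.linalg.norm(H)                         # Frobenius norm (assumed choice of norm)
convexity = trace / frob                         # large when curvature is mostly positive
n_positive = np.sum(np.linalg.eigvalsh(H) > 0)   # the alternative eigenvalue-count measure
```

A loss region where most eigenvalues of $H$ are positive yields a large ratio, which is the signal the paper selects initializations for.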
Benchmarking Neural Network Robustness to Common Corruptions and Perturbations
• Computer Science
ICLR
• 2019
This paper standardizes and expands the corruption robustness topic, while showing which classifiers are preferable in safety-critical applications, and proposes a new dataset called ImageNet-P which enables researchers to benchmark a classifier's robustness to common perturbations.