Corpus ID: 238856644

Deep Ensembling with No Overhead for either Training or Testing: The All-Round Blessings of Dynamic Sparsity

@inproceedings{Liu2022DeepEW,
  title={Deep Ensembling with No Overhead for either Training or Testing: The All-Round Blessings of Dynamic Sparsity},
  author={S. Liu and Tianlong Chen and Zahra Atashgahi and Xiaohan Chen and Ghada Sokar and Elena Mocanu and Mykola Pechenizkiy and Zhangyang Wang and Decebal Constantin Mocanu},
  booktitle={ICLR},
  year={2022}
}
The success of deep ensembles in improving predictive performance, uncertainty estimation, and out-of-distribution robustness has been extensively studied in the machine learning literature. Despite the promising results, naively training multiple deep neural networks and combining their predictions at inference leads to prohibitive computational costs and memory requirements. Recently proposed efficient ensemble approaches reach the performance of the traditional deep ensembles with… 
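
As a hedged illustration of the idea in the abstract, the sketch below averages the softmax outputs of several sparse subnetworks at test time; the SparseMLP class and its random masks are hypothetical stand-ins for the subnetworks a dynamic-sparse-training run would actually produce, not the paper's implementation.

```python
# Minimal sketch (not the paper's code): average the softmax outputs of M sparse
# subnetworks that would be collected "for free" from one sparse training run.
# The members here use fixed random masks purely for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMLP(nn.Module):
    def __init__(self, d_in=32, d_hidden=64, n_classes=10, sparsity=0.8):
        super().__init__()
        self.fc1 = nn.Linear(d_in, d_hidden)
        self.fc2 = nn.Linear(d_hidden, n_classes)
        # Fixed random binary masks emulate a sparse topology (placeholders only).
        self.register_buffer("m1", (torch.rand_like(self.fc1.weight) > sparsity).float())
        self.register_buffer("m2", (torch.rand_like(self.fc2.weight) > sparsity).float())

    def forward(self, x):
        x = F.relu(F.linear(x, self.fc1.weight * self.m1, self.fc1.bias))
        return F.linear(x, self.fc2.weight * self.m2, self.fc2.bias)

def ensemble_predict(members, x):
    """Average the softmax predictions of all ensemble members."""
    with torch.no_grad():
        probs = torch.stack([F.softmax(m(x), dim=-1) for m in members])
    return probs.mean(dim=0)

members = [SparseMLP() for _ in range(3)]   # stand-ins for "free" sparse subnetworks
x = torch.randn(5, 32)
print(ensemble_predict(members, x).argmax(dim=-1))
```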

Sequential Bayesian Neural Subnetwork Ensembles

TLDR
This work proposes sequential ensembling of dynamic Bayesian neural subnetworks that systematically reduce model complexity through sparsity-inducing priors and generate diverse ensembles in a single forward pass, and empirically demonstrates that the approach surpasses dense frequentist and Bayesian ensemble baselines in prediction accuracy, uncertainty estimation, and out-of-distribution robustness.

Superposing Many Tickets into One: A Performance Booster for Sparse Neural Network Training

TLDR
It is argued that it is better to allocate the limited training resources to creating multiple low-loss sparse subnetworks and superposing them into a stronger one, rather than spending all of the resources on finding a single subnetwork.

Lottery Pools: Winning More by Interpolating Tickets without Increasing Training or Inference Cost

TLDR
This work observes that directly averaging the weights of the adjacent learned subnetworks significantly boosts the performance of LTs, and proposes an alternative way to perform an “ensemble” over the subnets identified by iterative magnitude pruning via a simple interpolating strategy.
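
A minimal sketch of the interpolation idea described in this TLDR, assuming two checkpoints with identical parameter names; interpolate_state_dicts and the checkpoint names in the usage comment are hypothetical, not the paper's code.

```python
# Minimal sketch (assumed, not the paper's code): interpolate two checkpoints in weight
# space, theta = (1 - alpha) * theta_a + alpha * theta_b, as a cheap "ensemble" of tickets.
import torch

def interpolate_state_dicts(sd_a, sd_b, alpha=0.5):
    """Element-wise interpolation of two state dicts with matching keys and shapes."""
    return {k: (1.0 - alpha) * sd_a[k] + alpha * sd_b[k] for k in sd_a}

# Hypothetical usage with checkpoints from adjacent iterative-magnitude-pruning rounds:
# model.load_state_dict(interpolate_state_dicts(ckpt_round_k, ckpt_round_k_plus_1, alpha=0.5))
```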

Dynamic Sparse Training for Deep Reinforcement Learning

TLDR
This paper introduces, for the first time, a dynamic sparse training approach for deep reinforcement learning that accelerates the training process by training a sparse neural network from scratch and dynamically adapting its topology to the changing data distribution during training.

Federated Dynamic Sparse Training: Computing Less, Communicating Less, Yet Learning Better

TLDR
A novel FL framework is proposed by which complex neural networks can be deployed and trained with substantially improved efficiency in both on-device computation and in-network communication; moreover, dynamic sparsity naturally introduces an "in-time self-ensembling effect" into the training dynamics and improves FL performance even over dense training.

Avoiding Forgetting and Allowing Forward Transfer in Continual Learning via Sparse Networks

TLDR
A new CL method, named AFAF, is proposed that aims to Avoid Forgetting and Allow Forward transfer in class-IL using fixed-capacity models; it allocates a sub-network that enables selective transfer of relevant knowledge to a new task while preserving past knowledge.

Exploring Lottery Ticket Hypothesis in Spiking Neural Networks

TLDR
The proposed Early-Time (ET) ticket can be seamlessly combined with common pruning techniques for finding winning tickets, such as Iterative Magnitude Pruning (IMP) and Early-Bird (EB) tickets, and results show that the ET ticket reduces search time by up to 38% compared to IMP or EB methods.

Diverse Lottery Tickets Boost Ensemble from a Single Pretrained Model

TLDR
It is empirically demonstrated that winning-ticket subnetworks produce more diverse predictions than dense networks, and that their ensemble outperforms the standard ensemble on some tasks when accurate lottery tickets can be found for those tasks.

Addressing the Stability-Plasticity Dilemma via Knowledge-Aware Continual Learning

TLDR
KAN is a new task-specific-components method based on dynamic sparse training that introduces reusability and selective transfer of past knowledge into this class of methods; it selectively reuses relevant knowledge while addressing class ambiguity, preserves old knowledge, and utilizes model capacity efficiently.

CARD: Classification and Regression Diffusion Models

TLDR
Experimental results show that CARD in general outperforms state-of-the-art methods, including Bayesian neural network-based ones that are designed for uncertainty estimation, especially when the conditional distribution of y given x is multi-modal.

References

Showing 1-10 of 97 references

Snapshot Ensembles: Train 1, get M for free

TLDR
This paper proposes a method that achieves the seemingly contradictory goal of ensembling multiple neural networks at no additional training cost: a single neural network is trained so that it converges to several local minima along its optimization path, and the model parameters are saved at each of them.
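
A minimal sketch of the snapshot recipe, assuming a toy model and random data: a cyclic cosine learning-rate schedule repeatedly drives the network into a low-loss region, and the weights are saved at the end of each cycle as one ensemble member. This only illustrates the idea and is not the paper's implementation.

```python
import copy
import torch
import torch.nn as nn

model = nn.Linear(16, 4)                           # placeholder model
opt = torch.optim.SGD(model.parameters(), lr=0.1)
cycles, steps_per_cycle = 3, 50
sched = torch.optim.lr_scheduler.CosineAnnealingWarmRestarts(opt, T_0=steps_per_cycle)
loss_fn = nn.CrossEntropyLoss()
snapshots = []

for c in range(cycles):
    for _ in range(steps_per_cycle):
        x, y = torch.randn(8, 16), torch.randint(0, 4, (8,))
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()
        sched.step()                               # learning rate restarts once per cycle
    snapshots.append(copy.deepcopy(model.state_dict()))   # one ensemble member per cycle

print(f"collected {len(snapshots)} snapshots")
```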

Why M Heads are Better than One: Training a Diverse Ensemble of Deep Networks

TLDR
It is demonstrated that TreeNets can improve ensemble performance and that diverse ensembles can be trained end-to-end under a unified loss, achieving significantly higher "oracle" accuracies than classical ensembles.
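
A hedged sketch of one way such a multi-head ensemble can be trained under a unified "oracle" set loss: a shared trunk with several heads, where each example back-propagates only the loss of its best head, encouraging the heads to specialize and diversify. MultiHeadNet and oracle_loss are illustrative names and simplify the family of losses studied in the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiHeadNet(nn.Module):
    """Shared trunk with several classification heads (a TreeNet-like layout)."""
    def __init__(self, d_in=32, d_hidden=64, n_classes=10, n_heads=4):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(d_in, d_hidden), nn.ReLU())
        self.heads = nn.ModuleList(nn.Linear(d_hidden, n_classes) for _ in range(n_heads))

    def forward(self, x):
        h = self.trunk(x)
        return torch.stack([head(h) for head in self.heads], dim=1)   # (batch, heads, classes)

def oracle_loss(logits, targets):
    """Per-example minimum cross-entropy over heads, averaged over the batch."""
    b, m, c = logits.shape
    per_head = F.cross_entropy(
        logits.reshape(b * m, c),
        targets.repeat_interleave(m),
        reduction="none",
    ).reshape(b, m)
    return per_head.min(dim=1).values.mean()

net = MultiHeadNet()
x, y = torch.randn(8, 32), torch.randint(0, 10, (8,))
print(oracle_loss(net(x), y))
```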

Top-KAST: Top-K Always Sparse Training

TLDR
This work proposes Top-KAST, a method that preserves constant sparsity throughout training (in both the forward and backward passes), and demonstrates its efficacy by showing that it performs comparably to or better than previous works when training models on the established ImageNet benchmark, whilst fully maintaining sparsity.
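
A minimal sketch of the per-layer top-k magnitude masking that such a method applies in the forward pass; the full method additionally maintains a larger "backward" set and auxiliary regularization, which this illustration omits.

```python
import torch

def topk_mask(weight: torch.Tensor, density: float) -> torch.Tensor:
    """Binary mask keeping the `density` fraction of largest-magnitude entries."""
    k = max(1, int(density * weight.numel()))
    threshold = weight.abs().flatten().kthvalue(weight.numel() - k + 1).values
    return (weight.abs() >= threshold).float()

w = torch.randn(64, 64)
mask = topk_mask(w, density=0.1)   # roughly 10% of the weights stay active
print(mask.mean().item())          # fraction of retained weights, about 0.1
```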

Dense for the Price of Sparse: Improved Performance of Sparsely Initialized Networks via a Subspace Offset

TLDR
It is shown that standard training of networks built with these layers, and pruned at initialization, achieves state-of-the-art accuracy for extreme sparsities on a variety of benchmark network architectures and datasets.

Deep Ensembles: A Loss Landscape Perspective

TLDR
Developing the concept of the diversity-accuracy plane, this work shows that the decorrelation power of random initializations is unmatched by popular subspace sampling methods; the experimental results validate the hypothesis that deep ensembles work well under dataset shift.
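
As a hedged illustration of the kind of diversity measurement behind a diversity-accuracy plane, the sketch below computes the pairwise disagreement rate between ensemble members' predicted labels on stand-in data; it is not the paper's evaluation code.

```python
import itertools
import torch

def pairwise_disagreement(pred_labels: torch.Tensor) -> float:
    """pred_labels: (n_members, n_examples) integer class predictions."""
    pairs = itertools.combinations(range(pred_labels.shape[0]), 2)
    rates = [(pred_labels[i] != pred_labels[j]).float().mean() for i, j in pairs]
    return torch.stack(rates).mean().item()

# Stand-in predictions from three "members" on 1000 examples:
preds = torch.randint(0, 10, (3, 1000))
print(pairwise_disagreement(preds))
```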

Sparsity in Deep Learning: Pruning and growth for efficient inference and training in neural networks

TLDR
This work describes approaches to remove and add elements of neural networks, different training strategies to achieve model sparsity, and mechanisms to exploit sparsity in practice, and defines a metric of pruned parameter efficiency that could serve as a baseline for comparison of different sparse networks.

BatchEnsemble: An Alternative Approach to Efficient Ensemble and Lifelong Learning

TLDR
BatchEnsemble is proposed, an ensemble method whose computational and memory costs are significantly lower than typical ensembles and can easily scale up to lifelong learning on Split-ImageNet which involves 100 sequential learning tasks.
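
A minimal sketch (an assumed re-implementation, not the authors' code) of the BatchEnsemble idea for a linear layer: one shared weight matrix plus cheap per-member rank-1 fast weights, so member i effectively uses W * (r_i s_i^T) without ever materializing a separate weight copy.

```python
import torch
import torch.nn as nn

class BatchEnsembleLinear(nn.Module):
    def __init__(self, d_in, d_out, n_members):
        super().__init__()
        self.shared = nn.Linear(d_in, d_out, bias=False)      # shared "slow" weight
        self.r = nn.Parameter(torch.ones(n_members, d_out))   # per-member rank-1 factors
        self.s = nn.Parameter(torch.ones(n_members, d_in))
        self.bias = nn.Parameter(torch.zeros(n_members, d_out))

    def forward(self, x, member):
        # Equivalent to x @ (W * outer(r, s)).T + b for the chosen member.
        return self.shared(x * self.s[member]) * self.r[member] + self.bias[member]

layer = BatchEnsembleLinear(32, 10, n_members=4)
x = torch.randn(8, 32)
print(layer(x, member=0).shape)   # torch.Size([8, 10])
```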

The State of Sparsity in Deep Neural Networks

TLDR
It is shown that unstructured sparse architectures learned through pruning cannot be trained from scratch to the same test set performance as a model trained with joint sparsification and optimization, and the need for large-scale benchmarks in the field of model compression is highlighted.

Parameter Efficient Training of Deep Convolutional Neural Networks by Dynamic Sparse Reparameterization

TLDR
This work suggests that exploring structural degrees of freedom during training is more effective than adding extra parameters to the network; the proposed method outperforms previous static and dynamic reparameterization methods, yielding the best accuracy for a fixed parameter budget.

Rigging the Lottery: Making All Tickets Winners

TLDR
This paper introduces a method to train sparse neural networks with a fixed parameter count and a fixed computational cost throughout training, without sacrificing accuracy relative to existing dense-to-sparse training methods.
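
A hedged sketch of a RigL-style topology update, assuming access to a layer's dense weight and gradient tensors: drop the smallest-magnitude active weights and grow the same number of inactive connections with the largest gradient magnitude, so the number of active parameters stays fixed. rigl_update is an illustrative helper, not the authors' implementation.

```python
import torch

def rigl_update(weight, grad, mask, update_fraction=0.3):
    """Return a new binary mask with the same number of active connections."""
    n_active = int(mask.sum().item())
    n_update = int(update_fraction * n_active)

    # Drop: among active weights, remove those with the smallest magnitude.
    active_scores = torch.where(mask.bool(), weight.abs(), torch.full_like(weight, float("inf")))
    drop_idx = torch.topk(active_scores.flatten(), n_update, largest=False).indices

    # Grow: among inactive weights, activate those with the largest gradient magnitude.
    inactive_scores = torch.where(mask.bool(), torch.full_like(grad, -float("inf")), grad.abs())
    grow_idx = torch.topk(inactive_scores.flatten(), n_update, largest=True).indices

    new_mask = mask.clone().flatten()
    new_mask[drop_idx] = 0.0
    new_mask[grow_idx] = 1.0
    return new_mask.reshape(mask.shape)

w, g = torch.randn(64, 64), torch.randn(64, 64)
mask = (torch.rand(64, 64) > 0.9).float()          # ~10% dense starting topology
new_mask = rigl_update(w, g, mask)
print(mask.sum().item(), new_mask.sum().item())    # active counts match
```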
...