Corpus ID: 220128117

Supermasks in Superposition

@article{Wortsman2020SupermasksIS,
  title={Supermasks in Superposition},
  author={Mitchell Wortsman and Vivek Ramanujan and Rosanne Liu and Aniruddha Kembhavi and Mohammad Rastegari and J. Yosinski and Ali Farhadi},
  journal={ArXiv},
  year={2020},
  volume={abs/2006.14769}
}
We present the Supermasks in Superposition (SupSup) model, capable of sequentially learning thousands of tasks without catastrophic forgetting. Our approach uses a randomly initialized, fixed base network and for each task finds a subnetwork (supermask) that achieves good performance. If task identity is given at test time, the correct subnetwork can be retrieved with minimal memory usage. If not provided, SupSup can infer the task using gradient-based optimization to find a linear…
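The abstract describes inferring the task at test time by superimposing the learned supermasks with trainable coefficients and taking a gradient step that reduces output entropy. A minimal single-layer sketch of that idea is given below, assuming PyTorch and invented names (infer_task, base_weights, masks); it is an illustration of the abstract's description, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def infer_task(base_weights, masks, x):
    """Pick the task whose supermask most reduces output entropy.

    base_weights: fixed, randomly initialized weights of shape (out_dim, in_dim)
    masks: list of binary supermasks, each with the same shape as base_weights
    x: a batch of inputs of shape (batch, in_dim)
    """
    k = len(masks)
    # alpha parameterizes a linear superposition of the learned supermasks.
    alpha = torch.full((k,), 1.0 / k, requires_grad=True)
    mixed_mask = sum(a * m for a, m in zip(alpha, masks))
    logits = F.linear(x, base_weights * mixed_mask)
    # Entropy of the averaged prediction; the correct mask should give
    # confident, low-entropy outputs.
    probs = F.softmax(logits, dim=-1).mean(dim=0)
    entropy = -(probs * probs.clamp_min(1e-12).log()).sum()
    entropy.backward()
    # One gradient step on alpha: the coefficient that entropy minimization
    # most wants to grow (most negative gradient) identifies the task.
    return int(torch.argmin(alpha.grad))
```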
Citations

Linear Mode Connectivity in Multitask and Continual Learning
TLDR: It is empirically found that different minima of the same task are typically connected by very simple curves of low error, and this finding is exploited to propose an effective algorithm that constrains the sequentially learned minima to behave as the multitask solution.
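Linear mode connectivity is usually checked by evaluating the loss along the straight line between two solutions in weight space. A minimal sketch of that check, assuming PyTorch and hypothetical names (loss_along_line, state_a, state_b, loader, loss_fn):

```python
import copy
import torch

def loss_along_line(model, state_a, state_b, loader, loss_fn, steps=11):
    """Evaluate the loss at evenly spaced points on the segment between two solutions."""
    probe = copy.deepcopy(model)
    losses = []
    for t in torch.linspace(0.0, 1.0, steps):
        interp = {k: (1 - t) * state_a[k] + t * state_b[k] for k in state_a}
        probe.load_state_dict(interp)
        probe.eval()
        total, n = 0.0, 0
        with torch.no_grad():
            for x, y in loader:
                total += loss_fn(probe(x), y).item() * len(x)
                n += len(x)
        losses.append(total / n)
    return losses  # a flat, low curve indicates linear mode connectivity
```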
Efficient Feature Transformations for Discriminative and Generative Continual Learning
TLDR: This work proposes Efficient Feature Transformations (EFTs), a simple task-specific feature-map transformation strategy for continual learning that provides powerful flexibility for learning new tasks with minimal parameters added to the base architecture.
Emerging Paradigms of Neural Network Pruning
TLDR: A general pruning framework is proposed so that emerging pruning paradigms can be accommodated alongside the traditional one, and open questions are summarized as worthy future directions.
Class-incremental Learning with Pre-allocated Fixed Classifiers
TLDR: This work substitutes the expanding classifier with a novel fixed classifier in which a number of pre-allocated output nodes are subject to the classification loss right from the beginning of the learning phase.
In the Wild: From ML Models to Pragmatic ML Systems
TLDR: A unified learning and evaluation framework, iN thE wilD (NED), is introduced, designed to be a more general paradigm that loosens the restrictive design decisions of past settings and imposes fewer restrictions on learning algorithms.
Bowtie Networks: Generative Modeling for Joint Few-Shot Recognition and Novel-View Synthesis
TLDR: This paper proposes bowtie networks that jointly learn 3D geometric and semantic representations with feedback in the loop, and instantiates them on the illustrative dual task of joint few-shot recognition and novel-view synthesis.
Efficient Estimation of Influence of a Training Instance
TLDR: The proposed method, inspired by dropout, can capture training influences, enhance the interpretability of error predictions, and cleanse the training dataset for improving generalization.
Communication-Efficient and Personalized Federated Lottery Ticket Learning
TLDR: A personalized and communication-efficient federated lottery ticket learning algorithm, coined CELL, is proposed; it exploits downlink broadcast for communication efficiency and utilizes a novel user grouping method, alternating between federated learning and lottery learning to mitigate stragglers.
Continual Learning in Task-Oriented Dialogue Systems
TLDR: This paper proposes a continual learning benchmark for task-oriented dialogue systems with 37 domains to be learned continuously in four settings (intent recognition, state tracking, natural language generation, and end-to-end), and implements and compares multiple existing continual learning baselines.
Unifying Instance and Panoptic Segmentation with Dynamic Rank-1 Convolutions
TLDR: DR1Mask is the first panoptic segmentation framework that exploits a shared feature map for both instance and semantic segmentation, considering both efficacy and efficiency, and it is much more efficient, running twice as fast as the previous best two-branch approaches.

References

Showing 1–10 of 64 references
Continual learning with hypernetworks
TLDR: Insight is provided into the structure of low-dimensional task embedding spaces (the input space of the hypernetwork) and it is shown that task-conditioned hypernetworks demonstrate transfer learning.
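A task-conditioned hypernetwork, in the sense used above, maps a learned task embedding to the weights of a target layer. A minimal sketch of that pattern with assumed names and sizes (HyperLinear, emb_dim); it is not the paper's architecture:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HyperLinear(nn.Module):
    """A linear layer whose weights are generated from a learned task embedding."""

    def __init__(self, n_tasks, emb_dim, in_dim, out_dim):
        super().__init__()
        self.task_emb = nn.Embedding(n_tasks, emb_dim)
        self.hyper = nn.Linear(emb_dim, in_dim * out_dim + out_dim)
        self.in_dim, self.out_dim = in_dim, out_dim

    def forward(self, x, task_id):
        # Generate the target layer's weight and bias from the task embedding.
        params = self.hyper(self.task_emb(torch.as_tensor(task_id)))
        w = params[: self.in_dim * self.out_dim].view(self.out_dim, self.in_dim)
        b = params[self.in_dim * self.out_dim:]
        return F.linear(x, w, b)
```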
Overcoming catastrophic forgetting in neural networks
TLDR: It is shown that it is possible to overcome the limitation of connectionist models and train networks that can maintain expertise on tasks they have not experienced for a long time, by selectively slowing down learning on the weights important for previous tasks.
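The "selective slowing down" described here is commonly realized as a quadratic penalty weighted by an importance estimate such as the diagonal Fisher information, as in EWC. A minimal sketch under that assumption, with hypothetical names (ewc_penalty, old_params, fisher, lam):

```python
import torch

def ewc_penalty(model, old_params, fisher, lam=1000.0):
    """Quadratic penalty that slows learning on weights important for a previous task."""
    loss = torch.zeros(())
    for name, p in model.named_parameters():
        loss = loss + (fisher[name] * (p - old_params[name]) ** 2).sum()
    return 0.5 * lam * loss

# Training on a new task would then minimize:
#   task_loss + ewc_penalty(model, old_params, fisher)
```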
BatchEnsemble: An Alternative Approach to Efficient Ensemble and Lifelong Learning
TLDR: BatchEnsemble is proposed, an ensemble method whose computational and memory costs are significantly lower than typical ensembles and can easily scale up to lifelong learning on Split-ImageNet, which involves 100 sequential learning tasks.
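BatchEnsemble's cost savings come from sharing one weight matrix across ensemble members while giving each member only cheap rank-1 factors. A minimal sketch of such a layer, with assumed names and sizes (BatchEnsembleLinear, members):

```python
import torch
import torch.nn as nn

class BatchEnsembleLinear(nn.Module):
    """One shared weight matrix modulated by per-member rank-1 factors."""

    def __init__(self, in_dim, out_dim, members):
        super().__init__()
        self.shared = nn.Linear(in_dim, out_dim, bias=False)  # shared "slow" weights
        self.r = nn.Parameter(torch.ones(members, out_dim))   # per-member "fast" factors
        self.s = nn.Parameter(torch.ones(members, in_dim))

    def forward(self, x, member):
        # Equivalent to using W * outer(r[member], s[member]) as the weight.
        return self.shared(x * self.s[member]) * self.r[member]
```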
PackNet: Adding Multiple Tasks to a Single Network by Iterative Pruning
  • Arun Mallya, S. Lazebnik
  • 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018
TLDR: This paper is able to add three fine-grained classification tasks to a single ImageNet-trained VGG-16 network and achieve accuracies close to those of separately trained networks for each task.
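PackNet-style iterative pruning can be pictured as follows: after training a task, keep the largest-magnitude weights among those not yet claimed by earlier tasks, and free the rest for future tasks. A rough sketch under that reading, with hypothetical names (assign_task_mask, owner, keep_frac):

```python
import torch

def assign_task_mask(weight, owner, task_id, keep_frac=0.5):
    """owner: int tensor, 0 = free weight, k > 0 = weight claimed by task k."""
    free = owner == 0
    scores = weight.abs()[free]
    k = int(keep_frac * scores.numel())
    # Threshold at the k-th largest magnitude among currently free weights.
    threshold = scores.kthvalue(scores.numel() - k + 1).values if k > 0 else float("inf")
    keep = free & (weight.abs() >= threshold)
    owner[keep] = task_id            # recorded as belonging to task_id (frozen later)
    weight.data[free & ~keep] = 0    # freed weights are zeroed and retrained on future tasks
    return owner
```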
The Lottery Ticket Hypothesis: Training Pruned Neural Networks
TLDR: The lottery ticket hypothesis and its connection to pruning are a step toward developing architectures, initializations, and training strategies that make it possible to solve the same problems with much smaller networks.
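The pruning procedure associated with the lottery ticket hypothesis is iterative magnitude pruning with a rewind to the original initialization. A rough sketch of that loop, assuming a user-supplied train function that keeps masked weights at zero and using hypothetical names (find_winning_ticket, prune_frac):

```python
import copy
import torch

def find_winning_ticket(model, train, rounds=5, prune_frac=0.2):
    """Iterative magnitude pruning with rewinding to the original initialization."""
    init_state = copy.deepcopy(model.state_dict())
    masks = {n: torch.ones_like(p) for n, p in model.named_parameters()}
    for _ in range(rounds):
        train(model, masks)  # assumed to keep masked-out weights at zero
        with torch.no_grad():
            for n, p in model.named_parameters():
                alive = p.abs() * masks[n]
                k = int(prune_frac * int(masks[n].sum()))
                if k == 0:
                    continue
                # Prune the k smallest-magnitude weights that are still alive.
                thresh = alive[masks[n] > 0].kthvalue(k).values
                masks[n] = torch.where(alive <= thresh, torch.zeros_like(alive), masks[n])
        # Rewind the surviving weights to their values at initialization.
        model.load_state_dict(init_state)
    return masks
```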
An Empirical Investigation of Catastrophic Forgetting in Gradient-Based Neural Networks
TLDR: It is found that it is always best to train using the dropout algorithm: dropout is consistently best at adapting to the new task and remembering the old task, and it has the best tradeoff curve between these two extremes.
Continual Learning via Neural Pruning
TLDR: Continual Learning via Neural Pruning is introduced, a new method aimed at lifelong learning in fixed capacity models based on neuronal model sparsification, and the concept of graceful forgetting is formalized and incorporated.
Piggyback: Adapting a Single Network to Multiple Tasks by Learning to Mask Weights
TLDR: This work learns binary masks that “piggyback” on an existing network, or are applied to unmodified weights of that network to provide good performance on a new task, and shows performance comparable to dedicated fine-tuned networks for a variety of classification tasks.
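Masks of this kind are typically obtained by thresholding learned real-valued scores in the forward pass and passing gradients straight through to those scores. A minimal PyTorch sketch of that pattern with assumed names (Binarize, MaskedLinear, scores); it is not necessarily the paper's exact formulation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Binarize(torch.autograd.Function):
    @staticmethod
    def forward(ctx, scores, threshold):
        return (scores >= threshold).float()

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output, None  # straight-through: pass gradients to the scores

class MaskedLinear(nn.Module):
    """Frozen backbone weights gated by a learned, binarized mask."""

    def __init__(self, frozen_weight, threshold=0.0):
        super().__init__()
        self.weight = nn.Parameter(frozen_weight, requires_grad=False)  # backbone stays fixed
        self.scores = nn.Parameter(torch.randn_like(frozen_weight) * 0.01)
        self.threshold = threshold

    def forward(self, x):
        mask = Binarize.apply(self.scores, self.threshold)
        return F.linear(x, self.weight * mask)
```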
Three scenarios for continual learning
TLDR: Three continual learning scenarios are described based on whether task identity is provided at test time and, in case it is not, whether it must be inferred; for the scenario where task identity must be inferred, it is found that regularization-based approaches fail and that replaying representations of previous experiences seems required.
Experience Replay for Continual Learning
TLDR: This work shows that using experience replay buffers for all past events with a mixture of on- and off-policy learning can still learn new tasks quickly yet can substantially reduce catastrophic forgetting in both Atari and DMLab domains, even matching the performance of methods that require task identities.
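A common way to realize such a buffer is reservoir sampling, which keeps an approximately uniform sample of everything seen so far within a fixed memory budget. A minimal sketch with assumed names (ReplayBuffer, capacity); the cited work operates in a reinforcement learning setting, so this is only the storage-and-sampling skeleton:

```python
import random

class ReplayBuffer:
    """Fixed-capacity buffer that mixes stored past examples into new batches."""

    def __init__(self, capacity=10_000):
        self.capacity = capacity
        self.data = []
        self.seen = 0

    def add(self, example):
        # Reservoir sampling keeps a uniform sample over everything seen so far.
        self.seen += 1
        if len(self.data) < self.capacity:
            self.data.append(example)
        else:
            j = random.randrange(self.seen)
            if j < self.capacity:
                self.data[j] = example

    def sample(self, k):
        return random.sample(self.data, min(k, len(self.data)))
```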