Continual learning with hypernetworks

@article{Oswald2020ContinualLW,
  title={Continual learning with hypernetworks},
  author={Johannes von Oswald and Christian Andreas Henning and Jo{\~a}o Sacramento and Benjamin F. Grewe},
  journal={ArXiv},
  year={2020},
  volume={abs/1906.00695}
}
Artificial neural networks suffer from catastrophic forgetting when they are sequentially trained on multiple tasks. To overcome this problem, we present a novel approach based on task-conditioned hypernetworks, i.e., networks that generate the weights of a target model based on task identity. Continual learning (CL) is less difficult for this class of models thanks to a simple key feature: instead of recalling the input-output relations of all previously seen data, task-conditioned… 
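The mechanism described in the abstract can be sketched in a few lines: a hypernetwork maps a learned per-task embedding to the full weight vector of a small target network, so only one embedding needs to be stored per task. The following is a minimal illustrative sketch, not the paper's implementation; all names, dimensions, and the single-linear-layer hypernetwork are my own simplifying assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

NUM_TASKS, EMB_DIM = 3, 8      # illustrative sizes, not from the paper
IN_DIM, HID, OUT_DIM = 4, 5, 2  # target network: a tiny two-layer MLP

# Learnable parameters (shown here as fixed random arrays):
task_embeddings = rng.standard_normal((NUM_TASKS, EMB_DIM))   # one embedding per task
n_target = IN_DIM * HID + HID + HID * OUT_DIM + OUT_DIM        # target parameter count
W_hyper = rng.standard_normal((EMB_DIM, n_target)) * 0.1       # hypernetwork weights

def generate_target_weights(task_id):
    """Map the task embedding to a flat weight vector, then split it
    into the target network's weight and bias tensors."""
    flat = task_embeddings[task_id] @ W_hyper
    i = 0
    W1 = flat[i:i + IN_DIM * HID].reshape(IN_DIM, HID); i += IN_DIM * HID
    b1 = flat[i:i + HID]; i += HID
    W2 = flat[i:i + HID * OUT_DIM].reshape(HID, OUT_DIM); i += HID * OUT_DIM
    b2 = flat[i:i + OUT_DIM]
    return W1, b1, W2, b2

def target_forward(x, task_id):
    """Run the target MLP using weights generated for the given task."""
    W1, b1, W2, b2 = generate_target_weights(task_id)
    h = np.maximum(x @ W1 + b1, 0.0)   # ReLU hidden layer
    return h @ W2 + b2

x = rng.standard_normal((1, IN_DIM))
y0 = target_forward(x, task_id=0)
y1 = target_forward(x, task_id=1)
# Different task identities yield different effective target weights.
```

In the actual method, gradients flow through the generated weights back into the hypernetwork and embeddings, and a regularizer keeps the weights generated for earlier task embeddings close to their previous values.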

Citations

Continual Learning in Recurrent Neural Networks with Hypernetworks
TLDR
This work demonstrates that high working memory requirements, but not necessarily sequence length, lead to an increased need for stability at the cost of decreased performance on subsequent tasks, and employs a recent method based on hypernetworks to address catastrophic forgetting on sequential data.
Lifelong Learning Without a Task Oracle
  • A. Rios, L. Itti
  • Computer Science
    2020 IEEE 32nd International Conference on Tools with Artificial Intelligence (ICTAI)
  • 2020
TLDR
This work proposes and compares several candidate task-assigning mappers which require very little memory overhead and perform very close to a ground-truth oracle, especially in experiments on inter-dataset task assignment.
Continual learning in recurrent neural networks
TLDR
This study shows that established CL methods can be successfully ported to the recurrent case, and that a recent regularization approach based on hypernetworks outperforms weight-importance methods, thus emerging as a promising candidate for CL in RNNs.
Unifying Importance Based Regularisation Methods for Continual Learning
TLDR
Strong theoretical and empirical evidence is presented that, despite stemming from very different motivations, both SI and MAS approximate the square root of the Fisher Information, with the Fisher being the theoretically justified basis of EWC.
Continual Learning with Recursive Gradient Optimization
TLDR
Recursive Gradient Optimization (RGO) is composed of an iteratively updated optimizer that modifies the gradient to minimize forgetting without data replay and a virtual Feature Encoding Layer (FEL) that represents different network structures with only task descriptors.
Self-Net: Lifelong Learning via Continual Self-Modeling
TLDR
This work proposes a novel framework, Self-Net, that uses an autoencoder to learn a set of low-dimensional representations of the weights learned for different tasks, and is the first to use autoencoders to sequentially encode sets of network weights to enable continual learning.
Online Continual Learning
TLDR
This work proposes a novel online continual learning method named “Contextual Transformation Networks” (CTN) to efficiently model task-specific features while incurring negligible complexity overhead compared to other fixed-architecture methods.
Supermasks in Superposition
TLDR
The Supermasks in Superposition (SupSup) model, capable of sequentially learning thousands of tasks without catastrophic forgetting, is presented and it is found that a single gradient step is often sufficient to identify the correct mask, even among 2500 tasks.
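The supermask idea summarized above can be illustrated with a toy sketch: the base weights stay fixed and random, and each task only learns a binary mask over them, so the effective weights for task t are the elementwise product of the shared weights and that task's mask. This is my own simplified illustration, not the SupSup implementation; here the masks are random rather than learned, and mask inference is omitted.

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.standard_normal((4, 3))   # shared base weights: fixed, never trained

# One binary "supermask" per task (random here purely for illustration;
# SupSup learns these masks per task).
masks = {t: (rng.random(W.shape) > 0.5).astype(float) for t in range(3)}

def masked_forward(x, task_id):
    """Forward pass through the shared weights gated by the task's supermask."""
    return x @ (W * masks[task_id])

x = np.ones((1, 4))
y = masked_forward(x, task_id=0)
```

Because only a binary mask is stored per task, the memory cost per additional task is small, which is what makes scaling to thousands of tasks feasible.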
Logarithmic Continual Learning
TLDR
The approach leverages allocation of past data in a set of generative models such that most of them do not require retraining after a task, and shows the superiority of the method with respect to the state-of-the-art generative rehearsal methods.
Economical ensembles with hypernetworks
TLDR
The proposed method generates an ensemble by randomly initializing additional weight embeddings in the vicinity of each other, and exploits the inherent randomness in stochastic gradient descent to induce ensemble diversity.
...

References

Showing 1-10 of 63 references
Task Agnostic Continual Learning via Meta Learning
TLDR
This work proposes a framework specific to the scenario where no information about task boundaries or task identity is given, and introduces a separation of concerns into what task is being solved and how the task should be solved, which opens the door to combining meta-learning and continual learning techniques, leveraging their complementary advantages.
Self-Net: Lifelong Learning via Continual Self-Modeling
TLDR
This work proposes a novel framework, Self-Net, that uses an autoencoder to learn a set of low-dimensional representations of the weights learned for different tasks, and is the first to use autoencoders to sequentially encode sets of network weights to enable continual learning.
Three scenarios for continual learning
TLDR
Three continual learning scenarios are described based on whether task identity is provided at test time and, if it is not, whether it must be inferred; it is found that regularization-based approaches fail in this setting and that replaying representations of previous experiences seems required for solving this scenario.
Overcoming catastrophic forgetting in neural networks
TLDR
It is shown that it is possible to overcome this limitation of connectionist models and train networks that maintain expertise on tasks they have not experienced for a long time, by selectively slowing down learning on the weights important for previous tasks.
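The "selectively slowing down learning on important weights" mechanism (EWC) amounts to a quadratic penalty weighted by a per-parameter importance estimate, commonly a diagonal Fisher approximation: loss += (lambda/2) * sum_i F_i * (theta_i - theta*_i)^2. A minimal sketch, with all names and values mine:

```python
import numpy as np

def ewc_penalty(theta, theta_star, fisher_diag, lam=1.0):
    """Quadratic penalty that discourages moving weights that the
    diagonal Fisher estimate marks as important for earlier tasks."""
    return 0.5 * lam * np.sum(fisher_diag * (theta - theta_star) ** 2)

theta_star = np.array([1.0, -2.0, 0.5])   # weights after the previous task
fisher = np.array([10.0, 0.1, 1.0])       # importance of each weight
theta = np.array([1.1, -1.0, 0.5])        # current weights

penalty = ewc_penalty(theta, theta_star, fisher, lam=2.0)
# The high-Fisher weight (index 0) is penalized ten times more strongly
# per unit of movement than the low-Fisher weight (index 1).
```

During training on a new task, this penalty is simply added to the task loss, so gradient descent trades off new-task performance against drift on important old-task weights.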
Generative replay with feedback connections as a general strategy for continual learning
TLDR
This work reduces the computational cost of generative replay by integrating the generative model into the main model, equipping it with generative feedback, or backward, connections; the authors believe this to be an important first step towards making the powerful technique of generative replay scalable to real-world continual learning applications.
Overcoming catastrophic forgetting with hard attention to the task
TLDR
A task-based hard attention mechanism is proposed that preserves previous tasks' information without affecting the current task's learning, and offers control over both the stability and compactness of the learned knowledge, which also makes it attractive for online learning and network compression applications.
Continual Learning with Deep Generative Replay
TLDR
Deep Generative Replay is proposed, a novel framework with a cooperative dual-model architecture consisting of a deep generative model ("generator") and a task-solving model ("solver"); with only these two models, training data for previous tasks can easily be sampled and interleaved with data for a new task.
Experience Replay for Continual Learning
TLDR
This work shows that an agent using experience replay buffers for all past events, with a mixture of on- and off-policy learning, can learn new tasks quickly while substantially reducing catastrophic forgetting in both Atari and DMLab domains, even matching the performance of methods that require task identities.
A Scalable Approach to Multi-Context Continual Learning via Lifelong Skill Encoding
TLDR
This work presents a scalable approach to multi-context continual learning (MCCL) that decouples how a system learns to solve new tasks (i.e., acquires skills) from how it stores them, and demonstrates the feasibility of encoding entire networks in order to facilitate efficient continual learning.
Alleviating catastrophic forgetting using context-dependent gating and synaptic stabilization
TLDR
This study proposes a neuroscience-inspired scheme, called “context-dependent gating,” in which mostly nonoverlapping sets of units are active for any one task, which allows ANNs to maintain high performance across large numbers of sequentially presented tasks, particularly when combined with weight stabilization.
...