• Corpus ID: 2157345

Overcoming catastrophic forgetting with hard attention to the task

@inproceedings{Serr2018OvercomingCF,
  title={Overcoming catastrophic forgetting with hard attention to the task},
  author={Joan Serr{\`a} and D{\'i}dac Sur{\'i}s and Marius Miron and Alexandros Karatzoglou},
  booktitle={ICML},
  year={2018}
}
Catastrophic forgetting occurs when a neural network loses the information learned in a previous task after training on subsequent tasks. This problem remains a hurdle for artificial intelligence systems with sequential learning capabilities. In this paper, we propose a task-based hard attention mechanism that preserves previous tasks' information without affecting the current task's learning. A hard attention mask is learned concurrently to every task, through stochastic gradient descent, and… 
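The mechanism summarized in the abstract can be illustrated with a small numerical sketch: each layer gets a per-task embedding that is pushed through a scaled sigmoid to form an (almost) binary attention mask, and the cumulative masks of previous tasks gate how strongly each weight's gradient may change when learning a new task. The NumPy snippet below is only an illustration under assumed toy sizes and values (the embeddings, the two-layer setup, and `s_max` are made up), not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def attention_mask(embedding, s):
    # Scaled (gated) sigmoid: as s is annealed to a large value, the mask becomes near-binary.
    return 1.0 / (1.0 + np.exp(-s * embedding))

s_max = 400.0                               # illustrative annealing ceiling for the sigmoid scale

# Toy embeddings for one previous task, for two consecutive layers (sizes are made up).
e_prev_l   = rng.normal(size=8)             # layer l embedding learned for task 1
e_prev_lm1 = rng.normal(size=6)             # layer l-1 embedding learned for task 1

# Cumulative attention of all previous tasks (with more tasks: element-wise max over their masks).
a_prev_l   = attention_mask(e_prev_l, s_max)
a_prev_lm1 = attention_mask(e_prev_lm1, s_max)

# Gradient conditioning for the weights connecting layer l-1 to layer l while learning task 2:
# a unit pair claimed by a previous task gets its gradient (almost) zeroed out,
# so the information stored there is preserved without blocking the free units.
grad = rng.normal(size=(8, 6))              # dummy gradient of the current-task loss
gate = 1.0 - np.minimum(a_prev_l[:, None], a_prev_lm1[None, :])
conditioned_grad = grad * gate              # this gated gradient replaces the raw one in the update
```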

Citations

Overcoming Catastrophic Forgetting with Self-adaptive Identifiers
TLDR
This paper proposes to utilize variational Bayesian inference to overcome catastrophic forgetting by pruning the neural network according to the mean and variance of its weights; the number of parameters is vastly reduced, which mitigates the storage problem of the doubled parameters required in variational Bayesian inference.
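As a hedged illustration of pruning by weight mean and variance (the threshold, sizes, and values below are assumptions, not this paper's procedure), one common recipe is to drop weights whose posterior signal-to-noise ratio is low:

```python
import numpy as np

rng = np.random.default_rng(0)
mu = rng.normal(size=200)                                  # variational posterior means (toy values)
sigma = np.abs(rng.normal(scale=0.5, size=200)) + 1e-3     # posterior standard deviations (toy values)

# Prune weights whose signal-to-noise ratio |mu| / sigma is low: their value is
# dominated by uncertainty, so removing them should barely affect the task.
snr = np.abs(mu) / sigma
keep = snr > 1.0                                           # illustrative threshold
pruned_mu = np.where(keep, mu, 0.0)
print(f"kept {keep.mean():.0%} of the weights")
```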
Preempting Catastrophic Forgetting in Continual Learning Models by Anticipatory Regularization
  • A. E. Khatib, F. Karray
  • Computer Science
    2019 International Joint Conference on Neural Networks (IJCNN)
  • 2019
TLDR
It is shown that one way to mitigate forgetting is through an auxiliary unsupervised reconstruction loss that encourages the learned representations not only to be useful for the current classification task, but also to reflect the content of the data being processed, which is generally richer than what is discriminative for any one task.
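A minimal sketch of such an auxiliary reconstruction objective is shown below; the tiny linear encoder/decoder, the layer sizes, and the weighting factor `lam` are illustrative assumptions, not this paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(32, 20))                     # a mini-batch of inputs (toy sizes)
y = rng.integers(0, 5, size=32)                   # class labels for the current task

W_enc = rng.normal(scale=0.1, size=(20, 10))      # shared encoder
W_cls = rng.normal(scale=0.1, size=(10, 5))       # task classifier head
W_dec = rng.normal(scale=0.1, size=(10, 20))      # auxiliary decoder head

h = np.tanh(x @ W_enc)                            # shared representation
logits = h @ W_cls
log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
cls_loss = -log_probs[np.arange(len(y)), y].mean()

recon = h @ W_dec                                 # reconstruction from the shared representation
recon_loss = ((recon - x) ** 2).mean()

lam = 0.5                                         # illustrative trade-off weight
total_loss = cls_loss + lam * recon_loss          # the representation must also describe the data
```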
Few-Shot Self Reminder to Overcome Catastrophic Forgetting
TLDR
This work presents a simple yet surprisingly effective way of preventing catastrophic forgetting, called Few-shot Self Reminder (FSR), which regularizes the network against changing its learned behaviour by performing logit matching on selected samples kept in episodic memory from the old tasks.
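The logit-matching idea can be sketched as a squared-error penalty between the logits recorded when the old task was learned and the logits the current network produces on the same memory samples; the sizes, values, and the weighting `lam` below are placeholders, not FSR's exact setup.

```python
import numpy as np

def logit_matching_penalty(new_logits, stored_logits):
    # Penalize the network for drifting away from the logits it produced on
    # the few memory samples at the time the old task was learned.
    return ((new_logits - stored_logits) ** 2).mean()

rng = np.random.default_rng(0)
stored_logits = rng.normal(size=(16, 10))                      # logits recorded for 16 memory samples (toy)
new_logits = stored_logits + 0.05 * rng.normal(size=(16, 10))  # current network's outputs on those samples

lam = 1.0                            # illustrative weighting
current_task_loss = 0.7              # placeholder value for the new-task loss
total_loss = current_task_loss + lam * logit_matching_penalty(new_logits, stored_logits)
```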
Learning to Remember from a Multi-Task Teacher
TLDR
This paper argues that the outputs of neural networks are subject to rapid changes when learning a new data distribution, and that networks which appear to "forget" everything still contain useful representations of previous tasks; it proposes a novel meta-learning algorithm to overcome this issue.
Understanding Catastrophic Forgetting and Remembering in Continual Learning with Optimal Relevance Mapping
TLDR
This work shows that Relevance Mapping Networks (RMNs) learn an optimized representational overlap that overcomes the twin problems of catastrophic forgetting and catastrophic remembering, achieving state-of-the-art performance across many common continual learning benchmarks.
Overcoming Catastrophic Forgetting for Continual Learning via Model Adaptation
TLDR
This paper proposes a very different approach, called Parameter Generation and Model Adaptation (PGMA), to dealing with the problem of catastrophic forgetting in standard neural network architectures.
Learn to Grow: A Continual Structure Learning Framework for Overcoming Catastrophic Forgetting
TLDR
By separating explicit neural structure learning from parameter estimation, the proposed method not only evolves neural structures in an intuitively meaningful way, but also shows a strong ability to alleviate catastrophic forgetting in experiments.
Continual Deep Learning by Functional Regularisation of Memorable Past
TLDR
This paper proposes a scalable functional-regularisation approach where the network weights are regularised only over a few memorable past examples that are crucial to avoid forgetting and opens a new direction for life-long learning where regularisation methods are naturally combined with memory-based methods.
Overcoming Catastrophic Forgetting Using Sparse Coding and Meta Learning
TLDR
This work proposes two main strategies to address task interference in convolutional neural networks: a sparse coding technique that adaptively allocates model capacity to different tasks, avoiding interference between them, and a meta-learning technique that fosters knowledge transfer among tasks.
Explain to Not Forget: Defending Against Catastrophic Forgetting with XAI
TLDR
This work proposes a novel training algorithm, called training by explaining, which leverages Layer-wise Relevance Propagation in order to retain the information a neural network has already learned in previous tasks when training on new data.
...

References

SHOWING 1-10 OF 48 REFERENCES
Overcoming Catastrophic Forgetting by Incremental Moment Matching
TLDR
IMM incrementally matches the moments of the posterior distributions of the neural networks trained on the first and the second task, respectively, in order to make the search space of the posterior parameters smooth.
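A minimal sketch of the moment-matching step: mean-IMM averages the two task solutions, while mode-IMM weights the average by a per-parameter (Fisher-based) precision estimate. The mixing ratio and the Fisher values below are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
theta_task1 = rng.normal(size=100)           # weights after training on task 1 (toy)
theta_task2 = rng.normal(size=100)           # weights after training on task 2 (toy)

# mean-IMM: simple (possibly weighted) average of the two solutions.
alpha = 0.5                                  # illustrative mixing ratio
theta_mean_imm = alpha * theta_task1 + (1 - alpha) * theta_task2

# mode-IMM: precision-weighted average, using per-parameter Fisher information
# as an approximation of the posterior precision (Fisher values are made up here).
fisher1 = np.abs(rng.normal(size=100)) + 1e-8
fisher2 = np.abs(rng.normal(size=100)) + 1e-8
theta_mode_imm = (alpha * fisher1 * theta_task1 + (1 - alpha) * fisher2 * theta_task2) \
                 / (alpha * fisher1 + (1 - alpha) * fisher2)
```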
Overcoming catastrophic forgetting in neural networks
TLDR
It is shown that it is possible to overcome this limitation of connectionist models and train networks that maintain expertise on tasks they have not experienced for a long time, by selectively slowing down learning on the weights important for previous tasks.
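The "selective slowing down" is a quadratic penalty weighted by each parameter's importance (estimated via the diagonal Fisher information) for the previous task. The sketch below uses toy values; `lam` and the Fisher estimates are placeholders.

```python
import numpy as np

def ewc_penalty(theta, theta_star, fisher, lam):
    # Quadratic penalty that slows down learning on weights that were
    # important (high Fisher information) for the previous task.
    return 0.5 * lam * np.sum(fisher * (theta - theta_star) ** 2)

rng = np.random.default_rng(0)
theta_star = rng.normal(size=50)             # weights after the previous task (toy)
fisher = np.abs(rng.normal(size=50))         # diagonal Fisher estimate (made up here)
theta = theta_star + 0.1 * rng.normal(size=50)

new_task_loss = 1.2                          # placeholder loss value on the new task
total_loss = new_task_loss + ewc_penalty(theta, theta_star, fisher, lam=100.0)
```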
Less-forgetting Learning in Deep Neural Networks
TLDR
Surprisingly, the proposed less-forgetting learning method is very effective at forgetting less of the information in the source domain, and it helps improve the recognition rates of deep neural networks.
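A rough sketch of the less-forgetting idea, as commonly described: keep the final softmax layer frozen so decision boundaries do not move, and pull the new network's penultimate features towards those of the frozen source network. The sizes, values, and weighting `lam` below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
features_old = rng.normal(size=(32, 64))                         # penultimate features from the frozen source net (toy)
features_new = features_old + 0.05 * rng.normal(size=(32, 64))   # features of the network being trained

# The final (softmax) layer stays frozen; the feature extractor is additionally
# regularized towards the source features so the source-domain knowledge is kept.
lam = 1.0                                                        # illustrative weighting
less_forgetting_loss = lam * ((features_new - features_old) ** 2).sum(axis=1).mean()

target_task_loss = 0.9                                           # placeholder value for the new-domain loss
total_loss = target_task_loss + less_forgetting_loss
```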
An Empirical Investigation of Catastrophic Forgetting in Gradient-Based Neural Networks
TLDR
It is found that it is always best to train using the dropout algorithm: dropout is consistently best at adapting to the new task and remembering the old task, and it has the best tradeoff curve between these two extremes.
Reduction of catastrophic forgetting with transfer learning and ternary output codes
TLDR
This work examines how training a neural net in accordance with latently learned output encodings drastically reduces catastrophic forgetting, resulting in a technique that makes it easier rather than harder to learn new tasks while retaining existing knowledge.
Learning without Forgetting
TLDR
This work proposes the Learning without Forgetting method, which uses only new task data to train the network while preserving the original capabilities, and performs favorably compared to commonly used feature extraction and fine-tuning adaptation techniques.
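The core of the method is distillation to the old network's own responses: before training on the new task, the old-task outputs on the new data are recorded, and training then combines the new-task loss with a distillation loss towards those recorded outputs. The sketch below uses toy logits; the temperature and loss weight are assumptions.

```python
import numpy as np

def softmax(z, T=1.0):
    z = z / T
    z = z - z.max(axis=1, keepdims=True)   # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
# Old-task-head logits recorded on the *new* task's data before training starts (toy values).
recorded_old_logits = rng.normal(size=(32, 10))
current_old_logits = recorded_old_logits + 0.05 * rng.normal(size=(32, 10))

T = 2.0                                     # distillation temperature (illustrative)
p_recorded = softmax(recorded_old_logits, T)
p_current = softmax(current_old_logits, T)
distill_loss = -(p_recorded * np.log(p_current + 1e-12)).sum(axis=1).mean()

new_task_loss = 0.8                         # placeholder cross-entropy on new-task labels
total_loss = new_task_loss + 1.0 * distill_loss
```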
Improved multitask learning through synaptic intelligence
TLDR
This study introduces a model of intelligent synapses that accumulate task relevant information over time, and exploits this information to efficiently consolidate memories of old tasks to protect them from being overwritten as new tasks are learned.
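A compact sketch of the synaptic-intelligence idea: each parameter accumulates its contribution to the loss decrease along the training trajectory, this running sum is turned into an importance at the end of the task, and a quadratic surrogate loss anchored at the task's solution is added afterwards. The toy quadratic loss, learning rate, and constants `xi` and `c` below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
theta = rng.normal(size=n)
theta_start = theta.copy()
omega = np.zeros(n)                    # running per-parameter contribution to the loss decrease

# Toy training loop on one task: accumulate -gradient * parameter update along the path.
lr = 0.1
for step in range(100):
    grad = theta - 1.0                 # gradient of a toy quadratic loss 0.5 * (theta - 1)^2
    delta = -lr * grad                 # SGD update
    omega += -grad * delta             # path integral of the per-parameter loss change
    theta += delta

# Consolidate importance at the end of the task.
xi = 1e-3                              # damping term (illustrative)
importance = omega / ((theta - theta_start) ** 2 + xi)
theta_star = theta.copy()

# Surrogate loss added while training the next task.
c = 0.1                                # illustrative regularization strength
si_penalty = c * np.sum(importance * (theta - theta_star) ** 2)
```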
Continual Learning with Deep Generative Replay
TLDR
Deep Generative Replay is proposed: a novel framework with a cooperative dual-model architecture consisting of a deep generative model ("generator") and a task-solving model ("solver"). With only these two models, training data for previous tasks can easily be sampled and interleaved with those for a new task.
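The training loop can be sketched as follows: sample pseudo-data from the previous generator, label it with the previous solver, and mix it with the new task's data. The stand-in generator and solver functions, the replay ratio, and all sizes below are toy placeholders, not the paper's models.

```python
import numpy as np

rng = np.random.default_rng(0)

def generator_sample(n):
    # Stand-in for the trained deep generative model of previous tasks' data
    # (a real implementation would be e.g. a GAN or VAE; this is a toy placeholder).
    return rng.normal(size=(n, 20))

def old_solver_predict(x):
    # Stand-in for the previous solver, used to label the generated ("replayed") data.
    return (x.sum(axis=1) > 0).astype(int)

# Interleave new-task data with generated replay data when training the new solver.
x_new = rng.normal(loc=1.0, size=(32, 20))
y_new = np.ones(32, dtype=int)

replay_ratio = 0.5                             # illustrative fraction of replayed data
n_replay = int(replay_ratio * len(x_new))
x_replay = generator_sample(n_replay)
y_replay = old_solver_predict(x_replay)

x_batch = np.concatenate([x_new, x_replay])
y_batch = np.concatenate([y_new, y_replay])    # train the new solver (and generator) on this mix
```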
PackNet: Adding Multiple Tasks to a Single Network by Iterative Pruning
  • Arun Mallya, S. Lazebnik
  • Computer Science
    2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition
  • 2018
TLDR
The proposed method is able to add three fine-grained classification tasks to a single ImageNet-trained VGG-16 network and achieve accuracies close to those of separately trained networks for each task.
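The iterative-pruning idea can be sketched as follows: after training a task on the still-free weights, keep only the largest-magnitude fraction of them for that task, freeze them, and release the rest for future tasks, recording a binary mask per task for inference. The keep fraction, sizes, and the absence of retraining in this toy loop are simplifying assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
weights = rng.normal(size=(8, 8))
free = np.ones_like(weights, dtype=bool)       # weights not yet claimed by any task
task_masks = []

def claim_for_task(weights, free, keep_fraction=0.5):
    # Keep only the largest-magnitude fraction of the currently free weights for this task;
    # the rest are pruned (zeroed) and released so future tasks can reuse them.
    magnitudes = np.abs(weights[free])
    threshold = np.quantile(magnitudes, 1.0 - keep_fraction)
    keep = free & (np.abs(weights) >= threshold)
    weights[free & ~keep] = 0.0
    return keep

for task in range(3):                           # illustrative: pack three tasks into one network
    # (a real run would retrain on the current task here, touching only `free` weights)
    mask = claim_for_task(weights, free)
    task_masks.append(mask)                     # at inference, task t applies the masks of tasks 0..t
    free = free & ~mask                         # claimed weights are frozen from now on
```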
Gradient Episodic Memory for Continual Learning
TLDR
A model for continual learning, called Gradient Episodic Memory (GEM), is proposed that alleviates forgetting while allowing beneficial transfer of knowledge to previous tasks.
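GEM constrains each update so that the loss on the episodic memory of past tasks does not increase, solving a small quadratic program with one constraint per past task. The sketch below is a simplified single-constraint projection (closer in spirit to the later A-GEM variant) with toy gradients, shown only to illustrate the constraint.

```python
import numpy as np

def project_gradient(g, g_memory):
    # If the proposed update would increase the loss on the episodic memory
    # (negative inner product), project g onto the half-space where it does not.
    # GEM itself solves a quadratic program over one such constraint per past task.
    dot = g @ g_memory
    if dot >= 0:
        return g
    return g - (dot / (g_memory @ g_memory)) * g_memory

rng = np.random.default_rng(0)
g = rng.normal(size=50)             # gradient on the current task (toy)
g_mem = rng.normal(size=50)         # gradient on samples from the episodic memory (toy)
g_used = project_gradient(g, g_mem)
assert g_used @ g_mem >= -1e-9      # forgetting constraint satisfied (up to numerics)
```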
...