Corpus ID: 254044339

Continual Learning Beyond a Single Model

by Thang Van Doan, Seyed Iman Mirzadeh, Mehrdad Farajtabar
A growing body of research in continual learning focuses on the catastrophic forgetting problem. While many attempts have been made to alleviate this problem, the majority of methods assume a single model in the continual learning setup. In this work, we question this assumption and show that employing ensemble models can be a simple yet effective method to improve continual performance. However, ensembles' training and inference costs can increase significantly as the number of models…
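As a minimal illustration of the ensembling idea the abstract describes (not the paper's exact method), one can average the class-probability outputs of independently trained models; the callables below are hypothetical stand-ins for trained networks:

```python
import numpy as np

def ensemble_predict(models, x):
    """Average the per-class probability outputs of several models
    and take the argmax. `models` is any list of callables mapping
    an input batch to per-class probabilities."""
    mean_probs = np.mean([m(x) for m in models], axis=0)
    return np.argmax(mean_probs, axis=-1)
```

Note that inference cost grows linearly with the number of ensemble members, which is exactly the overhead the abstract flags.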

Optimization and Generalization of Regularization-Based Continual Learning: a Loss Approximation Viewpoint

This paper provides a novel viewpoint of regularization-based continual learning by formulating it as a second-order Taylor approximation of the loss function of each task, which leads to a unified framework that can be instantiated to derive many existing algorithms such as Elastic Weight Consolidation and Kronecker factored Laplace approximation.
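The quadratic penalty that this second-order Taylor view recovers for EWC can be sketched as follows (a minimal NumPy sketch; the diagonal Fisher information stands in for the curvature of the old task's loss, and `lam` is the usual regularization strength):

```python
import numpy as np

def ewc_penalty(params, old_params, fisher_diag, lam=1.0):
    """Second-order Taylor regularizer around the previous task's
    optimum: 0.5 * lam * sum_i F_i * (theta_i - theta*_i)^2.
    `fisher_diag` is EWC's diagonal approximation to the Hessian
    of the old task's loss at `old_params`."""
    diff = params - old_params
    return 0.5 * lam * float(np.sum(fisher_diag * diff ** 2))
```

The penalty is zero at the old optimum and grows fastest along directions where the Fisher (i.e., the approximated curvature) is large, which is what protects parameters important to earlier tasks.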

Understanding the Role of Training Regimes in Continual Learning

This work hypothesizes that the geometrical properties of the local minima found for each task play an important role in the overall degree of forgetting, and studies the effect of dropout, learning rate decay, and batch size on forming training regimes that widen the tasks' local minima and, consequently, help the model avoid catastrophic forgetting.

Continual Learning in Deep Networks: an Analysis of the Last Layer

It is shown that the best-performing type of output layer depends on the data distribution drifts and/or the amount of data available, and a way of selecting the best type of output layer for a given scenario is suggested.

Task-agnostic Continual Learning with Hybrid Probabilistic Models

This work proposes HCL, a hybrid generative-discriminative approach to continual learning for classification that uses the generative capabilities of the flow to avoid catastrophic forgetting through generative replay and a novel functional regularization technique.

Continual Learning with Deep Generative Replay

Deep Generative Replay is proposed: a novel framework with a cooperative dual-model architecture consisting of a deep generative model ("generator") and a task-solving model ("solver"). With only these two models, training data for previous tasks can easily be sampled and interleaved with data for a new task.
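A minimal sketch of the replay step (the stub `generator` and `solver` callables are illustrative assumptions, not the paper's architecture):

```python
import numpy as np

def replay_batch(new_x, new_y, generator, solver, n_replay):
    """Build a training batch for the new task by interleaving real
    data with generator samples pseudo-labeled by the current solver,
    as in generative replay."""
    replayed_x = generator(n_replay)   # sample stand-ins for past-task inputs
    replayed_y = solver(replayed_x)    # solver supplies their labels
    x = np.concatenate([new_x, replayed_x])
    y = np.concatenate([new_y, replayed_y])
    return x, y
```

Training the solver on such mixed batches is what lets it retain past tasks without storing their original data.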

Generalisation Guarantees for Continual Learning with Orthogonal Gradient Descent

This work derives the first generalisation guarantees for the OGD algorithm in continual learning with overparameterized neural networks, proves that it is robust to catastrophic forgetting across an arbitrary number of tasks, and shows that it satisfies tighter generalisation bounds.

Learning to Learn without Forgetting By Maximizing Transfer and Minimizing Interference

This work proposes a new conceptualization of the continual learning problem in terms of a temporally symmetric trade-off between transfer and interference that can be optimized by enforcing gradient alignment across examples, and introduces a new algorithm, Meta-Experience Replay (MER), that directly exploits this view by combining experience replay with optimization-based meta-learning.

Orthogonal Gradient Descent for Continual Learning

The Orthogonal Gradient Descent (OGD) method is presented, which accomplishes this goal by projecting the gradients from new tasks onto a subspace in which the neural network's outputs on previous tasks do not change, while the projected gradient remains a useful direction for learning the new task.
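The projection step can be sketched with NumPy (the stored previous-task gradient directions are passed in as an assumption; OGD maintains such a set from the network's outputs on past tasks):

```python
import numpy as np

def ogd_project(grad, prev_grads):
    """Project `grad` onto the orthogonal complement of the span of
    previous-task gradient directions, so that a step along the
    result leaves past-task outputs unchanged to first order."""
    basis, _ = np.linalg.qr(np.stack(prev_grads).T)  # orthonormal basis of the span
    return grad - basis @ (basis.T @ grad)
```

The projected gradient is orthogonal to every stored direction by construction, which is the first-order "no change to past outputs" guarantee the summary describes.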

Variational Continual Learning

Variational continual learning is developed: a simple but general framework for continual learning that fuses online variational inference with recent advances in Monte Carlo variational inference for neural networks, and that outperforms state-of-the-art continual learning methods.

Gradient Projection Memory for Continual Learning

This work proposes a novel approach in which a neural network learns new tasks by taking gradient steps in the direction orthogonal to the gradient subspaces deemed important for past tasks, and shows that this induces minimal to no interference with those tasks, thereby mitigating forgetting.
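Unlike OGD, GPM obtains the protected subspace from an SVD of stored past-task representations; a minimal sketch (the energy threshold `eps` is an illustrative parameter, not the paper's exact setting):

```python
import numpy as np

def gpm_project(grad, representations, eps=0.95):
    """Project `grad` orthogonally to the top singular subspace of
    past-task representations (columns of `representations`), keeping
    enough components to explain a fraction `eps` of spectral energy."""
    u, s, _ = np.linalg.svd(representations, full_matrices=False)
    energy = np.cumsum(s ** 2) / np.sum(s ** 2)
    k = int(np.searchsorted(energy, eps)) + 1
    m = u[:, :k]                      # basis deemed important for past tasks
    return grad - m @ (m.T @ grad)
```

Gradient steps along the result avoid the directions that matter most for the stored representations, which is the interference-minimizing property the summary claims.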