Revisiting Distillation and Incremental Classifier Learning

@inproceedings{Javed2018RevisitingDA,
  title={Revisiting Distillation and Incremental Classifier Learning},
  author={Khurram Javed and Faisal Shafait},
  booktitle={ACCV},
  year={2018}
}
One of the key differences between the learning mechanism of humans and Artificial Neural Networks (ANNs) is the ability of humans to learn one task at a time. […] To this end, we first thoroughly analyze the current state-of-the-art (iCaRL) method for incremental learning and demonstrate that the good performance of the system is not because of the reasons presented in the existing literature. We conclude that the success of iCaRL is primarily due to knowledge distillation and recognize a key…
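
As a concrete illustration of the knowledge-distillation term the paper identifies as the main driver of iCaRL's performance, here is a minimal sketch in PyTorch (assumed; the function names, the temperature of 2.0, and the alpha weighting are illustrative choices, not taken from the paper):

import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      temperature: float = 2.0) -> torch.Tensor:
    # KL divergence between temperature-softened teacher and student distributions.
    log_p_student = F.log_softmax(student_logits / temperature, dim=1)
    p_teacher = F.softmax(teacher_logits / temperature, dim=1)
    # Scale by T^2 so gradient magnitudes stay comparable across temperatures.
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * temperature ** 2

def incremental_step_loss(new_logits, old_logits, labels, alpha=0.5, temperature=2.0):
    # Cross-entropy on the current task plus distillation towards the frozen old model,
    # applied only to the logits of previously seen classes.
    ce = F.cross_entropy(new_logits, labels)
    kd = distillation_loss(new_logits[:, :old_logits.size(1)], old_logits, temperature)
    return (1 - alpha) * ce + alpha * kd

In an incremental-learning loop, old_logits would come from a frozen copy of the model saved before the new classes were added.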
A Simple Class Decision Balancing for Incremental Learning
TLDR
This scheme, dubbed SS-IL, is shown to give much more balanced class decisions, produce much less biased scores, and outperform strong state-of-the-art baselines on several large-scale benchmark datasets, without any sophisticated post-processing of the scores.
An Appraisal of Incremental Learning Methods
TLDR
It is concluded that incremental learning is still a hot research area and will remain so for a long period, and that more attention should be paid to the exploration of both biological systems and computational models.
Learning a Unified Classifier Incrementally via Rebalancing
TLDR
This work develops a new framework for incrementally learning a unified classifier, i.e. a classifier that treats both old and new classes uniformly, and incorporates three components, cosine normalization, less-forget constraint, and inter-class separation, to mitigate the adverse effects of the imbalance.
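
The cosine-normalization component mentioned above can be pictured as a classifier head that scores classes by cosine similarity, so old- and new-class weights live on the same scale. A minimal sketch, assuming PyTorch; the learnable scale eta is one common choice and not necessarily the paper's exact formulation:

import torch
import torch.nn as nn
import torch.nn.functional as F

class CosineClassifier(nn.Module):
    def __init__(self, feature_dim: int, num_classes: int):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(num_classes, feature_dim) * 0.01)
        self.eta = nn.Parameter(torch.tensor(10.0))  # learnable scale on the cosine scores

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        # Cosine similarity between each L2-normalised feature and each class weight.
        f = F.normalize(features, dim=1)
        w = F.normalize(self.weight, dim=1)
        return self.eta * f @ w.t()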
IL2M: Class Incremental Learning With Dual Memory
TLDR
This paper presents a class-incremental learning method that exploits fine-tuning and a dual memory to reduce the negative effect of catastrophic forgetting in image recognition, and shows that the proposed approach is more effective than a range of competitive state-of-the-art methods.
ScaIL: Classifier Weights Scaling for Class Incremental Learning
TLDR
This work proposes a simple but efficient scaling of past classifiers' weights to make them more comparable to those of new classes, and questions the utility of the widely used distillation-loss component of incremental learning algorithms by comparing it to vanilla fine-tuning in the presence of a bounded memory.
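
To make the weight-scaling idea concrete, the following simplified sketch rescales past-class weights so their average norm matches that of the newly trained classes (NumPy assumed; this mean-norm matching is a stand-in for illustration, not ScaIL's exact statistics-based rule):

import numpy as np

def rescale_old_class_weights(weights: np.ndarray, num_old: int) -> np.ndarray:
    # weights: (num_classes, feature_dim); the first num_old rows belong to past classes.
    old, new = weights[:num_old], weights[num_old:]
    ratio = np.linalg.norm(new, axis=1).mean() / np.linalg.norm(old, axis=1).mean()
    rescaled = weights.copy()
    rescaled[:num_old] *= ratio  # bring old-class scores onto the new-class scale
    return rescaled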
SS-IL: Separated Softmax for Incremental Learning
TLDR
This work proposes a new method, dubbed Separated Softmax for Incremental Learning (SS-IL), that combines a separated softmax (SS) output layer with task-wise knowledge distillation (TKD) to resolve score bias in class-incremental learning.
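
The separated-softmax part of SS-IL can be sketched as a cross-entropy computed over the old-class and new-class logit blocks independently, so new classes do not directly suppress old-class scores. A rough illustration, assuming PyTorch; the task-wise distillation term, the other SS-IL component, is omitted for brevity:

import torch
import torch.nn.functional as F

def separated_softmax_ce(logits: torch.Tensor, labels: torch.Tensor, num_old: int) -> torch.Tensor:
    is_old = labels < num_old
    loss = logits.new_zeros(())
    if is_old.any():
        # Old-class exemplars: softmax restricted to the old-class logits.
        loss = loss + F.cross_entropy(logits[is_old, :num_old], labels[is_old])
    if (~is_old).any():
        # Current-task samples: softmax restricted to the new-class logits.
        loss = loss + F.cross_entropy(logits[~is_old, num_old:], labels[~is_old] - num_old)
    return loss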
Confidence Calibration for Incremental Learning
TLDR
This work proposes a simple yet effective learning objective to balance the confidence of old-task and new-task classes in the class-incremental learning setup, compares various sample-memory configuration strategies, and proposes a novel sample-memory management policy to alleviate forgetting.
KABI: Class-Incremental Learning via Knowledge Amalgamation and Batch Identification
TLDR
This work proposes a class-incremental learning approach with knowledge amalgamation and batch identification (KABI) that can effectively alleviate catastrophic forgetting, and finds that incremental models trained using knowledge distillation are skilled at discriminating classes within a batch.
Overcoming Catastrophic Forgetting With Unlabeled Data in the Wild
TLDR
This work designs a novel class-incremental learning scheme with a new distillation loss, termed global distillation, a learning strategy to avoid overfitting to the most recent task, and a confidence-based sampling method to effectively leverage unlabeled external data.
...

References

Showing 1–10 of 29 references
A Strategy for an Uncompromising Incremental Learner
TLDR
This article designs a strategy involving generative models and the distillation of dark knowledge as a means of hallucinating data along with appropriate targets from past distributions, and shows that phantom sampling helps avoid catastrophic forgetting during incremental learning.
Overcoming catastrophic forgetting in neural networks
TLDR
It is shown that it is possible to overcome the limitation of connectionist models and train networks that can maintain expertise on tasks they have not experienced for a long time, by selectively slowing down learning on the weights important for previous tasks.
Incremental Classifier Learning with Generative Adversarial Networks
TLDR
This paper proposes a new loss function that combines the cross-entropy loss and the distillation loss, a simple way to estimate and remove the imbalance between old and new classes, and the use of Generative Adversarial Networks (GANs) to generate historical data and select representative exemplars during generation; the generated images pose far fewer privacy issues than real images because GANs do not directly copy any real image patches.
Encoder Based Lifelong Learning
TLDR
A new lifelong learning solution is proposed in which a single model is trained for a sequence of tasks, using autoencoders to preserve the knowledge of previous tasks while learning a new one.
iCaRL: Incremental Classifier and Representation Learning
TLDR
iCaRL can learn many classes incrementally over a long period of time where other strategies quickly fail; learning the data representation jointly with the classifier distinguishes it from earlier works that were fundamentally limited to fixed data representations and therefore incompatible with deep learning architectures.
Gradient Episodic Memory for Continual Learning
TLDR
A model for continual learning, called Gradient Episodic Memory (GEM), is proposed that alleviates forgetting while allowing beneficial transfer of knowledge to previous tasks.
An Empirical Investigation of Catastrophic Forgetting in Gradient-Based Neural Networks
TLDR
It is found that it is always best to train using the dropout algorithm: dropout is consistently best at adapting to the new task and remembering the old task, and has the best trade-off curve between these two extremes.
Learning without Forgetting
TLDR
This work proposes the Learning without Forgetting method, which uses only new task data to train the network while preserving the original capabilities, and performs favorably compared to commonly used feature extraction and fine-tuning adaptation techniques.
Distilling the Knowledge in a Neural Network
TLDR
This work shows that distilling the knowledge in an ensemble of models into a single model can significantly improve the acoustic model of a heavily used commercial system, and introduces a new type of ensemble composed of one or more full models and many specialist models which learn to distinguish fine-grained classes that the full models confuse.
Catastrophic forgetting in connectionist networks
  • R. French, Trends in Cognitive Sciences, 1999
...