Natural Way to Overcome the Catastrophic Forgetting in Neural Networks

@article{Kutalev2020NaturalWT,
  title={Natural Way to Overcome the Catastrophic Forgetting in Neural Networks},
  author={Alexey Kutalev},
  journal={ArXiv},
  year={2020},
  volume={abs/2005.07107}
}
Not so long ago, a method was discovered that successfully overcomes catastrophic forgetting in neural networks. Although there are known cases of this method being used to preserve skills when adapting pre-trained networks to particular tasks, it has not yet gained widespread adoption. In this paper, we would like to propose an alternative method of overcoming catastrophic forgetting based on the total absolute signal passing through each connection in the network. This method has a…
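To make the abstract's idea concrete, here is a minimal sketch (not the paper's exact algorithm) of accumulating, for each connection of a linear layer, the total absolute signal passing through it, and then using that accumulation as the weighting of an EWC-style quadratic penalty. The class name, the hook-based accumulation, and the hyper-parameter lam are illustrative assumptions.

import torch
import torch.nn as nn

class SignalImportance:
    """Accumulates the total absolute signal |w_ij| * |x_j| passing through
    each connection of a Linear layer (illustrative sketch, not necessarily
    the paper's exact formulation)."""

    def __init__(self, layer: nn.Linear):
        self.layer = layer
        self.importance = torch.zeros_like(layer.weight)
        layer.register_forward_hook(self._hook)  # capture inputs on every forward pass

    def _hook(self, module, inputs, output):
        x = inputs[0].detach()              # (batch, in_features)
        w = module.weight.detach().abs()    # (out_features, in_features)
        # total signal through connection (i, j), summed over the batch
        self.importance += w * x.abs().sum(dim=0)

    def penalty(self, anchor_weight, lam=1.0):
        """EWC-style quadratic penalty weighted by the accumulated signal."""
        return lam / 2 * (self.importance * (self.layer.weight - anchor_weight) ** 2).sum()

After training on the first task, anchor_weight would be a frozen copy of layer.weight, and the penalty would be added to the loss while training on subsequent tasks.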

Citations

Stabilizing Elastic Weight Consolidation method in practical ML tasks and using weight importances for neural network pruning

The proposed stabilization approach for the EWC method performs no worse than the original EWC on the task of maintaining skills during continual learning, while avoiding its disadvantages.
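For context, the stabilization discussed here builds on the standard EWC objective of Kirkpatrick et al. (see "Overcoming catastrophic forgetting in neural networks" in the references below): when training on a new task B after task A,

\[
\mathcal{L}(\theta) = \mathcal{L}_B(\theta) + \frac{\lambda}{2} \sum_i F_i \,\big(\theta_i - \theta_{A,i}^{*}\big)^2 ,
\]

where \(F_i\) is the diagonal Fisher information estimated on task A, \(\theta_{A,i}^{*}\) are the parameters learned on task A, and \(\lambda\) sets how important the old task is relative to the new one.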

Empirical investigations on WVA structural issues

This work investigates the issues of applying the WVA method to the gradients or to the optimization steps of the weights, choosing the optimal attenuation function for the method, and choosing the optimal hyper-parameters depending on the number of tasks in sequential training of neural networks.
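As a hedged illustration of what "applying WVA to gradients or optimization steps" could look like, the sketch below attenuates each weight's update by an exponential function of its accumulated importance; the exponential form and the coefficient alpha are assumptions standing in for whichever attenuation function the paper finds optimal.

import torch

def attenuated_update(param, importance, lr=0.01, alpha=1.0):
    """Apply one SGD step whose per-weight magnitude is attenuated by the
    accumulated importance (illustrative exponential attenuation)."""
    with torch.no_grad():
        step = lr * param.grad * torch.exp(-alpha * importance)
        param -= step
    return step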

An Appraisal of Incremental Learning Methods

It is concluded that incremental learning is still a hot research area and will remain so for a long time, and that more attention should be paid to the exploration of both biological systems and computational models.

References

Showing 1-10 of 12 references

Overcoming catastrophic forgetting in neural networks

Enabling Continual Learning with Differentiable Hebbian Plasticity

A Differentiable Hebbian Consolidation model is proposed, composed of a DHP Softmax layer that adds a rapid-learning plastic component to the fixed parameters of the softmax output layer, enabling learned representations to be retained over a longer timescale.
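A rough sketch of the general mechanism of pairing fixed softmax weights with a fast Hebbian plastic component (the update rule and the coefficients alpha and eta below are illustrative assumptions, not the exact DHP Softmax formulation):

import torch

def plastic_softmax_step(x, w_fixed, hebb, alpha=0.5, eta=0.1):
    """Compute logits from fixed weights plus a plastic Hebbian trace,
    then update the trace from pre- and post-synaptic activity.

    x:       (batch, features) input to the output layer
    w_fixed: (classes, features) slow, fixed softmax weights
    hebb:    (classes, features) fast plastic component, updated in place
    """
    logits = x @ (w_fixed + alpha * hebb).t()
    post = torch.softmax(logits, dim=1)
    hebb += eta * (post.t() @ x) / x.shape[0]   # batch-averaged outer product
    return logits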

Control of Local Protein Synthesis and Initial Events in Myelination by Action Potentials

This work finds that release of glutamate from synaptic vesicles along axons of mouse dorsal root ganglion neurons in culture promotes myelin induction by stimulating formation of cholesterol-rich signaling domains between oligodendrocytes and axons, and increasing local synthesis of the major protein in the myelin sheath, myelin basic protein, through Fyn kinase-dependent signaling.

Memory Aware Synapses: Learning what (not) to forget

This paper argues that, given the limited model capacity and the unlimited new information to be learned, knowledge has to be preserved or erased selectively and proposes a novel approach for lifelong learning, coined Memory Aware Synapses (MAS), which computes the importance of the parameters of a neural network in an unsupervised and online manner.
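A hedged sketch of the kind of unsupervised, label-free importance estimate MAS describes: accumulate the absolute gradient of the squared L2 norm of the network output with respect to each parameter (the normalization by sample count here is an assumption).

import torch
import torch.nn as nn

def mas_importance(model: nn.Module, inputs):
    """Label-free parameter importance in the spirit of MAS: mean absolute
    gradient of the squared output norm (details are assumptions)."""
    importance = {n: torch.zeros_like(p) for n, p in model.named_parameters()}
    n_samples = 0
    for x in inputs:                       # no labels required
        model.zero_grad()
        model(x).pow(2).sum().backward()   # squared L2 norm of the outputs
        for n, p in model.named_parameters():
            if p.grad is not None:
                importance[n] += p.grad.abs()
        n_samples += x.shape[0]
    return {n: imp / max(n_samples, 1) for n, imp in importance.items()}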

Note on the quadratic penalties in elastic weight consolidation

  • Ferenc Huszár
  • Proceedings of the National Academy of Sciences, 2018
It is shown that the quadratic penalties in EWC are inconsistent with the Bayesian online-learning derivation the method appeals to and might lead to double-counting data from earlier tasks.
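To make the double-counting point concrete (a simplified rendering of the note's argument, with the prior term and exact constants omitted): when a third task C is trained, the penalty-per-task form of EWC keeps one quadratic term anchored at each earlier optimum,

\[
\mathcal{L}_C(\theta) + \frac{\lambda}{2}\sum_i F_{A,i}\,(\theta_i - \theta_{A,i}^{*})^2
+ \frac{\lambda}{2}\sum_i F_{B,i}\,(\theta_i - \theta_{B,i}^{*})^2 ,
\]

whereas a recursive Laplace approximation of the posterior yields a single penalty around the most recent mode with accumulated precision,

\[
\mathcal{L}_C(\theta) + \frac{\lambda}{2}\sum_i \big(F_{A,i} + F_{B,i}\big)\,(\theta_i - \theta_{B,i}^{*})^2 .
\]

Since \(\theta_{B}^{*}\) was itself obtained under task A's penalty, the first form lets task A's data influence the objective twice.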

Continual Learning Through Synaptic Intelligence

This study introduces intelligent synapses that bring some of the complexity of biological synapses into artificial neural networks, and shows that they dramatically reduce forgetting while maintaining computational efficiency.
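A hedged sketch of the online, path-integral importance that Synaptic Intelligence computes: accumulate minus gradient times parameter change during training, then normalize by the total squared displacement over the task (the damping constant xi and the sign conventions are assumptions).

import torch

class SynapticIntelligenceSketch:
    """Online path-integral importance in the spirit of Synaptic Intelligence
    (illustrative sketch, not the reference implementation)."""

    def __init__(self, params, xi=0.1):
        self.params = list(params)
        self.xi = xi
        self.omega = [torch.zeros_like(p) for p in self.params]
        self.start = [p.detach().clone() for p in self.params]
        self.prev = [p.detach().clone() for p in self.params]

    def accumulate(self):
        """Call right after each optimizer step, while p.grad still holds
        the gradient that produced that step."""
        for p, w, prev in zip(self.params, self.omega, self.prev):
            if p.grad is not None:
                w -= p.grad.detach() * (p.detach() - prev)   # -g * delta_theta
            prev.copy_(p.detach())

    def importance(self):
        """Per-parameter importance at the end of a task."""
        return [w / ((p.detach() - s) ** 2 + self.xi)
                for p, w, s in zip(self.params, self.omega, self.start)]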

Catastrophic forgetting in connectionist networks

  • R. French
  • Trends in Cognitive Sciences, 1999

Prolonged myelination in human neocortical evolution

Comparisons between chimpanzees and humans suggest that the human-specific shift in the timing of cortical maturation during adolescence may have implications for vulnerability to certain psychiatric disorders.

An Empirical Investigation of Catastrophic Forgetting in Gradient-Based Neural Networks

It is found that it is always best to train using the dropout algorithm: dropout is consistently best at adapting to the new task and remembering the old task, and has the best tradeoff curve between these two extremes.