# Learning to Control Fast-Weight Memories: An Alternative to Dynamic Recurrent Networks

@article{Schmidhuber1992LearningTC, title={Learning to Control Fast-Weight Memories: An Alternative to Dynamic Recurrent Networks}, author={J{\"u}rgen Schmidhuber}, journal={Neural Computation}, year={1992}, volume={4}, pages={131-139} }

Previous algorithms for supervised sequence learning are based on dynamic recurrent networks. This paper describes an alternative class of gradient-based systems consisting of two feedforward nets that learn to deal with temporal sequences using fast weights: The first net learns to produce context-dependent weight changes for the second net whose weights may vary very quickly. The method offers the potential for STM storage efficiency: A single weight (instead of a full-fledged unit) may be…
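The two-net arrangement described in the abstract can be illustrated with a minimal, hypothetical sketch. It assumes the slow net emits two patterns whose outer product is added to the fast net's weights at each time step. The abstract only states that the first net produces context-dependent weight changes, so the outer-product rule, the class name `FastWeightSystem`, the `tanh` activations, and the zero reset of the fast weights are all illustrative assumptions. Training of the slow weights by gradient descent is omitted.

```python
import math
import random


def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def random_matrix(rng, rows, cols):
    """Small random matrix standing in for trained slow weights."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class FastWeightSystem:
    """Hypothetical sketch: a slow net programs the weights of a fast net."""

    def __init__(self, n_in, n_out, seed=0):
        rng = random.Random(seed)
        self.n_in, self.n_out = n_in, n_out
        # Slow weights: these would be learned by gradient descent (not shown).
        self.S_from = random_matrix(rng, n_in, n_in)   # pattern over fast-net inputs
        self.S_to = random_matrix(rng, n_out, n_in)    # pattern over fast-net outputs
        self.reset()

    def reset(self):
        # Assumption: fast weights are cleared at the start of each sequence.
        self.W_fast = [[0.0] * self.n_in for _ in range(self.n_out)]

    def step(self, x):
        # The slow net maps the current input to two patterns.
        a = [math.tanh(v) for v in matvec(self.S_from, x)]
        b = [math.tanh(v) for v in matvec(self.S_to, x)]
        # Assumed update rule: add the outer product b a^T to the fast weights.
        for i in range(self.n_out):
            for j in range(self.n_in):
                self.W_fast[i][j] += b[i] * a[j]
        # The fast net processes the same input with its updated weights.
        return [math.tanh(v) for v in matvec(self.W_fast, x)]


net = FastWeightSystem(n_in=3, n_out=2)
x = [1.0, 0.0, 0.5]
y_first = net.step(x)
y_second = net.step(x)  # same input, but the fast weights have changed
```

In this sketch the short-term memory lives entirely in `W_fast`: presenting the same input twice gives different outputs because the fast weights changed in between, even though neither net has recurrent connections.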

## 310 Citations

### GATED FAST WEIGHTS FOR ASSOCIATIVE RETRIEVAL

- Computer Science
- 2018

This work improves previous end-to-end differentiable neural networks with fast weight memories; the system is trained on a complex sequence-to-sequence variation of the Associative Retrieval Problem with roughly 70 times more temporal memory than similar-sized standard recurrent NNs.

### Using Fast Weights to Attend to the Recent Past

- Computer Science, NIPS
- 2016

These "fast weights" can be used to store temporary memories of the recent past, and they provide a neurally plausible way of implementing the type of attention to the past that has recently proven helpful in sequence-to-sequence models.

### Continual learning in recurrent neural networks

- Computer Science, ICLR
- 2021

This study shows that established CL methods can be successfully ported to the recurrent case, and that a recent regularization approach based on hypernetworks outperforms weight-importance methods, thus emerging as a promising candidate for CL in RNNs.

### Learning Associative Inference Using Fast Weight Memory

- Computer Science, ICLR
- 2021

This model is trained end-to-end by gradient descent and yields excellent performance on compositional language reasoning problems, meta-reinforcement-learning for POMDPs, and small-scale word-level language modelling.

### Learning Unambiguous Reduced Sequence Descriptions

- Computer Science, NIPS
- 1991

Experiments show that systems based on these principles can require less computation per time step and many fewer training sequences than conventional training algorithms for recurrent nets.

### Continual learning with hypernetworks

- Computer Science, ICLR
- 2020

Insight is provided into the structure of low-dimensional task embedding spaces (the input space of the hypernetwork) and it is shown that task-conditioned hypernetworks demonstrate transfer learning.

### Learning Complex, Extended Sequences Using the Principle of History Compression

- Computer Science, Neural Computation
- 1992

A simple principle for reducing the descriptions of event sequences without loss of information is introduced and this insight leads to the construction of neural architectures that learn to divide and conquer by recursively decomposing sequences.

### Sparse Meta Networks for Sequential Adaptation and its Application to Adaptive Language Modelling

- Computer Science, ArXiv
- 2020

This work augments a deep neural network with a layer-specific fast-weight memory that is generated sparsely at each time step and accumulated incrementally through time, providing a useful inductive bias for online continual adaptation.

### Metalearned Neural Memory

- Computer Science, NeurIPS
- 2019

This work augments recurrent neural networks with an external memory mechanism that builds upon recent progress in metalearning and achieves strong performance on a variety of learning problems, from supervised question answering to reinforcement learning.

### Fast & Slow Learning: Incorporating Synthetic Gradients in Neural Memory Controllers

- Computer Science, ArXiv
- 2020

This work proposes to decouple the learning process of the NMN controllers to allow them to achieve flexible, rapid adaptation in the presence of new information, which is highly beneficial for meta-learning tasks where the memory controllers must quickly grasp abstract concepts in the target domain, and adapt stored knowledge.

## References


### A Local Learning Algorithm for Dynamic Feedforward and Recurrent Networks

- Computer Science
- 1989

This paper proposes a parallel on-line learning algorithm which performs local computations only, yet is still designed to deal with hidden units and with units whose past activations are 'hidden in time'.

### Experimental Analysis of the Real-time Recurrent Learning Algorithm

- Computer Science
- 1989

A series of simulation experiments are used to investigate the power and properties of the real-time recurrent learning algorithm, a gradient-following learning algorithm for completely recurrent networks running in continually sampled time.

### A Fixed Size Storage O(n^3) Time Complexity Learning Algorithm for Fully Recurrent Continually Running Networks

- Computer Science, Neural Computation
- 1992

A method suited for on-line learning that computes exactly the same gradient and requires fixed-size storage of the same order but has an average time complexity per time step of O(n^3).

### Learning Algorithms for Networks with Internal and External Feedback

- Computer Science
- 1990

This paper gives an overview of some novel algorithms for reinforcement learning in non-stationary, possibly reactive environments, criticizes methods based on system identification and adaptive critics, and describes an adaptive subgoal generator.

### Generalization of backpropagation with application to a recurrent gas market model

- Mathematics, Neural Networks
- 1988

### Learning to generate subgoals for action sequences

- Business, IJCNN-91-Seattle International Joint Conference on Neural Networks
- 1991

The author discusses a system which solves at least one problem associated with compositional learning with the help of 'time-bridging' adaptive models that predict the effects of the system's subprograms.

### Dynamische neuronale Netze und das fundamentale raumzeitliche Lernproblem

- Mathematics
- 1990


### Task modularization by network modulation

- Proceedings of Neuro-Nimes '90
- 1990