# Simple statistical gradient-following algorithms for connectionist reinforcement learning

@article{Williams2004SimpleSG, title={Simple statistical gradient-following algorithms for connectionist reinforcement learning}, author={Ronald J. Williams}, journal={Machine Learning}, year={1992}, volume={8}, pages={229-256} }

This article presents a general class of associative reinforcement learning algorithms for connectionist networks containing stochastic units. [...] Specific examples of such algorithms are presented, some of which bear a close relationship to certain existing algorithms while others are novel but potentially interesting in their own right. Also given are results that show how such algorithms can be naturally integrated with backpropagation. We close with a brief discussion of a number of additional…
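The central update the paper derives, for any stochastic unit, has the form Δw = α(r − b)·∂ln g/∂w, where g is the unit's output distribution and b is a reinforcement baseline. The following is a minimal sketch, not the paper's own code, for the simplest case treated there: a single Bernoulli-logistic unit whose action y=1 is rewarded. The function name, constants, and reward setup are illustrative.

```python
import math
import random

def reinforce_bernoulli(trials=2000, lr=0.1, baseline=0.5, seed=0):
    """Train one Bernoulli-logistic unit with the REINFORCE rule.

    The unit fires y=1 with probability p = sigmoid(w); reward is 1
    when y=1 and 0 otherwise.  The characteristic eligibility of a
    Bernoulli-logistic unit is d ln Pr(y)/dw = (y - p), giving the
    update  w += lr * (r - baseline) * (y - p).
    """
    rng = random.Random(seed)
    w = 0.0
    for _ in range(trials):
        p = 1.0 / (1.0 + math.exp(-w))      # firing probability
        y = 1 if rng.random() < p else 0    # stochastic action
        r = 1.0 if y == 1 else 0.0          # reward for the target action
        w += lr * (r - baseline) * (y - p)  # REINFORCE update
    return 1.0 / (1.0 + math.exp(-w))
```

After training, the unit's firing probability should be pushed close to 1, since both outcomes produce an update that increases w whenever the rewarded action is y=1 and the baseline sits between the two reward levels.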


#### 4,148 Citations

Local online learning in recurrent networks with random feedback

- Computer Science, Biology
- 2018

This work derives an approximation to gradient-based learning that comports with known biological features of the brain, such as causality and locality, and proposes an augmented circuit architecture that allows the RNN to concatenate short-duration patterns into sequences of longer duration.

Learning to solve the credit assignment problem

- Computer Science, Biology
- ICLR
- 2020

A hybrid learning approach that learns to approximate the gradient, and can match or exceed the performance of exact gradient-based learning in both feedforward and convolutional networks.

An Alternative to Backpropagation in Deep Reinforcement Learning

- Computer Science
- ArXiv
- 2020

An algorithm called MAP propagation is proposed that can reduce this variance significantly while retaining the local property of the learning rule, and can solve common reinforcement learning tasks at a speed similar to that of backpropagation when applied to an actor-critic network.

Improved Stochastic Synapse Reinforcement Learning for Continuous Actions in Sharply Changing Environments

- Computer Science
- 2020 International Joint Conference on Neural Networks (IJCNN)
- 2020

Reinforcement learning in continuous action spaces requires mechanisms that allow for exploration of infinite possible actions. Novel combinations of these equations outperform either set of equations alone in terms of both learning rate and consistency in a set of multidimensional robot inverse kinematics problems.

Local online learning in recurrent networks with random feedback

- Computer Science, Medicine
- eLife
- 2019

An approximation to gradient-based learning is derived that comports with biological features of the brain, such as causality and locality, by requiring synaptic weight updates to depend only on local information about pre- and postsynaptic activities.

Global Reinforcement Learning in Neural Networks with Stochastic Synapses

- Computer Science
- The 2006 IEEE International Joint Conference on Neural Network Proceedings
- 2006

This formulation of the REINFORCE learning principle enables it to be applied to global reinforcement learning in networks with deterministic neural cells but stochastic synapses, and suggests two groups of new learning rules for such networks, including simple local rules.

Global Reinforcement Learning in Neural Networks

- Computer Science, Medicine
- IEEE Transactions on Neural Networks
- 2007

Numerical simulations have shown that for simple classification and reinforcement learning tasks, at least one family of the new learning rules gives results comparable to those provided by the well-known A_R-I and A_R-P rules for Boltzmann machines.

Gradient estimation in dendritic reinforcement learning

- Computer Science, Medicine
- Journal of mathematical neuroscience
- 2012

It is suggested that the availability of nonlocal feedback for learning is a key advantage of complex neurons over networks of simple point neurons, which have previously been found to be largely equivalent with regard to computational capability.

Reinforcement Learning: An Introduction

- Computer Science
- IEEE Transactions on Neural Networks
- 2005

This book provides a clear and simple account of the key ideas and algorithms of reinforcement learning, ranging from the history of the field's intellectual foundations to the most recent developments and applications.

#### References

Showing 1-10 of 61 references

Function Optimization using Connectionist Reinforcement Learning Algorithms

- Computer Science
- 1991

One of these variants, called REINFORCE/MENT, represents a novel but principled approach to reinforcement learning in nontrivial networks which incorporates an entropy maximization strategy.

On the use of backpropagation in associative reinforcement learning

- Computer Science
- IEEE 1988 International Conference on Neural Networks
- 1988

A description is given of several ways that backpropagation can be useful in training networks to perform associative reinforcement learning tasks, and it is observed that such an approach even permits a seamless blend of associative reinforcement learning and supervised learning within the same network.

A stochastic reinforcement learning algorithm for learning real-valued functions

- Computer Science
- Neural Networks
- 1990

A stochastic reinforcement learning algorithm for learning functions with continuous outputs using a connectionist network that learns to perform an underconstrained positioning task using a simulated 3 degree-of-freedom robot arm.
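For continuous outputs of this kind, the natural stochastic unit in the REINFORCE framework is a Gaussian unit: the action is sampled from N(μ, σ²) and the characteristic eligibility of the mean is (a − μ)/σ². The sketch below illustrates that rule on a toy one-dimensional task; the function name, the negative-squared-error reward, the running-average baseline, and all constants are illustrative choices, not taken from the cited paper.

```python
import random

def reinforce_gaussian(target=2.0, trials=5000, lr=0.05, sigma=0.5, seed=1):
    """Adapt the mean of a Gaussian unit with the REINFORCE rule.

    Action a ~ N(mu, sigma^2); reward r = -(a - target)^2.  Since
    d ln N(a; mu, sigma)/d mu = (a - mu) / sigma^2, the update is
    mu += lr * (r - baseline) * (a - mu) / sigma^2, with a running
    average of past rewards as the reinforcement baseline.
    """
    rng = random.Random(seed)
    mu = 0.0
    baseline = None
    for _ in range(trials):
        a = rng.gauss(mu, sigma)          # sample a real-valued action
        r = -(a - target) ** 2            # reward: negative squared error
        if baseline is None:
            baseline = r                  # initialize baseline on first trial
        mu += lr * (r - baseline) * (a - mu) / sigma ** 2  # REINFORCE update
        baseline += 0.1 * (r - baseline)  # running-average baseline
    return mu
```

On average the update follows the gradient of the expected reward with respect to μ, so μ drifts toward the rewarded region and then fluctuates around it with a spread set by σ and the learning rate.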

Learning by statistical cooperation of self-interested neuron-like computing elements.

- Computer Science, Medicine
- Human neurobiology
- 1985

It is argued that some of the longstanding problems concerning adaptation and learning by networks might be solvable by this form of cooperativity, and computer simulation experiments are described that show how networks of self-interested components that are sufficiently robust can solve rather difficult learning problems.

Neuronlike adaptive elements that can solve difficult learning control problems

- Computer Science
- IEEE Transactions on Systems, Man, and Cybernetics
- 1983

It is shown how a system consisting of two neuronlike adaptive elements can solve a difficult learning control problem, and the relation of this work to classical and instrumental conditioning in animal learning studies is discussed, along with its possible implications for research in the neurosciences.

Pattern-recognizing stochastic learning automata

- Computer Science
- IEEE Transactions on Systems, Man, and Cybernetics
- 1985

A class of learning tasks is described that combines aspects of learning automata tasks and supervised learning pattern-classification tasks. These tasks are called associative reinforcement…

A new approach to the design of reinforcement schemes for learning automata

- Computer Science
- IEEE Transactions on Systems, Man, and Cybernetics
- 1985

The generality of this method of designing learning schemes is pointed out, and it is shown that a very minor modification will enable the algorithm to learn in a multiteacher environment as well.

Forward Models: Supervised Learning with a Distal Teacher

- Psychology, Computer Science
- Cogn. Sci.
- 1992

This article demonstrates that certain classical problems associated with the notion of the "teacher" in supervised learning can be solved by judicious use of learned internal models as components of the adaptive system.

Learning and Sequential Decision Making

- Computer Science
- 1989

It is shown how a TD method can be understood as a novel synthesis of concepts from the theory of stochastic dynamic programming, which is the standard method for solving decision-making problems in binary systems.

Associative search network: A reinforcement learning associative memory

- Computer Science
- Biological Cybernetics
- 1981

An associative memory system is presented which does not require a "teacher" to provide the desired associations. For each input key it conducts a search for the output pattern which optimizes an…