Learning internal representations by error propagation
This chapter contains sections titled: The Problem, The Generalized Delta Rule, Simulation Results, Some Further Generalizations, Conclusion
Learning representations by back-propagating errors
TLDR
Back-propagation repeatedly adjusts the weights of the connections in the network so as to minimize a measure of the difference between the actual output vector of the net and the desired output vector; as a result, the hidden units come to represent important features of the task domain.
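A minimal NumPy sketch of the update the TLDR describes: repeated gradient-descent weight adjustments that shrink the squared difference between the actual and desired output vectors. The network size, learning rate, and XOR task are illustrative assumptions, not details from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative setup: 2 inputs, 3 hidden units, 1 output, trained on XOR.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
Y = np.array([[0], [1], [1], [0]], dtype=float)

W1 = rng.normal(scale=1.0, size=(2, 3))
W2 = rng.normal(scale=1.0, size=(3, 1))
lr = 0.5

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for _ in range(5000):
    # Forward pass.
    H = sigmoid(X @ W1)                 # hidden activations
    O = sigmoid(H @ W2)                 # actual output vector
    # Backward pass: propagate the output error toward the inputs.
    dO = (O - Y) * O * (1 - O)          # delta at the output layer
    dH = (dO @ W2.T) * H * (1 - H)      # delta at the hidden layer
    # Repeatedly adjust the weights down the gradient of the squared error.
    W2 -= lr * H.T @ dO
    W1 -= lr * X.T @ dH

print(O.round(2))  # should approach [0, 1, 1, 0]
```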
Simple statistical gradient-following algorithms for connectionist reinforcement learning
TLDR
This article presents a general class of associative reinforcement learning algorithms for connectionist networks containing stochastic units that are shown to make weight adjustments in a direction that lies along the gradient of expected reinforcement, in both immediate-reinforcement tasks and certain limited forms of delayed-reinforcement tasks. They do this without explicitly computing gradient estimates.
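A sketch of the REINFORCE rule the abstract describes, for a single stochastic Bernoulli-logistic unit: the weight change α·r·(y − p)·x moves along the gradient of expected reinforcement without explicitly estimating it. The bandit-style task and learning rate are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w = np.zeros(3)     # weights for two inputs plus a bias
alpha = 0.1

for _ in range(2000):
    x = np.append(rng.integers(0, 2, size=2).astype(float), 1.0)
    p = sigmoid(w @ x)               # firing probability of the stochastic unit
    y = float(rng.random() < p)      # sampled 0/1 action
    r = 1.0 if y == x[0] else 0.0    # assumed immediate reinforcement signal
    # REINFORCE: step along the gradient of expected reinforcement;
    # (y - p) * x is the characteristic eligibility of a Bernoulli-logistic unit.
    w += alpha * r * (y - p) * x
```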
A Learning Algorithm for Continually Running Fully Recurrent Neural Networks
The exact form of a gradient-following learning algorithm for completely recurrent networks running in continually sampled time is derived and used as the basis for practical algorithms for temporal supervised learning tasks.
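A compact sketch of the real-time recurrent learning (RTRL) recursion this abstract refers to: carry the sensitivities ∂y_k/∂w_ij forward in time and adjust the weights from the instantaneous error, fully on-line. The network size, single-target task, and learning rate are assumed for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

n, nx = 3, 1                  # 3 fully recurrent units, 1 external input (assumed)
m = n + nx + 1                # columns of W: recurrent state, input, bias
W = rng.normal(scale=0.1, size=(n, m))
P = np.zeros((n, n, m))       # sensitivities P[k, i, j] = d y_k / d w_ij
y = np.zeros(n)
alpha = 0.05

def step(x_t, d_t):
    """One on-line RTRL step: run the net, update sensitivities, adjust weights."""
    global y, P, W
    z = np.concatenate([y, x_t, [1.0]])          # state, input, and bias together
    y_new = np.tanh(W @ z)
    fprime = 1.0 - y_new**2
    # Sensitivity recursion: P'[k] = f'(s_k) * (sum_l W[k,l] P[l] + delta_{ki} z_j)
    P_new = np.einsum('kl,lij->kij', W[:, :n], P)
    P_new[np.arange(n), np.arange(n), :] += z    # the delta_{ki} z_j term
    P_new *= fprime[:, None, None]
    # Gradient-following weight change from the error on unit 0
    # (assume only the first unit has a training target here).
    e = np.zeros(n)
    e[0] = d_t - y_new[0]
    W += alpha * np.einsum('k,kij->ij', e, P_new)
    y, P = y_new, P_new
    return y_new[0]

for t in range(2000):
    bit = float(rng.integers(0, 2))
    step(np.array([bit]), 2.0 * bit - 1.0)       # assumed task: track the input bit
```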
An Efficient Gradient-Based Algorithm for On-Line Training of Recurrent Network Trajectories
A novel variant of the familiar backpropagation-through-time approach to training recurrent networks is described. This algorithm is intended to be used on arbitrary recurrent networks that run continually without ever being reset to an initial state.
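A sketch of the truncated backpropagation-through-time idea this abstract points at: keep a short history of recent activations and, at each step, backpropagate only the current error through that window. The truncation depth h and toy dimensions are assumptions, and the paper's exact algorithm differs in its details.

```python
import numpy as np

rng = np.random.default_rng(1)

n, nx, h = 4, 1, 5            # units, inputs, truncation depth (all assumed)
m = n + nx + 1
W = rng.normal(scale=0.1, size=(n, m))
alpha, y = 0.05, np.zeros(n)
history = []                  # (z, y) pairs for the last h steps

def tbptt_step(x_t, d_t):
    """One on-line step: run the net forward, then backpropagate the current
    error through at most the last h time steps (truncated BPTT)."""
    global y, W
    z = np.concatenate([y, x_t, [1.0]])
    y_new = np.tanh(W @ z)
    history.append((z, y_new))
    if len(history) > h:
        history.pop(0)
    grad = np.zeros_like(W)
    # Inject the error at the current step on unit 0 (assumed single target).
    delta = np.zeros(n)
    delta[0] = d_t - y_new[0]
    for z_s, y_s in reversed(history):
        delta = delta * (1.0 - y_s**2)      # through the tanh at step s
        grad += np.outer(delta, z_s)
        delta = W[:, :n].T @ delta          # back through the recurrent weights
    W += alpha * grad
    y = y_new
    return y_new[0]

for t in range(2000):
    bit = float(rng.integers(0, 2))
    tbptt_step(np.array([bit]), 2.0 * bit - 1.0)   # assumed tracking task
```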
Incremental multi-step Q-learning
TLDR
A novel incremental algorithm that combines Q-learning with the TD(λ) return estimation process, which is typically used in actor-critic learning, leading to faster learning and also helping to alleviate the non-Markovian effect of coarse state-space quantization.
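A tabular sketch in the spirit of this TLDR: one-step Q-learning errors are spread over recent state-action pairs by TD(λ)-style eligibility traces. The chain environment and all parameters are assumptions, and the trace-cutting rule shown is the Watkins-style variant rather than the paper's exact incremental algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

n_states, n_actions = 6, 2            # assumed toy chain environment
Q = np.zeros((n_states, n_actions))
alpha, gamma, lam, eps = 0.1, 0.95, 0.8, 0.1

def env_step(s, a):
    """Assumed chain: action 1 moves right, action 0 left; reward at the end."""
    s2 = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
    return s2, (1.0 if s2 == n_states - 1 else 0.0), s2 == n_states - 1

for episode in range(200):
    E = np.zeros_like(Q)              # eligibility traces
    s, done = 0, False
    while not done:
        a = int(rng.integers(n_actions)) if rng.random() < eps else int(np.argmax(Q[s]))
        greedy = a == int(np.argmax(Q[s]))
        s2, r, done = env_step(s, a)
        # One-step Q-learning error, blended over past pairs by the traces.
        delta = r + gamma * np.max(Q[s2]) - Q[s, a]
        E[s, a] += 1.0                # accumulating trace
        Q += alpha * delta * E
        # Watkins-style cut: decay traces only while acting greedily.
        E = gamma * lam * E if greedy else np.zeros_like(Q)
        s = s2
```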
Experimental Analysis of the Real-time Recurrent Learning Algorithm
TLDR
A series of simulation experiments is used to investigate the power and properties of the real-time recurrent learning algorithm, a gradient-following learning algorithm for completely recurrent networks running in continually sampled time.