Natural Actor-Critic

@inproceedings{Peters2005NaturalA,
  title={Natural Actor-Critic},
  author={Jan Peters and Sethu Vijayakumar and Stefan Schaal},
  booktitle={ECML},
  year={2005}
}
This paper investigates a novel model-free reinforcement learning architecture, the Natural Actor-Critic. The actor updates are based on stochastic policy gradients employing Amari's natural gradient approach, while the critic obtains both the natural policy gradient and additional parameters of a value function simultaneously by linear regression. We show that actor improvements with natural policy gradients are particularly appealing as these are independent of the coordinate frame of the chosen…
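
To make the actor-critic coupling concrete, here is a minimal Python sketch of the episodic variant of this idea: with compatible features psi(s, a) = grad_theta log pi_theta(a|s), regressing discounted episode returns on the summed, discounted scores (plus a constant baseline) yields least-squares weights w that estimate the natural policy gradient, and the actor steps along w. The function names, episode format, and step size are assumptions for illustration, not the paper's exact estimator.

import numpy as np

def episodic_nac_step(theta, episodes, grad_log_pi, gamma=0.99, alpha=0.05):
    # One episodic Natural Actor-Critic update (illustrative sketch).
    # grad_log_pi(theta, s, a) is assumed to return the score vector
    # d/dtheta log pi_theta(a|s); each episode is a list of (s, a, r) steps.
    Psi, G = [], []
    for traj in episodes:
        ret, feats = 0.0, np.zeros_like(theta)
        for t, (s, a, r) in enumerate(traj):
            ret += (gamma ** t) * r
            feats += (gamma ** t) * grad_log_pi(theta, s, a)
        Psi.append(feats)
        G.append(ret)
    # Critic: regress episode returns on the compatible features by linear
    # least squares; the constant column absorbs the start-state value.
    X = np.hstack([np.array(Psi), np.ones((len(G), 1))])
    w = np.linalg.lstsq(X, np.array(G), rcond=None)[0][:-1]
    # Actor: with compatible features, the regression weights w estimate
    # the natural policy gradient, so the actor simply steps along w.
    return theta + alpha * w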
TLDR: It is shown that several well-known reinforcement learning methods such as the original Actor-Critic and Bradtke's Linear Quadratic Q-Learning are in fact Natural Actor-Critic algorithms.
Citations

A Survey of Actor-Critic Reinforcement Learning: Standard and Natural Policy Gradients
TLDR: The workings of the natural gradient, which has made its way into many actor-critic algorithms over the past few years, are described, and a review of several standard and natural actor-critic algorithms is given.
Incremental Natural Actor-Critic Algorithms
TLDR: The results extend prior two-timescale convergence results for actor-critic methods by using temporal difference learning in the actor and by incorporating natural gradients, and they extend prior empirical studies of natural actor-critic methods by providing the first convergence proofs and the first fully incremental algorithms.
Natural-Gradient Actor-Critic Algorithms
We prove the convergence of four new reinforcement learning algorithms based on the actor-critic architecture, on function approximation, and on natural gradients.
Projected Natural Actor-Critic
TLDR: This paper presents a principled algorithm for performing natural gradient descent over a constrained domain, which allows for natural actor-critic algorithms that are guaranteed to remain within a known safe region of policy space.
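
As a rough illustration of the projected update (the projection operator and the constraint set below are hypothetical stand-ins, not the paper's construction), each natural-gradient step is followed by mapping the parameters back onto the feasible set:

import numpy as np

def projected_natural_step(theta, nat_grad, alpha, project):
    # Take a natural-gradient step, then project back into the safe set.
    return project(theta + alpha * nat_grad)

def ball_projection(theta, radius=10.0):
    # Example constraint set: an L2 ball. The paper projects with respect
    # to the natural metric; a Euclidean ball is an assumption here.
    norm = np.linalg.norm(theta)
    return theta if norm <= radius else theta * (radius / norm)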
Natural Actor–Critic Algorithms
We present four new reinforcement learning algorithms based on actor–critic, function approximation, and natural gradient ideas, and we provide their convergence proofs.
Fitted Natural Actor-Critic: A New Algorithm for Continuous State-Action MDPs
TLDR: A new algorithm, fitted natural actor-critic (FNAC), is proposed that extends the work in [1] to allow for general function approximation and data reuse, and combines the appealing features of both approaches while overcoming their main weaknesses.
Efficient Model Learning Methods for Actor–Critic Control
TLDR: Two new actor-critic algorithms for reinforcement learning are proposed that learn a process model and a reference model representing a desired behavior, from which desired control actions can be calculated using the inverse of the learned process model.
A Generalized Natural Actor-Critic Algorithm
TLDR: A generalized natural gradient (gNG) that linearly interpolates two Fisher information matrices (FIMs) is described, and an efficient implementation of gNG learning based on the theory of estimating functions, the generalized Natural Actor-Critic (gNAC) algorithm, is proposed.
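
In symbols (F_1 and F_2 are placeholder names for the two Fisher information matrices being interpolated; the notation is mine, not the paper's), the mixed metric has the form

\[
  F_\lambda = (1 - \lambda)\, F_1 + \lambda\, F_2, \qquad \lambda \in [0, 1],
\]

so the endpoints \lambda = 0 and \lambda = 1 recover the two original metrics.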
An Actor-Critic Algorithm With Second-Order Actor and Critic
TLDR: This paper develops an estimate of the Hessian matrix containing the second derivatives of the performance metric with respect to the policy parameters and introduces a new second-order policy improvement method, which is compared with existing algorithms in two applications and leads to significantly faster convergence.
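
For intuition, a second-order actor replaces the plain gradient step with a Newton-like step. The sketch below is a hedged illustration, not the paper's estimator: it treats L(theta) = -J(theta) as a loss and damps the Hessian so the linear solve stays well conditioned.

import numpy as np

def damped_newton_step(theta, grad_L, hess_L, damping=1e-3):
    # Damped Newton step on a loss L(theta), e.g. L = -J for policy
    # performance J. The damping term regularizes a noisy or indefinite
    # Hessian estimate; np.linalg.solve avoids forming an explicit inverse.
    H = hess_L + damping * np.eye(len(theta))
    return theta - np.linalg.solve(H, grad_L)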

References

Showing 1-10 of 33 references.
An RLS-Based Natural Actor-Critic Algorithm for Locomotion of a Two-Linked Robot Arm
TLDR: An actor-critic algorithm that combines the recursive least-squares (RLS) method, one of the most efficient techniques in adaptive signal processing, with the natural policy gradient showed better performance than the conventional stochastic gradient ascent algorithm.
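
For reference, the core RLS recursion such a critic can use to track linear weights online is sketched below; the class and variable names are mine, and this is generic RLS with a forgetting factor rather than the paper's exact estimator.

import numpy as np

class RecursiveLeastSquares:
    # Generic recursive least-squares estimator for targets y ~ w^T x.

    def __init__(self, dim, delta=1.0):
        self.w = np.zeros(dim)        # weight estimate
        self.P = np.eye(dim) / delta  # inverse-covariance estimate

    def update(self, x, y, lam=0.99):
        # One RLS step with forgetting factor lam (0 < lam <= 1).
        Px = self.P @ x
        k = Px / (lam + x @ Px)                  # gain vector
        self.w = self.w + k * (y - self.w @ x)   # correct by prediction error
        self.P = (self.P - np.outer(k, Px)) / lam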
Policy Gradient Methods for Reinforcement Learning with Function Approximation
TLDR: This paper proves for the first time that a version of policy iteration with arbitrary differentiable function approximation is convergent to a locally optimal policy.
Applying the Episodic Natural Actor-Critic Architecture to Motor Primitive Learning
TLDR: The most modern algorithm, the Episodic Natural Actor-Critic, outperforms previous algorithms by at least an order of magnitude, as demonstrated by learning to hit a baseball with an anthropomorphic robot arm.
Covariant Policy Search
TLDR: This work proposes a natural metric on controller parameterization, derived from the manifold of probability distributions over paths induced by a stochastic controller, which leads to a covariant gradient ascent rule.
Reinforcement Learning for Humanoid Robotics
TLDR: This paper discusses different approaches to reinforcement learning in terms of their applicability to humanoid robotics and demonstrates that ‘vanilla’ policy gradient methods can be significantly improved by using the natural policy gradient instead of the regular policy gradient.
Policy-Gradient Algorithms for Partially Observable Markov Decision Processes
TLDR: This thesis develops several improved algorithms for learning policies with memory in an infinite-horizon setting, including an application written for the Bunyip cluster that won the international Gordon Bell prize for price/performance in 2001.
Gradient Descent for General Reinforcement Learning
TLDR: A simple learning rule, the VAPS algorithm, is derived that can be instantiated to generate a wide range of new reinforcement-learning algorithms and allows policy-search and value-based methods to be combined, unifying two very different approaches to reinforcement learning into a single Value and Policy Search algorithm.
A Natural Policy Gradient
TLDR: This work provides a natural gradient method that represents the steepest descent direction based on the underlying structure of the parameter space and shows drastic performance improvements in simple MDPs and in the more challenging MDP of Tetris.
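
In the standard form consistent with this line of work, the natural gradient preconditions the vanilla policy gradient with the inverse Fisher information matrix of the policy:

\[
  \widetilde{\nabla}_\theta J(\theta) = F(\theta)^{-1} \nabla_\theta J(\theta),
  \qquad
  F(\theta) = \mathbb{E}\!\left[ \nabla_\theta \log \pi_\theta(a \mid s)\,
  \nabla_\theta \log \pi_\theta(a \mid s)^{\top} \right].
\]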
Conditional random fields for multi-agent reinforcement learning
TLDR: This paper explores the use of CRFs in a class of temporal learning algorithms, namely policy-gradient reinforcement learning (RL), and shows how agents can communicate with each other to choose the optimal joint action.
Natural Actor-Critic for Road Traffic Optimisation
TLDR: A policy-gradient reinforcement learning approach is used to directly optimise the traffic signals, mapping currently deployed sensor observations to control signals and extending natural actor-critic approaches to work for distributed and online infinite-horizon problems.