# Natural Actor-Critic

```bibtex
@inproceedings{Peters2005NaturalA,
  title     = {Natural Actor-Critic},
  author    = {Jan Peters and Sethu Vijayakumar and Stefan Schaal},
  booktitle = {ECML},
  year      = {2005}
}
```

This paper investigates a novel model-free reinforcement learning architecture, the Natural Actor-Critic. The actor updates are based on stochastic policy gradients employing Amari's natural gradient approach, while the critic obtains both the natural policy gradient and additional parameters of a value function simultaneously by linear regression. We show that actor improvements with natural policy gradients are particularly appealing as these are independent of the coordinate frame of the chosen…
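The natural-gradient idea at the heart of the actor update can be sketched numerically. The fragment below is a minimal illustration, not the paper's method: it assumes a softmax policy with linear features and a toy batch of random states, and it inverts an empirical Fisher matrix directly, whereas the paper's critic recovers the natural gradient through linear regression. All names, dimensions, and the step size are illustrative assumptions.

```python
import numpy as np

# Illustrative setup: softmax policy over n_actions with linear features of
# dimension d. The data below is synthetic; returns are stand-ins for Q values.
rng = np.random.default_rng(0)
n_actions, d = 3, 4
theta = np.zeros(d)

def policy_probs(theta, phi):
    """pi(a|s) for a softmax policy; phi has shape (n_actions, d)."""
    logits = phi @ theta
    e = np.exp(logits - logits.max())
    return e / e.sum()

def score(theta, phi, a):
    """grad_theta log pi(a|s): the policy score function."""
    p = policy_probs(theta, phi)
    return phi[a] - p @ phi

# Collect (score, return) pairs from a toy batch of random states.
scores, returns = [], []
for _ in range(200):
    phi = rng.normal(size=(n_actions, d))   # random state features
    p = policy_probs(theta, phi)
    a = rng.choice(n_actions, p=p)
    scores.append(score(theta, phi, a))
    returns.append(rng.normal())            # stand-in for a Q estimate

S = np.array(scores)                        # (N, d) matrix of score vectors
g = S.T @ np.array(returns) / len(S)        # vanilla policy-gradient estimate
F = S.T @ S / len(S)                        # empirical Fisher information
nat_grad = np.linalg.solve(F + 1e-6 * np.eye(d), g)   # F^{-1} g

theta = theta + 0.1 * nat_grad              # natural-gradient ascent step
```

The key point the paper exploits is the last two lines: preconditioning the vanilla gradient `g` by the inverse Fisher matrix yields an update direction that is invariant to the parameterization of the policy.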

#### 813 Citations

Natural Actor-Critic

- Computer Science
- Neurocomputing
- 2008

It is shown that several well-known reinforcement learning methods such as the original Actor-Critic and Bradtke's Linear Quadratic Q-Learning are in fact Natural Actor-Critic algorithms.

A Survey of Actor-Critic Reinforcement Learning: Standard and Natural Policy Gradients

- Computer Science
- IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews)
- 2012

The workings of the natural gradient, which has made its way into many actor-critic algorithms over the past few years, are described, and a review of several standard and natural actor-critic algorithms is given.

Incremental Natural Actor-Critic Algorithms

- Computer Science, Mathematics
- NIPS
- 2007

The results extend prior two-timescale convergence results for actor-critic methods by using temporal difference learning in the actor and by incorporating natural gradients, and they extend prior empirical studies of natural actor-critic methods by providing the first convergence proofs and the first fully incremental algorithms.

Natural-Gradient Actor-Critic Algorithms

- 2007

We prove the convergence of four new reinforcement learning algorithms based on the actor-critic architecture, on function approximation, and on natural gradients. Reinforcement learning is a class of…

Projected Natural Actor-Critic

- Computer Science, Mathematics
- NIPS
- 2013

This paper presents a principled algorithm for performing natural gradient descent over a constrained domain, which allows for natural actor-critic algorithms that are guaranteed to remain within a known safe region of policy space.

Natural Actor-Critic Algorithms

- 2009

We present four new reinforcement learning algorithms based on actor-critic, function approximation, and natural gradient ideas, and we provide their convergence proofs. Actor-critic reinforcement…

Fitted Natural Actor-Critic: A New Algorithm for Continuous State-Action MDPs

- Computer Science
- ECML/PKDD
- 2008

A new algorithm, fitted natural actor-critic (FNAC), is proposed that extends the work in [1] to allow for general function approximation and data reuse, and combines the appealing features of both approaches while overcoming their main weaknesses.

Efficient Model Learning Methods for Actor–Critic Control

- Computer Science, Medicine
- IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics)
- 2012

Two new actor-critic algorithms for reinforcement learning are proposed that learn a process model and a reference model representing a desired behavior, from which desired control actions can be calculated using the inverse of the learned process model.

A Generalized Natural Actor-Critic Algorithm

- Computer Science, Mathematics
- NIPS
- 2009

A generalized natural gradient (gNG) that linearly interpolates between two Fisher information matrices is described, and an efficient implementation of gNG learning based on the theory of estimating functions, the generalized Natural Actor-Critic (gNAC) algorithm, is proposed.

An Actor-Critic Algorithm With Second-Order Actor and Critic

- Mathematics, Computer Science
- IEEE Transactions on Automatic Control
- 2017

This paper develops an estimate of the Hessian matrix containing the second derivatives of the performance metric with respect to policy parameters and introduces a new second-order policy improvement method, which is compared with existing algorithms in two applications and leads to significantly faster convergence.

#### References

Showing 1-10 of 33 references

An RLS-Based Natural Actor-Critic Algorithm for Locomotion of a Two-Linked Robot Arm

- Computer Science
- CIS
- 2005

An actor-critic algorithm combining the RLS (recursive least-squares) method, one of the most efficient techniques for adaptive signal processing, with the natural policy gradient showed better performance than the conventional stochastic gradient ascent algorithm.

Policy Gradient Methods for Reinforcement Learning with Function Approximation

- Mathematics, Computer Science
- NIPS
- 1999

This paper proves for the first time that a version of policy iteration with arbitrary differentiable function approximation is convergent to a locally optimal policy.

Applying the Episodic Natural Actor-Critic Architecture to Motor Primitive Learning

- Computer Science
- ESANN
- 2007

The most modern algorithm, the Episodic Natural Actor-Critic, outperforms previous algorithms by at least an order of magnitude, as demonstrated in the application of learning to hit a baseball with an anthropomorphic robot arm.

Covariant Policy Search

- Mathematics, Computer Science
- IJCAI
- 2003

This work proposes a natural metric on controller parameterization that results from considering the manifold of probability distributions over paths induced by a stochastic controller, and that leads to a covariant gradient ascent rule.

Reinforcement Learning for Humanoid Robotics

- Computer Science
- 2003

This paper discusses different approaches of reinforcement learning in terms of their applicability in humanoid robotics, and demonstrates that 'vanilla' policy gradient methods can be significantly improved using the natural policy gradient instead of the regular policy gradient.

Policy-Gradient Algorithms for Partially Observable Markov Decision Processes

- Computer Science
- 2003

This thesis develops several improved algorithms for learning policies with memory in an infinite-horizon setting, including an application written for the Bunyip cluster that won the international Gordon Bell prize for price/performance in 2001.

Gradient Descent for General Reinforcement Learning

- Computer Science
- NIPS
- 1998

A simple learning rule is derived, the VAPS algorithm, which can be instantiated to generate a wide range of new reinforcement-learning algorithms, and allows policy-search and value-based algorithms to be combined, thus unifying two very different approaches to reinforcement learning into a single Value and Policy Search algorithm.

A Natural Policy Gradient

- Computer Science
- NIPS
- 2001

This work provides a natural gradient method that represents the steepest descent direction based on the underlying structure of the parameter space and shows drastic performance improvements in simple MDPs and in the more challenging MDP of Tetris.

Conditional random fields for multi-agent reinforcement learning

- Computer Science
- ICML '07
- 2007

This paper explores the use of CRFs in a class of temporal learning algorithms, namely policy-gradient reinforcement learning (RL), and shows how agents can communicate with each other to choose the optimal joint action.

Natural Actor-Critic for Road Traffic Optimisation

- Computer Science
- NIPS
- 2006

A policy-gradient reinforcement learning approach is used to directly optimise the traffic signals, mapping currently deployed sensor observations to control signals and extending natural actor-critic approaches to work for distributed and online infinite-horizon problems.