# Reinforcement Learning for Humanoid Robotics

@inproceedings{Peters2003ReinforcementLF, title={Reinforcement Learning for Humanoid Robotics}, author={Jan Peters and Sethu Vijayakumar and Stefan Schaal}, year={2003} }

Reinforcement learning offers one of the most general frameworks to take traditional robotics towards true autonomy and versatility. Methods can be coarsely classified into three different categories, i.e., greedy methods, ‘vanilla’ policy gradient methods, and natural gradient methods. We argue that greedy methods are not likely to scale into the domain of humanoid robotics, as they are problematic when used with function approximation. ‘Vanilla’ policy gradient methods, on the other hand, have…

## 381 Citations

### Policy search for motor primitives in robotics

- Computer Science · Machine Learning
- 2010

A novel EM-inspired algorithm for policy learning, particularly well suited to dynamical-system motor primitives, is introduced; applied to motor learning, it can learn a complex Ball-in-a-Cup task on a real Barrett WAM™ robot arm.
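The reward-weighted flavour of EM-inspired policy search can be illustrated with a small sketch. This is not the published algorithm: it assumes episodic rollouts with non-negative returns, Gaussian exploration noise added directly to the policy parameters, and a hypothetical toy return function `rollout_return` standing in for a real robot. The update averages the exploration noise, weighting each sample by the return it earned.

```python
import numpy as np

def rollout_return(theta, target):
    # Hypothetical stand-in for a robot rollout: a non-negative return
    # that peaks when the policy parameters equal `target`.
    return float(np.exp(-np.sum((theta - target) ** 2)))

def reward_weighted_update(theta, target, rng, n_rollouts=100, sigma=0.3):
    """One EM-style step: average the exploration noise, weighted by return."""
    eps = rng.normal(0.0, sigma, size=(n_rollouts, theta.size))
    returns = np.array([rollout_return(theta + e, target) for e in eps])
    # Returns act as unnormalised weights, so they must be non-negative.
    return theta + returns @ eps / returns.sum()

rng = np.random.default_rng(0)
target = np.array([1.0, -0.5])
theta = np.zeros(2)
for _ in range(100):
    theta = reward_weighted_update(theta, target, rng)
print(np.round(theta, 2))
```

Because no learning rate appears, the step size is set implicitly by the exploration noise and the shape of the returns, which is one reason such updates are attractive for motor primitives.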

### Reinforcement learning for balancer embedded humanoid locomotion

- Computer Science · 2010 10th IEEE-RAS International Conference on Humanoid Robots
- 2010

A new learning-walking scheme is proposed in which a humanoid robot is embedded with a primitive balancing controller for safety; the results demonstrate that non-hierarchical RL algorithms with the structured FA (function approximation) learn much faster than the hierarchical RL algorithm.

### Policy Gradient Methods for Robotics

- Computer Science · 2006 IEEE/RSJ International Conference on Intelligent Robots and Systems
- 2006

An overview of learning with policy gradient methods for robotics is given, with a strong focus on recent advances in the field, and it is shown how the most recently developed methods can significantly improve learning performance.

### Survey of Model-Based Reinforcement Learning: Applications on Robotics

- Computer Science · J. Intell. Robotic Syst.
- 2017

It is argued that employing model-based reinforcement learning can expand the currently limited adaptability of robotic systems, and that model-based reinforcement learning exhibits advantages that make it more applicable to real-life use cases than model-free methods.

### Scaling Reinforcement Learning Paradigms for Motor Control

- Computer Science
- 2003

This poster examines promising approaches that can potentially scale, suggests a novel formulation of the actor-critic algorithm that takes steps towards alleviating the current shortcomings, and proves that Kakade’s ‘average natural policy gradient’ is indeed the true natural gradient.

### Reinforcement learning of motor skills with policy gradients

- Computer Science · Neural Networks
- 2008

### Towards a common implementation of reinforcement learning for multiple robotic tasks

- Computer Science · Expert Syst. Appl.
- 2018

### Reinforcement Learning for Parameterized Motor Primitives

- Computer Science · The 2006 IEEE International Joint Conference on Neural Network Proceedings
- 2006

This paper compares both established and novel algorithms for the gradient-based improvement of parameterized policies in the context of motor primitive learning, and shows that the most modern algorithm, the Episodic Natural Actor-Critic, outperforms previous algorithms by at least an order of magnitude.
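The core of an episodic natural actor-critic can be sketched as a linear regression: each episode's return is regressed on the sum of its score functions plus a constant, and the slope is the natural gradient estimate while the intercept plays the role of a baseline. The sketch below assumes one-step episodes with a Gaussian policy a ~ N(theta, sigma^2), a toy reward -(a - target)^2, and no discounting; `enac_gradient` is a hypothetical helper name, not code from the paper.

```python
import numpy as np

def enac_gradient(scores, returns):
    """Regress episode returns on summed score functions plus a constant.

    scores:  (n_episodes, n_params), each row = sum_t grad log pi(a_t | s_t)
    returns: (n_episodes,) episode returns
    Returns (natural_gradient_estimate, baseline).
    """
    n = scores.shape[0]
    design = np.hstack([scores, np.ones((n, 1))])  # last column fits the baseline
    coef, *_ = np.linalg.lstsq(design, returns, rcond=None)
    return coef[:-1], coef[-1]

# Toy one-step episodes: Gaussian policy, quadratic reward.
rng = np.random.default_rng(1)
theta, sigma, target = 0.0, 0.5, 1.0
actions = theta + sigma * rng.standard_normal(2000)
scores = ((actions - theta) / sigma**2)[:, None]  # d/dtheta log N(a; theta, sigma^2)
returns = -(actions - target) ** 2
w, baseline = enac_gradient(scores, returns)
print(np.round(w, 3), round(float(baseline), 3))
```

For this toy problem the natural gradient is -2 sigma^2 (theta - target) = 0.5 and the expected return is -1.25, so `w` and `baseline` should land close to those values.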

### 22 Reinforcement Learning Algorithms In Humanoid Robotics

- Computer Science
- 2012

The field of biped locomotion is of special interest where human-like robots are concerned, because it is both obvious and interesting that anthropomorphic biped robots are potentially capable of moving effectively in all the unstructured environments where humans do.

## References

Showing 1–10 of 36 references.

### Learning Attractor Landscapes for Learning Motor Primitives

- Computer Science · NIPS
- 2002

By nonlinearly transforming the canonical attractor dynamics using techniques from nonparametric regression, almost arbitrary new nonlinear policies can be generated without losing the stability properties of the canonical system.
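A minimal sketch of the idea, assuming the common one-dimensional discrete dynamic movement primitive form: a critically damped spring-damper pulls y toward the goal g, and a forcing term built from normalised Gaussian basis functions of a decaying phase variable x reshapes the path. The gains, basis placement, width heuristic, and the helper name `dmp_rollout` are assumptions; fitting the weights by nonparametric regression from a demonstration is omitted.

```python
import numpy as np

def dmp_rollout(y0, g, weights, tau=1.0, dt=0.001, duration=2.0,
                alpha_z=25.0, alpha_x=25.0 / 3.0):
    """Euler-integrate a 1-D discrete DMP and return the trajectory y(t)."""
    beta_z = alpha_z / 4.0                      # critical damping
    n_basis = len(weights)
    # Basis centres spaced along the decaying phase x in (0, 1].
    centres = np.exp(-alpha_x * np.linspace(0.0, 1.0, n_basis))
    widths = n_basis ** 1.5 / centres           # heuristic width choice (assumption)
    x, y, z = 1.0, float(y0), 0.0
    ys = [y]
    for _ in range(int(round(duration / dt))):
        psi = np.exp(-widths * (x - centres) ** 2)
        # Forcing term: normalised weighted basis sum, gated by the phase x.
        f = (psi @ weights) / (psi.sum() + 1e-10) * x * (g - y0)
        z += dt / tau * (alpha_z * (beta_z * (g - y) - z) + f)
        y += dt / tau * z
        x += dt / tau * (-alpha_x * x)          # canonical system: x decays to 0
        ys.append(y)
    return np.array(ys)

traj = dmp_rollout(0.0, 1.0, np.zeros(10))                          # plain attractor
traj_shaped = dmp_rollout(0.0, 1.0, 50.0 * np.sin(np.arange(10)))   # reshaped path
print(round(float(traj[-1]), 4), round(float(traj_shaped[-1]), 4))
```

Because the forcing term is multiplied by x, which decays to zero, both trajectories still settle at g regardless of the weights; this is the sense in which new nonlinear policies keep the stability of the canonical system.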

### Learning by Demonstration

- Education, Computer Science · Encyclopedia of Machine Learning and Data Mining
- 1996

In an implementation of pole balancing on a complex anthropomorphic robot arm, it is demonstrated that, when facing the complexities of real signal processing, model-based reinforcement learning offers the most robustness for LQR problems.
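Once a linear model (A, B) of the system has been identified, which is the part a model-based learner estimates from data, the LQR controller follows from the discrete-time Riccati equation. The sketch below assumes a hypothetical double-integrator model sampled at 0.1 s rather than the robot arm's actual dynamics, and computes the gain by simple fixed-point iteration.

```python
import numpy as np

def dlqr(A, B, Q, R, iters=500):
    """Discrete-time LQR via fixed-point iteration of the Riccati equation."""
    P = Q.copy()
    for _ in range(iters):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)  # optimal feedback gain
        P = Q + A.T @ P @ (A - B @ K)                      # Riccati update
    return K, P

# Hypothetical identified model: a double integrator with dt = 0.1 s.
dt = 0.1
A = np.array([[1.0, dt], [0.0, 1.0]])
B = np.array([[0.5 * dt**2], [dt]])
Q, R = np.eye(2), np.array([[0.1]])
K, P = dlqr(A, B, Q, R)
print(np.abs(np.linalg.eigvals(A - B @ K)).max())  # closed-loop spectral radius
```

The control law is u = -K x; a closed-loop spectral radius below 1 confirms that the resulting controller stabilises the assumed model.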

### Gradient Descent for General Reinforcement Learning

- Computer Science · NIPS
- 1998

A simple learning rule, the VAPS algorithm, is derived; it can be instantiated to generate a wide range of new reinforcement-learning algorithms and allows policy-search and value-based algorithms to be combined, thus unifying two very different approaches to reinforcement learning into a single Value and Policy Search algorithm.

### Policy Gradient Methods for Reinforcement Learning with Function Approximation

- Computer Science · NIPS
- 1999

This paper proves for the first time that a version of policy iteration with arbitrary differentiable function approximation is convergent to a locally optimal policy.

### Reinforcement learning for continuous action using stochastic gradient ascent

- Computer Science
- 1998

The proposed method is based on stochastic gradient ascent with respect to the policy parameters; it does not require a model of the environment to be given or learned, does not need to approximate the value function explicitly, and is incremental, requiring only a constant amount of computation per step.

### Biped dynamic walking using reinforcement learning

- Computer Science · Robotics Auton. Syst.
- 1997

### Simple statistical gradient-following algorithms for connectionist reinforcement learning

- Computer Science · Machine Learning
- 2004

This article presents a general class of associative reinforcement learning algorithms for connectionist networks containing stochastic units. These algorithms are shown to make weight adjustments in a direction that lies along the gradient of expected reinforcement in both immediate-reinforcement tasks and certain limited forms of delayed-reinforcement tasks, and they do this without explicitly computing gradient estimates.
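For a single Gaussian unit, this class of update reduces to Δμ = α (r − b) (a − μ) / σ², where (a − μ) / σ² is the unit's characteristic eligibility and b is a reinforcement baseline. The sketch below assumes an immediate-reward task with reward −(a − target)², a fixed σ, and a running-mean baseline; the function name and all constants are illustrative choices.

```python
import numpy as np

def reinforce_gaussian(target=2.0, sigma=0.5, lr=0.01, steps=5000, seed=0):
    """REINFORCE-style learning for one Gaussian unit on an immediate-reward task."""
    rng = np.random.default_rng(seed)
    mu, baseline = 0.0, 0.0
    for _ in range(steps):
        a = rng.normal(mu, sigma)                # stochastic unit output
        r = -(a - target) ** 2                   # immediate reinforcement
        eligibility = (a - mu) / sigma**2        # d/dmu log N(a; mu, sigma^2)
        mu += lr * (r - baseline) * eligibility  # step along the expected-reward gradient
        baseline += 0.05 * (r - baseline)        # baseline: running mean of reinforcement
    return mu

mu_final = reinforce_gaussian()
print(round(mu_final, 2))
```

The baseline is updated after it is used, so it never depends on the current sample and the expected update stays proportional to the true gradient; it only reduces variance.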

### Model-Free Least-Squares Policy Iteration

- Computer Science · NIPS
- 2001

A new approach to reinforcement learning that combines least-squares function approximation with policy iteration; it is model-free and completely off-policy, so it can use (or reuse) data collected from any source.

### Experiments with Infinite-Horizon, Policy-Gradient Estimation

- Computer Science · J. Artif. Intell. Res.
- 2001

Presents algorithms that perform gradient ascent of the average reward in a partially observable Markov decision process (POMDP). They are based on GPOMDP, an algorithm introduced in a companion paper (Baxter & Bartlett, 2001) that computes biased estimates of the performance gradient in POMDPs.
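The GPOMDP recursions keep a discounted eligibility trace z ← βz + ∇log π(a) and a running average Δ ← Δ + (r z − Δ)/(t + 1). The sketch below assumes a hypothetical two-armed bandit with a softmax policy and Gaussian reward noise. In this one-step problem the discount β only adds variance; the bias mentioned above arises when rewards depend on actions taken more than a few steps earlier, which β < 1 down-weights.

```python
import numpy as np

def gpomdp_estimate(theta, arm_means, beta=0.9, T=30_000, seed=0):
    """GPOMDP-style running gradient estimate for a softmax policy on a bandit."""
    rng = np.random.default_rng(seed)
    z = np.zeros_like(theta)       # discounted eligibility trace of score functions
    delta = np.zeros_like(theta)   # running average of r * z
    for t in range(T):
        p = np.exp(theta - theta.max())
        p /= p.sum()                           # softmax action probabilities
        a = rng.choice(len(theta), p=p)
        score = -p
        score[a] += 1.0                        # gradient of log softmax(theta)[a]
        z = beta * z + score
        r = arm_means[a] + 0.1 * rng.standard_normal()
        delta += (r * z - delta) / (t + 1)
    return delta

g = gpomdp_estimate(np.zeros(2), np.array([1.0, 0.0]))
print(np.round(g, 2))
```

With equal initial preferences, the exact gradient of the expected reward is (0.25, −0.25), so the estimate should be close to that.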

### A Natural Policy Gradient

- Computer Science · NIPS
- 2001

This work provides a natural gradient method that represents the steepest descent direction based on the underlying structure of the parameter space and shows drastic performance improvements in simple MDPs and in the more challenging MDP of Tetris.
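The natural gradient preconditions the ordinary ("vanilla") gradient by the inverse Fisher information of the policy, F⁻¹∇J with F = E[∇log π ∇log πᵀ]. The sketch below assumes a one-step task with a 1-D Gaussian policy parameterised by (mu, log_sigma) and reward −(a − target)². It estimates both directions from samples; the function name is hypothetical.

```python
import numpy as np

def natural_gradient_gaussian(mu, log_sigma, target, n=200_000, seed=0):
    """Sample estimates of the vanilla and natural gradients for a Gaussian policy."""
    rng = np.random.default_rng(seed)
    sigma = np.exp(log_sigma)
    eps = rng.standard_normal(n)
    a = mu + sigma * eps
    r = -(a - target) ** 2
    # Score functions w.r.t. (mu, log_sigma), shape (n, 2).
    scores = np.stack([eps / sigma, eps**2 - 1.0], axis=1)
    g = ((r - r.mean())[:, None] * scores).mean(axis=0)  # vanilla gradient, mean baseline
    F = scores.T @ scores / n                              # Fisher information estimate
    return g, np.linalg.solve(F, g)

g, nat = natural_gradient_gaussian(0.0, np.log(0.5), 1.0)
print(np.round(g, 2), np.round(nat, 2))
```

For these settings, the analytic values are a vanilla gradient of (2, −0.5) and a Fisher matrix of diag(4, 2), giving a natural gradient of (0.5, −0.25). Preconditioning therefore changes both the scale and the relative weighting of the two parameters.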