Corpus ID: 6820915

Reinforcement Learning for Humanoid Robotics

Jan Peters, Sethu Vijayakumar, Stefan Schaal
Reinforcement learning offers one of the most general frameworks for taking traditional robotics towards true autonomy and versatility. Methods can be coarsely classified into three categories: greedy methods, ‘vanilla’ policy gradient methods, and natural gradient methods. We argue that greedy methods are not likely to scale into the domain of humanoid robotics, as they are problematic when used with function approximation. ‘Vanilla’ policy gradient methods, on the other hand, have…
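In standard notation (the symbols below are the conventional ones, not taken from this abstract), the three categories correspond roughly to the following update rules:

```latex
% Greedy (value-based): act greedily w.r.t. an approximated value function
\pi(s) = \arg\max_a \hat{Q}(s, a)

% 'Vanilla' policy gradient: plain gradient ascent on the expected return J
\theta_{k+1} = \theta_k + \alpha \, \nabla_\theta J(\theta_k)

% Natural policy gradient: precondition by the Fisher information matrix F
\theta_{k+1} = \theta_k + \alpha \, F(\theta_k)^{-1} \nabla_\theta J(\theta_k)
```

The greedy rule changes the policy discontinuously when the approximated value function changes slightly, which is one source of the scaling problems the abstract alludes to; the two gradient rules change the policy smoothly.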


Policy search for motor primitives in robotics

A novel EM-inspired algorithm for policy learning that is particularly well-suited for dynamical system motor primitives is introduced and applied in the context of motor learning and can learn a complex Ball-in-a-Cup task on a real Barrett WAM™ robot arm.

Reinforcement learning for balancer embedded humanoid locomotion

A new learning-walking scheme is proposed in which a humanoid robot is embedded with a primitive balancing controller for safety; the results demonstrate that non-hierarchical RL algorithms with the structured function approximator (FA) are much faster than the hierarchical RL algorithm.

Policy Gradient Methods for Robotics

  • Jan Peters, S. Schaal
  • Computer Science
    2006 IEEE/RSJ International Conference on Intelligent Robots and Systems
  • 2006
This paper gives an overview of learning with policy gradient methods for robotics, with a strong focus on recent advances in the field, and shows how the most recently developed methods can significantly improve learning performance.

Survey of Model-Based Reinforcement Learning: Applications on Robotics

It is argued that, by employing model-based reinforcement learning, the currently limited adaptability of robotic systems can be expanded, and that model-based reinforcement learning exhibits advantages that make it more applicable to real-life use cases than model-free methods.

Scaling Reinforcement Learning Paradigms for Motor Control

This poster examines promising approaches that can potentially scale, suggests a novel formulation of the actor-critic algorithm that takes steps towards alleviating the current shortcomings, and proves that Kakade’s ‘average natural policy gradient’ is indeed the true natural gradient.

Reinforcement learning of motor skills with policy gradients

Reinforcement Learning for Parameterized Motor Primitives

  • Jan Peters, S. Schaal
  • Computer Science
    The 2006 IEEE International Joint Conference on Neural Network Proceedings
  • 2006
This paper compares both established and novel algorithms for the gradient-based improvement of parameterized policies in the context of motor primitive learning, and shows that the most modern algorithm, the Episodic Natural Actor-Critic, outperforms previous algorithms by at least an order of magnitude.

22 Reinforcement Learning Algorithms In Humanoid Robotics

The field of biped locomotion is of special interest where human-like robots are concerned, since anthropomorphic biped robots are potentially capable of moving effectively in all the unstructured environments where humans do.



Learning Attractor Landscapes for Learning Motor Primitives

By nonlinearly transforming the canonical attractor dynamics using techniques from nonparametric regression, almost arbitrary new nonlinear policies can be generated without losing the stability properties of the canonical system.
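The idea can be illustrated with a minimal one-dimensional sketch (parameter names and values below are illustrative, not taken from the paper): a nonlinear forcing term, gated by a decaying canonical state, reshapes the trajectory of a stable point attractor without destroying its convergence to the goal.

```python
import numpy as np

def dmp_rollout(g=1.0, y0=0.0, alpha=25.0, beta=6.25, alpha_x=3.0,
                tau=1.0, dt=0.001, forcing=lambda x: 0.0):
    """Integrate a 1-D dynamical-movement-primitive-style system (sketch).

    Canonical system:      tau * x_dot = -alpha_x * x
    Transformation system: tau * z_dot = alpha*(beta*(g - y) - z) + x*forcing(x)
                           tau * y_dot = z
    The forcing term is gated by the canonical state x, which decays to 0,
    so the system converges to the goal g regardless of the forcing shape.
    """
    x, y, z = 1.0, y0, 0.0
    ys = []
    for _ in range(int(1.0 / dt)):
        x += (-alpha_x * x) / tau * dt
        z += (alpha * (beta * (g - y) - z) + x * forcing(x)) / tau * dt
        y += z / tau * dt
        ys.append(y)
    return np.array(ys)

# With zero forcing the trajectory simply converges to the goal; a learned
# nonlinear forcing term reshapes the path en route without breaking stability.
traj = dmp_rollout(forcing=lambda x: 100.0 * np.sin(10.0 * x))
```

The stability argument is visible in the structure: the forcing enters multiplied by `x`, which vanishes over time, leaving the globally stable linear attractor.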

Learning by Demonstration

  • S. Schaal
  • Education, Computer Science
    Encyclopedia of Machine Learning and Data Mining
  • 1996
In an implementation of pole balancing on a complex anthropomorphic robot arm, it is demonstrated that, when facing the complexities of real signal processing, model-based reinforcement learning offers the most robustness for LQR problems.

Gradient Descent for General Reinforcement Learning

A simple learning rule is derived, the VAPS algorithm, which can be instantiated to generate a wide range of new reinforcement-learning algorithms, and allows policy-search and value-based algorithms to be combined, thus unifying two very different approaches to reinforcement learning into a single Value and Policy Search algorithm.

Policy Gradient Methods for Reinforcement Learning with Function Approximation

This paper proves for the first time that a version of policy iteration with arbitrary differentiable function approximation is convergent to a locally optimal policy.
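The convergence result rests on the policy gradient theorem; in its usual statement (notation here is the standard one, not quoted from this page):

```latex
\nabla_\theta J(\theta)
  = \sum_s d^{\pi}(s) \sum_a \nabla_\theta \pi(a \mid s; \theta)\, Q^{\pi}(s, a),
```

where $d^{\pi}$ is the discounted state distribution under $\pi$. Convergence to a locally optimal policy is retained when $Q^{\pi}$ is replaced by a compatible function approximator $f_w$ satisfying $\nabla_w f_w(s,a) = \nabla_\theta \log \pi(a \mid s; \theta)$.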

Reinforcement learning for continuous action using stochastic gradient ascent

The proposed method is based on stochastic gradient ascent in the policy parameter space: it does not require a model of the environment to be given or learned, it does not need to approximate the value function explicitly, and it is incremental, requiring only a constant amount of computation per step.
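As a rough illustration of such a model-free, incremental scheme, here is a sketch of stochastic gradient ascent with a Gaussian policy on a hypothetical one-step task (the reward function and all constants are invented for the example):

```python
import random

random.seed(0)

# Gaussian policy a ~ N(mu, sigma^2); only the mean mu is learned here.
mu, sigma, alpha = 0.0, 0.5, 0.05
target = 2.0                            # toy task: reward peaks at a = target
recent = []

for step in range(5000):
    a = random.gauss(mu, sigma)         # sample an action from the policy
    r = -(a - target) ** 2              # observe reward; no model, no value fn
    grad_log_pi = (a - mu) / sigma**2   # d/d_mu of log N(a; mu, sigma)
    mu += alpha * r * grad_log_pi       # constant amount of work per step
    recent.append(mu)

mu_avg = sum(recent[-1000:]) / 1000.0   # smooth out sampling noise
```

In expectation the update equals `alpha` times the true gradient of expected reward with respect to `mu`, so `mu` drifts toward `target` using only sampled actions and rewards.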

Biped dynamic walking using reinforcement learning

Simple statistical gradient-following algorithms for connectionist reinforcement learning

This article presents a general class of associative reinforcement learning algorithms for connectionist networks containing stochastic units. The algorithms are shown to make weight adjustments in a direction that lies along the gradient of expected reinforcement, in both immediate-reinforcement tasks and certain limited forms of delayed-reinforcement tasks, and they do this without explicitly computing gradient estimates.
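Williams' rule for a single stochastic logistic unit can be sketched as follows (the toy input, reward, and constants are invented for illustration):

```python
import math, random

random.seed(0)

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

# One Bernoulli logistic unit: it fires y=1 with probability p = sigmoid(w.x).
# The REINFORCE-style update alpha*(r - b)*(y - p)*x follows the gradient of
# expected reinforcement without ever computing that gradient explicitly.
w = [0.0, 0.0]
alpha, baseline = 0.1, 0.5
x = [1.0, 0.5]                          # fixed toy input; reward for firing

for _ in range(2000):
    p = sigmoid(w[0] * x[0] + w[1] * x[1])
    y = 1 if random.random() < p else 0
    r = 1.0 if y == 1 else 0.0          # immediate reinforcement signal
    for j in range(len(w)):
        w[j] += alpha * (r - baseline) * (y - p) * x[j]
```

After training, the unit fires with probability close to one; the term `(y - p)` is the characteristic eligibility of the Bernoulli unit, and the baseline `b` reduces the variance of the update without biasing it.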

Model-Free Least-Squares Policy Iteration

A new approach to reinforcement learning that combines least-squares function approximation with policy iteration; the method is model-free and completely off-policy, and can use (or reuse) data collected from any source.

Experiments with Infinite-Horizon, Policy-Gradient Estimation

This paper presents algorithms that perform gradient ascent on the average reward in a partially observable Markov decision process (POMDP). They are based on GPOMDP, an algorithm introduced in a companion paper (Baxter & Bartlett, 2001) that computes biased estimates of the performance gradient in POMDPs.

A Natural Policy Gradient

This work provides a natural gradient method that represents the steepest descent direction based on the underlying structure of the parameter space and shows drastic performance improvements in simple MDPs and in the more challenging MDP of Tetris.
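A minimal sketch of the natural-gradient update F⁻¹∇J, using the exact Fisher information of a Gaussian policy on a toy one-step problem (the objective and all constants are assumptions made for the example, not taken from the paper):

```python
import numpy as np

def fisher(mu, sigma):
    # Exact Fisher information matrix of N(mu, sigma^2) in (mu, sigma) coords.
    return np.diag([1.0 / sigma**2, 2.0 / sigma**2])

def grad_J(mu, sigma, target=2.0):
    # Exact gradient of J = E[-(a - target)^2] = -(mu - target)^2 - sigma^2.
    return np.array([-2.0 * (mu - target), -2.0 * sigma])

mu, sigma, alpha = 0.0, 1.0, 0.1
for _ in range(200):
    g = grad_J(mu, sigma)
    nat = np.linalg.solve(fisher(mu, sigma), g)   # natural direction F^{-1} g
    mu += alpha * nat[0]
    sigma += alpha * nat[1]
```

Preconditioning by the Fisher matrix makes the step depend on the distance between distributions rather than on the arbitrary choice of parameterization, which is the "underlying structure of the parameter space" the summary refers to.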