# Reinforcement learning of motor skills in high dimensions: A path integral approach

```bibtex
@article{Theodorou2010ReinforcementLO,
  title   = {Reinforcement learning of motor skills in high dimensions: A path integral approach},
  author  = {Evangelos A. Theodorou and Jonas Buchli and Stefan Schaal},
  journal = {2010 IEEE International Conference on Robotics and Automation},
  year    = {2010},
  pages   = {2397-2403}
}
```

Reinforcement learning (RL) is one of the most general approaches to learning control. Its applicability to complex motor systems, however, has so far remained largely out of reach due to the computational difficulties that reinforcement learning encounters in high-dimensional continuous state-action spaces. In this paper, we derive a novel approach to RL for parameterized control policies based on the framework of stochastic optimal control with path integrals. While solidly grounded in optimal…
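The core update of the resulting PI² algorithm is a probability-weighted average of exploration noise, where each rollout's weight is an exponentiated negative cost. The following is a minimal, illustrative sketch of that update rule on a toy quadratic cost (generic parameter vector, not the paper's DMP parameterization; the single-vector form here omits the per-time-step structure of the full algorithm):

```python
import numpy as np

def pi2_update(theta, cost_fn, n_rollouts=20, noise_std=0.1, lam=0.1, rng=None):
    """One PI^2-style parameter update: perturb theta with Gaussian
    exploration noise, evaluate each rollout's cost, and average the
    noise weighted by a softmax over negative normalized costs."""
    rng = np.random.default_rng() if rng is None else rng
    eps = rng.normal(0.0, noise_std, size=(n_rollouts, theta.size))
    costs = np.array([cost_fn(theta + e) for e in eps])
    # Normalize costs to [0, 1] so the temperature lam has a stable scale.
    s = (costs - costs.min()) / max(np.ptp(costs), 1e-12)
    w = np.exp(-s / lam)        # low-cost rollouts get high weight
    w /= w.sum()
    return theta + w @ eps      # probability-weighted noise average

# Toy usage (assumed example, not a task from the paper):
# minimize a quadratic cost around target parameters.
target = np.array([1.0, -2.0])
cost = lambda th: float(np.sum((th - target) ** 2))
rng = np.random.default_rng(0)
theta = np.zeros(2)
for _ in range(300):
    theta = pi2_update(theta, cost, rng=rng)
```

Note that the update uses no gradient of the cost, only rollout evaluations, which is why the paper's claim of working "as a black box" without manual parameter tuning is plausible for this family of methods.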

## 269 Citations

Path integral reinforcement learning

- Computer Science
- 2011

A new reinforcement learning algorithm is derived, called Policy Improvement with Path Integrals (PI2), which is surprisingly simple and works as a black box learning system, i.e., without the need for manual parameter tuning.

Policy search for motor primitives in robotics

- Computer Science, Machine Learning
- 2010

A novel EM-inspired algorithm for policy learning that is particularly well-suited for dynamical-system motor primitives is introduced, applied in the context of motor learning, and shown to learn a complex Ball-in-a-Cup task on a real Barrett WAM™ robot arm.

Policy search for imitation learning

- Computer Science
- 2015

Interestingly, the reinforcement learning framework might offer the tools to govern both learning methods at the same time, as the probability-weighted averaging of policy roll-outs seen in PI2 can be combined with an optimization-based policy representation.

RLOC: Neurobiologically Inspired Hierarchical Reinforcement Learning Algorithm for Continuous Control of Nonlinear Dynamical Systems

- Computer Science, ArXiv
- 2019

A novel neurobiologically inspired hierarchical learning framework, Reinforcement Learning Optimal Control, is introduced; it operates on two levels of abstraction and utilises a reduced number of controllers to solve nonlinear systems with unknown dynamics in continuous state and action spaces.

Learning Dynamic Manipulation Skills under Unknown Dynamics with Guided Policy Search

- Computer Science
- 2014

This work presents a trajectory optimization algorithm suitable for use with guided policy search that does not require a known dynamics model or simulator, and shows that this approach can optimize manipulation trajectories that are extremely challenging for previous reinforcement learning methods.

Learning Motor Skills - From Algorithms to Robot Experiments

- Computer Science, Springer Tracts in Advanced Robotics
- 2014

This book illustrates a method that learns to generalize parameterized motor plans, obtained by imitation or reinforcement learning, by adapting a small set of global parameters, together with appropriate kernel-based reinforcement learning algorithms.

Deep reinforcement learning for robotic manipulation with asynchronous off-policy updates

- Computer Science, 2017 IEEE International Conference on Robotics and Automation (ICRA)
- 2017

It is demonstrated that a recent deep reinforcement learning algorithm based on off-policy training of deep Q-functions can scale to complex 3D manipulation tasks and can learn deep neural network policies efficiently enough to train on real physical robots.

Combining reinforcement learning and optimal control for the control of nonlinear dynamical systems

- Computer Science
- 2015

This thesis presents a novel hierarchical learning framework, Reinforcement Learning Optimal Control, for controlling nonlinear dynamical systems with continuous states and actions and demonstrates that a small number of locally optimal linear controllers can be combined in a smart way to solve global nonlinear control problems.

Variable Impedance Control - A Reinforcement Learning Approach

- Computer Science, Robotics: Science and Systems
- 2010

This work investigates tasks where the optimal strategy requires both tuning of the impedance of the end-effector, and tuning of a reference trajectory, and shows that path integral based RL can be used not only for planning but also to derive variable gain feedback controllers in realistic scenarios.

Learning contact-rich manipulation skills with guided policy search

- Computer Science, 2015 IEEE International Conference on Robotics and Automation (ICRA)
- 2015

This paper extends a recently developed policy search method and uses it to learn a range of dynamic manipulation behaviors with highly general policy representations, without using known models or example demonstrations, and shows that this method can acquire fast, fluent behaviors after only minutes of interaction time.

## References

Showing 1–10 of 50 references

Reinforcement Learning for Parameterized Motor Primitives

- Computer Science, The 2006 IEEE International Joint Conference on Neural Network Proceedings
- 2006

This paper compares both established and novel algorithms for the gradient-based improvement of parameterized policies in the context of motor primitive learning, and shows that the most modern algorithm, the Episodic Natural Actor-Critic outperforms previous algorithms by at least an order of magnitude.

Learning Attractor Landscapes for Learning Motor Primitives

- Computer Science, NIPS
- 2002

By nonlinearly transforming the canonical attractor dynamics using techniques from nonparametric regression, almost arbitrary new nonlinear policies can be generated without losing the stability properties of the canonical system.

Reinforcement learning of motor skills with policy gradients

- Computer Science, Neural Networks
- 2008

Policy Gradient Methods for Robotics

- Computer Science, 2006 IEEE/RSJ International Conference on Intelligent Robots and Systems
- 2006

An overview of learning with policy gradient methods for robotics, with a strong focus on recent advances in the field, is given, and it is shown how the most recently developed methods can significantly improve learning performance.

Learning model-free robot control by a Monte Carlo EM algorithm

- Computer Science, Auton. Robots
- 2009

A Monte Carlo EM algorithm (MCEM) for control learning is presented that searches directly in the space of controller parameters using information obtained from randomly generated robot trajectories; it is related to, and generalizes, the PoWER algorithm of Kober and Peters.

A path integral approach to agent planning

- Computer Science

A class of non-linear stochastic control problems that can be efficiently solved using a path integral is discussed and it is shown that this solution can be computed more efficiently than in the discounted reward framework.
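The efficiency of this class of problems rests on a log transformation that linearizes the Hamilton–Jacobi–Bellman equation. A standard sketch of that step, using generic symbols rather than the notation of any one paper: for dynamics $dx = f\,dt + G(u\,dt + d\omega)$ with noise covariance $\Sigma_\varepsilon$ and cost rate $q(x) + \tfrac{1}{2}u^\top R u$, the HJB equation after minimizing over $u$ reads

```latex
-\partial_t V = q + f^\top V_x
  - \tfrac{1}{2} V_x^\top G R^{-1} G^\top V_x
  + \tfrac{1}{2}\,\mathrm{tr}\!\left(G \Sigma_\varepsilon G^\top V_{xx}\right).
```

Under the assumption $\Sigma_\varepsilon = \lambda R^{-1}$ (noise and control cost inversely coupled), the substitution $V = -\lambda \log \Psi$ cancels the quadratic term against part of the trace term and leaves a linear PDE in $\Psi$,

```latex
-\partial_t \Psi = -\tfrac{q}{\lambda}\,\Psi + f^\top \Psi_x
  + \tfrac{1}{2}\,\mathrm{tr}\!\left(G \Sigma_\varepsilon G^\top \Psi_{xx}\right),
```

whose Feynman–Kac solution is an expectation over uncontrolled trajectories — the path integral that gives the framework its name:

```latex
\Psi(x_t, t) = \mathbb{E}_{\tau}\!\left[
  \exp\!\left(-\tfrac{1}{\lambda}\int_t^T q(x_s)\,ds\right)\Psi(x_T, T)
\right].
```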

A stochastic reinforcement learning algorithm for learning real-valued functions

- Computer Science, Neural Networks
- 1990

Efficient computation of optimal actions

- Computer Science, Proceedings of the National Academy of Sciences
- 2009

This work proposes a more structured formulation that greatly simplifies the construction of optimal control laws in both discrete and continuous domains, and enables computations that were not possible before.

Learning to Control in Operational Space

- Computer Science, Int. J. Robotics Res.
- 2008

The proposed method works in the setting of learning resolved motion rate control on a real, physical Mitsubishi PA-10 medical robotics arm and demonstrates feasibility for complex high degree-of-freedom robots.