Reinforcement learning of motor skills in high dimensions: A path integral approach

@inproceedings{Theodorou2010ReinforcementLO,
  title={Reinforcement learning of motor skills in high dimensions: A path integral approach},
  author={Evangelos A. Theodorou and Jonas Buchli and Stefan Schaal},
  booktitle={2010 IEEE International Conference on Robotics and Automation},
  year={2010},
  pages={2397--2403}
}
Reinforcement learning (RL) is one of the most general approaches to learning control. Its applicability to complex motor systems, however, has been largely impossible so far due to the computational difficulties that reinforcement learning encounters in high dimensional continuous state-action spaces. In this paper, we derive a novel approach to RL for parameterized control policies based on the framework of stochastic optimal control with path integrals. While solidly grounded in optimal… 
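The abstract describes policy improvement via path integrals: noisy roll-outs of a parameterized policy are scored by their trajectory cost, and the parameter update is a probability-weighted average of the exploration noise. A minimal illustrative sketch of that update rule follows; the toy quadratic cost, dimensionality, noise scale, and temperature `lam` are all assumptions for demonstration, not values from the paper.

```python
import numpy as np

def pi2_update(theta, cost_fn, n_rollouts=32, noise_std=0.5, lam=1.0, rng=None):
    """One PI2-style update: perturb parameters, weight roll-outs by exp(-cost/lam)."""
    rng = np.random.default_rng() if rng is None else rng
    # Exploration noise for each roll-out, one row per perturbed parameter vector.
    eps = rng.normal(0.0, noise_std, size=(n_rollouts, theta.size))
    # Trajectory cost S_k for each noisy roll-out.
    costs = np.array([cost_fn(theta + e) for e in eps])
    # Softmax-style probability weights: low-cost roll-outs dominate the average.
    w = np.exp(-(costs - costs.min()) / lam)
    w /= w.sum()
    # Update is the probability-weighted average of the exploration noise.
    return theta + w @ eps

# Toy stand-in for a motor task: drive a 10-dimensional parameter vector to a target.
target = np.linspace(-1.0, 1.0, 10)
cost = lambda th: float(np.sum((th - target) ** 2))

theta = np.zeros(10)
rng = np.random.default_rng(0)
for _ in range(200):
    theta = pi2_update(theta, cost, rng=rng)
print(f"final cost: {cost(theta):.3f}")
```

Note the update needs no gradient of the cost and has essentially one free parameter (`lam`), which is what the citing papers below mean when they call PI2 a black-box method.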

Citations

Path integral reinforcement learning
TLDR
A new reinforcement learning algorithm is derived, called Policy Improvement with Path Integrals (PI2), which is surprisingly simple and works as a black-box learning system, i.e., without the need for manual parameter tuning.
Policy search for motor primitives in robotics
TLDR
A novel EM-inspired algorithm for policy learning that is particularly well-suited for dynamical system motor primitives is introduced and applied in the context of motor learning and can learn a complex Ball-in-a-Cup task on a real Barrett WAM™ robot arm.
Policy search for imitation learning
TLDR
Interestingly, the reinforcement learning framework might offer the tools to govern both learning methods at the same time, as the concept of probability-weighted averaging of policy roll-outs as seen in PI2 is combined with an optimization-based policy representation.
RLOC: Neurobiologically Inspired Hierarchical Reinforcement Learning Algorithm for Continuous Control of Nonlinear Dynamical Systems
TLDR
A novel neurobiologically inspired hierarchical learning framework, Reinforcement Learning Optimal Control, which operates on two levels of abstraction and utilises a reduced number of controllers to solve nonlinear systems with unknown dynamics in continuous state and action spaces.
Learning Dynamic Manipulation Skills under Unknown Dynamics with Guided Policy Search
TLDR
This work presents a trajectory optimization algorithm suitable for use with guided policy search that does not require a known dynamics model or simulator, and shows that this approach can optimize manipulation trajectories that are extremely challenging for previous reinforcement learning methods.
Learning Motor Skills - From Algorithms to Robot Experiments
TLDR
This book illustrates a method that learns to generalize parameterized motor plans obtained by imitation or reinforcement learning, by adapting a small set of global parameters, together with appropriate kernel-based reinforcement learning algorithms.
Deep reinforcement learning for robotic manipulation with asynchronous off-policy updates
TLDR
It is demonstrated that a recent deep reinforcement learning algorithm based on off-policy training of deep Q-functions can scale to complex 3D manipulation tasks and can learn deep neural network policies efficiently enough to train on real physical robots.
Combining reinforcement learning and optimal control for the control of nonlinear dynamical systems
TLDR
This thesis presents a novel hierarchical learning framework, Reinforcement Learning Optimal Control, for controlling nonlinear dynamical systems with continuous states and actions and demonstrates that a small number of locally optimal linear controllers can be combined in a smart way to solve global nonlinear control problems.
Variable Impedance Control - A Reinforcement Learning Approach
TLDR
This work investigates tasks where the optimal strategy requires both tuning of the impedance of the end-effector, and tuning of a reference trajectory, and shows that path integral based RL can be used not only for planning but also to derive variable gain feedback controllers in realistic scenarios.
Learning contact-rich manipulation skills with guided policy search
TLDR
This paper extends a recently developed policy search method and uses it to learn a range of dynamic manipulation behaviors with highly general policy representations, without using known models or example demonstrations, and shows that this method can acquire fast, fluent behaviors after only minutes of interaction time.

References

Showing 1–10 of 50 references
Reinforcement Learning for Parameterized Motor Primitives
  • Jan Peters, S. Schaal
  • The 2006 IEEE International Joint Conference on Neural Network Proceedings
  • 2006
TLDR
This paper compares both established and novel algorithms for the gradient-based improvement of parameterized policies in the context of motor primitive learning, and shows that the most modern algorithm, the Episodic Natural Actor-Critic outperforms previous algorithms by at least an order of magnitude.
Learning Attractor Landscapes for Learning Motor Primitives
TLDR
By nonlinearly transforming the canonical attractor dynamics using techniques from nonparametric regression, almost arbitrary new nonlinear policies can be generated without losing the stability properties of the canonical system.
Policy Gradient Methods for Robotics
  • Jan Peters, S. Schaal
  • 2006 IEEE/RSJ International Conference on Intelligent Robots and Systems
  • 2006
TLDR
An overview on learning with policy gradient methods for robotics with a strong focus on recent advances in the field is given and how the most recently developed methods can significantly improve learning performance is shown.
Learning model-free robot control by a Monte Carlo EM algorithm
TLDR
A Monte Carlo EM algorithm (MCEM) for control learning that searches directly in the space of controller parameters using information obtained from randomly generated robot trajectories, related to, and generalizes, the PoWER algorithm of Kober and Peters.
A path integral approach to agent planning
TLDR
A class of non-linear stochastic control problems that can be efficiently solved using a path integral is discussed and it is shown that this solution can be computed more efficiently than in the discounted reward framework.
Efficient computation of optimal actions
  • E. Todorov
  • Proceedings of the National Academy of Sciences
  • 2009
TLDR
This work proposes a more structured formulation that greatly simplifies the construction of optimal control laws in both discrete and continuous domains, and enables computations that were not possible before.
Learning to Control in Operational Space
TLDR
The proposed method works in the setting of learning resolved motion rate control on a real, physical Mitsubishi PA-10 medical robotics arm and demonstrates feasibility for complex high degree-of-freedom robots.