Data-Efficient Generalization of Robot Skills with Contextual Policy Search

@inproceedings{Kupcsik2013DataEfficientGO,
  title     = {Data-Efficient Generalization of Robot Skills with Contextual Policy Search},
  author    = {Andras Gabor Kupcsik and Marc Peter Deisenroth and Jan Peters and Gerhard Neumann},
  booktitle = {Proceedings of the AAAI Conference on Artificial Intelligence},
  year      = {2013}
}

In robotics, controllers make the robot solve a task within a specific context. The context can describe the objectives of the robot or physical properties of the environment and is always specified before task execution. To generalize the controller to multiple contexts, we follow a hierarchical approach for policy learning: A lower-level policy controls the robot for a given context and an upper-level policy generalizes among contexts. Current approaches for learning such upper-level… 
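The hierarchical decomposition described in the abstract can be sketched in a few lines. Below is a minimal, hypothetical illustration — not the paper's actual algorithm: a linear-Gaussian upper-level policy maps a context vector to the parameters of a lower-level controller, and is improved by reward-weighted regression over sampled episodes. The toy task, variable names, and the reward-weighted update are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy task (assumption for illustration): the context s is a 1-D target and
# the lower-level controller is summarized by a single parameter theta;
# the episodic reward penalizes squared distance between theta and s.
def reward(theta, s):
    return -float(np.sum((theta - s) ** 2))

dim_s, dim_theta = 1, 1
K = np.zeros((dim_theta, dim_s))  # upper-level policy mean is K @ s + k
k = np.zeros(dim_theta)
sigma = 1.0                       # exploration standard deviation

for _ in range(50):
    # Sample contexts, then draw lower-level controller parameters
    # from the upper-level (linear-Gaussian) policy.
    S = rng.uniform(-1.0, 1.0, size=(100, dim_s))
    Theta = S @ K.T + k + sigma * rng.standard_normal((100, dim_theta))
    R = np.array([reward(th, s) for th, s in zip(Theta, S)])

    # Reward-weighted regression: soft-max weights over episodes, then a
    # weighted least-squares fit of theta on the features [s, 1].
    w = np.exp((R - R.max()) / max(R.std(), 1e-8))
    X = np.hstack([S, np.ones((len(S), 1))])
    G = X.T * w  # weighted design matrix, shape (dim_s + 1, n_samples)
    sol = np.linalg.solve(G @ X + 1e-6 * np.eye(dim_s + 1), G @ Theta)
    K, k = sol[:dim_s].T, sol[dim_s]
    sigma = max(0.9 * sigma, 0.05)  # anneal exploration

# The learned upper-level policy now generalizes across contexts: for this
# toy task the optimum is theta = s, so K approaches 1 and k approaches 0.
theta_for_context = K @ np.array([0.5]) + k
print(K, k, theta_for_context)
```

This captures only the generalization idea (one policy level chooses controller parameters per context); the paper's contribution is making such updates data-efficient, which the sketch does not attempt.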

Citations

A Survey on Policy Search for Robotics

TLDR
This work classifies model-free methods based on their policy evaluation strategy, policy update strategy, and exploration strategy and presents a unified view on existing algorithms.

Accounting for Task-Difficulty in Active Multi-Task Robot Control Learning

TLDR
This work proposes the novel approach PUBSVE for estimating a reward baseline and investigates empirically, on benchmark problems and simulated robotic tasks, to what extent this method can remedy the issue of non-comparable rewards.

Learning Replanning Policies With Direct Policy Search

TLDR
This work proposes a framework to learn trajectory replanning policies via contextual policy search and demonstrates that they are safe for the robot, can be learned efficiently, and outperform non-replanning policies for problems with partially observable or perturbed context.

Active contextual policy search

TLDR
It is argued that there is a better way than selecting each task equally often because some tasks might be easier to learn at the beginning and the knowledge that the agent can extract from these tasks can be transferred to similar but more difficult tasks.

Hierarchical Relative Entropy Policy Search

TLDR
This work defines the problem of learning sub-policies in continuous state-action spaces as finding a hierarchical policy composed of a high-level gating policy that selects the low-level sub-policies for execution by the agent, and treats them as latent variables, which allows the update information to be distributed between the sub-policies.

Contextual Policy Search for Generalizing a Parameterized Biped Walking Controller

TLDR
The desired flexibility of the controller is achieved by applying the recently developed contextual relative entropy policy search (REPS) method, which can generalize the robot walking controller to different contexts, where a context is described by a real-valued vector.

Gaussian Processes for Data-Efficient Learning in Robotics and Control

TLDR
This paper learns a probabilistic, non-parametric Gaussian process transition model of the system and applies it to autonomous learning in real robot and control tasks, achieving an unprecedented speed of learning.

Reactive, task-specific object manipulation by metric reinforcement learning

TLDR
This paper shows that the proposed system can learn and model various manipulation tasks, such as pouring or reaching, and can successfully react to a wide range of perturbations introduced during task execution.

References

Showing 1-10 of 22 references

Hierarchical Relative Entropy Policy Search

TLDR
This work defines the problem of learning sub-policies in continuous state-action spaces as finding a hierarchical policy composed of a high-level gating policy that selects the low-level sub-policies for execution by the agent, and treats them as latent variables, which allows the update information to be distributed between the sub-policies.

Policy search for motor primitives in robotics

TLDR
A novel EM-inspired algorithm for policy learning, particularly well-suited for dynamical-system motor primitives, is introduced, applied in the context of motor learning, and shown to learn a complex Ball-in-a-Cup task on a real Barrett WAM™ robot arm.

Using inaccurate models in reinforcement learning

TLDR
This paper presents a hybrid algorithm that requires only an approximate model, and only a small number of real-life trials, and achieves near-optimal performance in the real system, even when the model is only approximate.

Reinforcement learning of motor skills in high dimensions: A path integral approach

TLDR
This paper derives a novel approach to RL for parameterized control policies based on the framework of stochastic optimal control with path integrals, and argues that the resulting algorithm, Policy Improvement with Path Integrals (PI2), is currently one of the most efficient, numerically robust, and easy-to-implement algorithms for RL in robotics.

PILCO: A Model-Based and Data-Efficient Approach to Policy Search

TLDR
PILCO reduces model bias, one of the key problems of model-based reinforcement learning, in a principled way by learning a probabilistic dynamics model and explicitly incorporating model uncertainty into long-term planning.

Reinforcement Learning to Adjust Robot Movements to New Situations

TLDR
This paper describes how to learn such mappings from circumstances to meta-parameters using reinforcement learning, and uses a kernelized version of the reward-weighted regression to do so.

Learning Attractor Landscapes for Learning Motor Primitives

TLDR
By nonlinearly transforming the canonical attractor dynamics using techniques from nonparametric regression, almost arbitrary new nonlinear policies can be generated without losing the stability properties of the canonical system.

Learning to Control a Low-Cost Manipulator using Data-Efficient Reinforcement Learning

TLDR
It is demonstrated how a low-cost, off-the-shelf robotic system can learn closed-loop policies for a stacking task in only a handful of trials, from scratch.

Reinforcement learning of motor skills with policy gradients

Learning Movement Primitives

TLDR
A novel reinforcement learning technique based on natural stochastic policy gradients enables a general approach to improving DMPs by trial-and-error learning with respect to almost arbitrary optimization criteria; the different ingredients of the DMP approach are demonstrated in various examples.