Learning to Control a Low-Cost Manipulator using Data-Efficient Reinforcement Learning

@inproceedings{deisenroth2011learning,
  title={Learning to Control a Low-Cost Manipulator using Data-Efficient Reinforcement Learning},
  author={Marc Peter Deisenroth and Carl Edward Rasmussen and Dieter Fox},
  booktitle={Robotics: Science and Systems},
  year={2011}
}
Over the last years, there has been substantial progress in robust manipulation in unstructured environments. The long-term goal of our work is to get away from precise, but very expensive robotic systems and to develop affordable, potentially imprecise, self-adaptive manipulator systems that can interactively perform tasks such as playing with children. In this paper, we demonstrate how a low-cost off-the-shelf robotic system can learn closed-loop policies for a stacking task in only a handful… 


Evaluating techniques for learning a feedback controller for low-cost manipulators

This work introduces several model-based learning agents as mechanisms to control a noisy, low-cost robotic system; the fidelity of the simulations is confirmed by applying GPDP to a physical system.

Gaussian Processes for Data-Efficient Learning in Robotics and Control

This paper learns a probabilistic, non-parametric Gaussian process transition model of the system and applies it to autonomous learning in real robot and control tasks, achieving an unprecedented speed of learning.
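To make the idea of a probabilistic transition model concrete, here is a minimal sketch (not the paper's actual PILCO implementation): plain Gaussian-process regression over one-step dynamics, so every prediction carries both a mean and a variance. The toy dynamics, kernel hyperparameters, and class names below are invented for illustration.

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel between the row vectors of A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return variance * np.exp(-0.5 * d2 / lengthscale ** 2)

class GPTransitionModel:
    """Minimal GP regression model of one-step dynamics x' = f(x, u)."""
    def __init__(self, lengthscale=1.0, variance=1.0, noise=1e-2):
        self.ls, self.var, self.noise = lengthscale, variance, noise

    def fit(self, X, y):
        self.X = X
        K = rbf_kernel(X, X, self.ls, self.var) + self.noise * np.eye(len(X))
        self.L = np.linalg.cholesky(K)
        self.alpha = np.linalg.solve(self.L.T, np.linalg.solve(self.L, y))
        return self

    def predict(self, Xs):
        """Posterior mean and variance of f at the query points Xs."""
        Ks = rbf_kernel(Xs, self.X, self.ls, self.var)
        mean = Ks @ self.alpha
        v = np.linalg.solve(self.L, Ks.T)
        var = self.var - (v ** 2).sum(0) + self.noise
        return mean, var

# Toy 1-D dynamics: next state = sin(state) + 0.5 * action (invented example).
rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(30, 2))          # columns: state, action
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + 0.01 * rng.standard_normal(30)
model = GPTransitionModel().fit(X, y)
mean, var = model.predict(np.array([[0.5, 0.0]]))
```

The returned variance is what distinguishes this from a point-estimate model: downstream planning can weight predictions by how much the model actually knows about that region of the state-action space.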

Policy search for learning robot control using sparse data

This paper investigates how model-based reinforcement learning, in particular the probabilistic inference for learning control method (PILCO), can be tailored to cope with sparse data to speed up learning, and shows that by including prior knowledge, policy learning can be sped up in the presence of sparse data.

Combining Model-Based and Model-Free Updates for Deep Reinforcement Learning

This work aims to develop a method in the context of a specific policy representation: time-varying linear-Gaussian controllers, which yields a general-purpose RL procedure with favorable stability and sample complexity compared to fully model-free deep RL methods.

On Training Flexible Robots using Deep Reinforcement Learning

This paper systematically studies the efficacy of policy search methods using DRL in training flexible robots and indicates that DRL is able to learn efficient and robust policies for complex tasks at various degrees of flexibility.

Learning Dexterous Manipulation Policies from Experience and Imitation

This work shows that local trajectory-based controllers for complex non-prehensile manipulation tasks can be constructed from surprisingly small amounts of training data, and collections of such controllers can be interpolated to form more global controllers.

Learning contact-rich manipulation skills with guided policy search

This paper extends a recently developed policy search method and uses it to learn a range of dynamic manipulation behaviors with highly general policy representations, without using known models or example demonstrations, and shows that this method can acquire fast, fluent behaviors after only minutes of interaction time.

Hierarchical Reinforcement Learning With Universal Policies for Multistep Robotic Manipulation

A unified hierarchical reinforcement learning framework, named the Universal Option Framework (UOF), is proposed to enable the agent to learn varied outcomes in multistep tasks more efficiently and stably, and with significantly less memory consumption.

Learning in Robotics using Bayesian Nonparametrics

This paper provides evidence that explicitly averaging out model uncertainties during planning and decision making is the key to success and speeds up learning by an order of magnitude for a benchmark problem.
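The claim about averaging over model uncertainty can be sketched with a toy example (hypothetical numbers, not the paper's benchmark): a feedback gain chosen against a single point-estimate model can look better than it is, while evaluating the expected cost under samples from the model posterior penalizes gains that go unstable for plausible dynamics.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical posterior over an unknown gain a in dynamics x' = a*x + u,
# as if estimated from a handful of trials.
a_mean, a_std = 1.2, 0.3

def rollout_cost(a, k, x0=1.0, horizon=20):
    """Quadratic cost of linear feedback u = -k*x under x' = a*x + u."""
    x, cost = x0, 0.0
    for _ in range(horizon):
        u = -k * x
        cost += x ** 2 + 0.1 * u ** 2
        x = a * x + u
    return cost

def expected_cost(k, n_samples=500):
    """Average cost over dynamics sampled from the model posterior."""
    a_samples = rng.normal(a_mean, a_std, n_samples)
    return np.mean([rollout_cost(a, k) for a in a_samples])

# Planning with the mean model vs. averaging out model uncertainty.
gains = np.linspace(0.0, 2.0, 41)
k_mean = gains[np.argmin([rollout_cost(a_mean, k) for k in gains])]
k_robust = gains[np.argmin([expected_cost(k) for k in gains])]
```

Because the cost is evaluated under many plausible dynamics rather than one, `k_robust` tends to retain enough feedback to stabilize the worse draws of `a`, which is the mechanism behind the speed-up the paper attributes to explicit uncertainty averaging.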

Hierarchical Reinforcement Learning with Universal Policies for Multi-Step Robotic Manipulation

A hierarchical reinforcement learning framework, named the Universal Option Framework (UOF), is developed to enable the agent to learn varied outcomes in multi-step tasks more efficiently and stably, and with significantly less memory consumption.




Policy Gradient Methods for Robotics

  • Jan Peters, S. Schaal
  • Computer Science
    2006 IEEE/RSJ International Conference on Intelligent Robots and Systems
  • 2006
An overview on learning with policy gradient methods for robotics with a strong focus on recent advances in the field is given and how the most recently developed methods can significantly improve learning performance is shown.
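As a hedged illustration of the policy-gradient idea (a minimal REINFORCE sketch on an invented one-step task, not any specific method from the survey): the policy is a Gaussian over a scalar action, and the score function (u - theta) / sigma^2, weighted by a baselined reward, drives the policy mean toward the best action.

```python
import numpy as np

rng = np.random.default_rng(2)

target = 1.5     # hypothetical action the task rewards (invented)
sigma = 0.5      # fixed exploration noise of the Gaussian policy

def reward(u):
    """One-step reward: negative squared distance to the target action."""
    return -(u - target) ** 2

theta = 0.0      # policy mean, the only learnable parameter
lr = 0.05
baseline = 0.0
for step in range(2000):
    u = rng.normal(theta, sigma)                 # sample an action
    r = reward(u)
    # REINFORCE: d/d_theta log N(u; theta, sigma) = (u - theta) / sigma^2
    grad = (u - theta) / sigma ** 2
    theta += lr * (r - baseline) * grad          # baselined gradient step
    baseline += 0.1 * (r - baseline)             # running-average baseline
```

The running-average baseline leaves the gradient estimate unbiased while cutting its variance, which is one of the variance-reduction tricks such overviews typically emphasize.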

Learning by Demonstration

  • S. Schaal
  • Education, Computer Science
    Encyclopedia of Machine Learning and Data Mining
  • 1996
In an implementation of pole balancing on a complex anthropomorphic robot arm, it is demonstrated that, when facing the complexities of real signal processing, model-based reinforcement learning offers the most robustness for LQR problems.

Closing the learning-planning loop with predictive state representations

A novel algorithm is proposed which provably learns a compact, accurate model directly from sequences of action-observation pairs, and is evaluated in a simulated, vision-based mobile robot planning task, showing that the learned PSR captures the essential features of the environment and enables successful and efficient planning.

Policy search for motor primitives in robotics

A novel EM-inspired algorithm for policy learning that is particularly well-suited for dynamical system motor primitives is introduced and applied in the context of motor learning and can learn a complex Ball-in-a-Cup task on a real Barrett WAM™ robot arm.

Model learning for robot control: a survey

This paper surveys the progress in model learning with a strong focus on robot control on a kinematic as well as dynamical level and deduces future directions of real-time learning algorithms.

Autonomous helicopter control using reinforcement learning policy search methods

  • J. Bagnell, J. Schneider
  • Computer Science
    Proceedings 2001 ICRA. IEEE International Conference on Robotics and Automation (Cat. No.01CH37164)
  • 2001
This work considers algorithms that evaluate and synthesize controllers under distributions of Markovian models and demonstrates the presented learning control algorithm by flying an autonomous helicopter and shows that the controller learned is robust and delivers good performance in this real-world domain.

Exploration and apprenticeship learning in reinforcement learning

This paper considers the apprenticeship learning setting in which a teacher demonstration of the task is available, and shows that, given the initial demonstration, no explicit exploration is necessary, and the student can attain near-optimal performance simply by repeatedly executing "exploitation policies" that try to maximize rewards.

Combining motion planning and optimization for flexible robot manipulation

A task-space probabilistic planner is presented that solves general manipulation tasks posed as optimization criteria; it is validated in simulation and on a 7-DOF robot arm executing several tabletop manipulation tasks.