Corpus ID: 56327855

Residual Policy Learning

Tom Silver, Kelsey R. Allen, Joshua B. Tenenbaum, Leslie Pack Kaelbling
We present Residual Policy Learning (RPL): a simple method for improving nondifferentiable policies using model-free deep reinforcement learning. RPL thrives in complex robotic manipulation tasks where good but imperfect controllers are available. In these tasks, reinforcement learning from scratch remains data-inefficient or intractable, but learning a residual on top of the initial controller can yield substantial improvements. We study RPL in six challenging MuJoCo tasks involving partial… 
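The mechanism described in the abstract — executing the initial controller's action plus a learned correction — can be sketched as follows. This is a minimal illustration, not the paper's implementation; `controller` and `residual` are hypothetical callables, and the action bounds are invented for the example.

```python
import numpy as np

def make_residual_policy(controller, residual, action_low, action_high):
    """Combine a hand-designed (possibly nondifferentiable) controller
    with a learned residual correction, as in Residual Policy Learning."""
    def policy(state):
        base = np.asarray(controller(state), dtype=float)
        correction = np.asarray(residual(state), dtype=float)
        # Executed action = base controller action + learned residual,
        # clipped to the valid action range.
        return np.clip(base + correction, action_low, action_high)
    return policy

# Toy 1-D example: a proportional controller with a constant bias, and a
# residual that (hand-set here, learned in practice) cancels the bias.
controller = lambda s: np.array([0.5 * s + 0.2])   # imperfect base controller
residual   = lambda s: np.array([-0.2])            # learned correction (stub)
pi = make_residual_policy(controller, residual, -1.0, 1.0)
```

Because gradients only need to flow through the residual, the base controller can be any black box; model-free RL then only has to learn the correction, which is often a much easier function than the full policy.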


Residual Policy Learning

Learning an effective policy from residuals is found to be faster than learning from scratch; high success rates are achieved for pushing and pick-and-place, and results are compared across methods.

Improved Exploration with Stochastic Policies in Deep Reinforcement Learning

This thesis proposes to utilize more expressive stochastic policy distributions to enable reinforcement learning agents to learn to explore in a targeted manner, and extends the Soft Actor-Critic algorithm with policy distributions of varying expressiveness.

Handling Sparse Rewards in Reinforcement Learning Using Model Predictive Control

The goal is to find an effective alternative to reward shaping that avoids costly human demonstrations and applies to a wide range of domains; the work shows that using MPC as an experience source improves the agent's learning process on tasks with sparse rewards.

Sample-Efficient Learning for Industrial Assembly using Qgraph-bounded DDPG

An in-depth comparison based on a large number of experiments demonstrates the advantages of Qgraph-bounded DDPG: the learning process is significantly sped up, made robust to bad choices of hyperparameters, and runs with lower memory requirements.

Residual Skill Policies: Learning an Adaptable Skill-based Action Space for Reinforcement Learning for Robotics

This work proposes accelerating exploration in the skill space using state-conditioned generative models to directly bias the high-level agent towards only sampling skills relevant to a given state based on prior experience, and proposes a low-level residual policy for fine-grained skill adaptation enabling downstream RL agents to adapt to unseen task variations.

A Comparison of Action Spaces for Learning Manipulation Tasks

This paper compares learning performance across three tasks and four action spaces, using two modern reinforcement learning algorithms, lending support to the hypothesis that learning references for a task-space impedance controller significantly reduces the number of samples needed to achieve good performance across all tasks and algorithms.

Soft Action Priors: Towards Robust Policy Transfer

This paper uses the action prior from the Reinforcement Learning as Inference framework (a distribution over actions at each state which resembles a teacher policy, rather than a Bayesian prior) to recover state-of-the-art policy distillation techniques, and proposes a class of adaptive methods that can robustly exploit action priors by combining reward shaping and auxiliary regularization losses.

Off-Policy Deep Reinforcement Learning without Exploration

This paper introduces a novel class of off-policy algorithms, batch-constrained reinforcement learning, which restricts the action space in order to force the agent towards behaving close to on-policy with respect to a subset of the given data.

Shaping Rewards for Reinforcement Learning with Imperfect Demonstrations using Generative Models

This work proposes a method that combines reinforcement and imitation learning by shaping the reward function with a state-and-action-dependent potential that is trained from demonstration data, using a generative model.

BRPO: Batch Residual Policy Optimization

This work derives a new RL method, BRPO, which learns both the policy and the allowable deviation that jointly maximize a lower bound on policy performance, and shows that BRPO achieves state-of-the-art performance on a number of tasks.

Deep reinforcement learning for robotic manipulation with asynchronous off-policy updates

It is demonstrated that a recent deep reinforcement learning algorithm based on off-policy training of deep Q-functions can scale to complex 3D manipulation tasks and can learn deep neural network policies efficiently enough to train on real physical robots.

Continuous Deep Q-Learning with Model-based Acceleration

This paper derives a continuous variant of the Q-learning algorithm, which it is called normalized advantage functions (NAF), as an alternative to the more commonly used policy gradient and actor-critic methods, and substantially improves performance on a set of simulated robotic control tasks.

Deep Reinforcement Learning in a Handful of Trials using Probabilistic Dynamics Models

This paper proposes a new algorithm called probabilistic ensembles with trajectory sampling (PETS) that combines uncertainty-aware deep network dynamics models with sampling-based uncertainty propagation, which matches the asymptotic performance of model-free algorithms on several challenging benchmark tasks, while requiring significantly fewer samples.

Continuous control with deep reinforcement learning

This work presents an actor-critic, model-free algorithm based on the deterministic policy gradient that can operate over continuous action spaces, and demonstrates that for many of the tasks the algorithm can learn policies end-to-end: directly from raw pixel inputs.

Temporal Difference Models: Model-Free Deep RL for Model-Based Control

Model-free reinforcement learning (RL) is a powerful, general tool for learning complex behaviors; however, its sample complexity is often impractically large for solving challenging real-world problems.

Gaussian Processes for Data-Efficient Learning in Robotics and Control

This paper learns a probabilistic, non-parametric Gaussian process transition model of the system and applies it to autonomous learning in real robot and control tasks, achieving an unprecedented speed of learning.

A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning

This paper proposes a new iterative algorithm, which trains a stationary deterministic policy, that can be seen as a no regret algorithm in an online learning setting and demonstrates that this new approach outperforms previous approaches on two challenging imitation learning problems and a benchmark sequence labeling problem.
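The iterative algorithm summarized above (DAgger) can be sketched in a few lines: roll out the current policy, label the visited states with the expert's actions, and retrain on the aggregated dataset. The `expert`, `fit`, and `rollout` callables and the toy chain environment below are illustrative stand-ins, not the paper's API.

```python
def dagger(expert, fit, rollout, n_iters=3):
    """Sketch of the DAgger loop: aggregate expert labels on states
    visited by the current policy, then retrain a stationary policy."""
    dataset = []
    policy = expert                          # iteration 0 rolls out the expert
    for _ in range(n_iters):
        for s in rollout(policy):            # states visited under current policy
            dataset.append((s, expert(s)))   # label each state with the expert action
        policy = fit(dataset)                # train on the whole aggregated dataset
    return policy

# Toy chain environment: states 0..4; action 1 moves right, 0 stays.
expert = lambda s: 1 if s < 4 else 0         # expert walks to state 4, then stops

def fit(dataset):
    table = dict(dataset)                    # toy learner: memorize expert labels
    return lambda s: table.get(s, 0)

def rollout(policy, horizon=5):
    s, states = 0, []
    for _ in range(horizon):
        states.append(s)
        s = min(s + policy(s), 4)
    return states

learned = dagger(expert, fit, rollout)
```

Training on states the learner itself visits (rather than only on expert trajectories) is what gives DAgger its no-regret guarantee against compounding errors.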

Neural Network Dynamics for Model-Based Deep Reinforcement Learning with Model-Free Fine-Tuning

It is demonstrated that neural network dynamics models can in fact be combined with model predictive control (MPC) to achieve excellent sample complexity in a model-based reinforcement learning algorithm, producing stable and plausible gaits that accomplish various complex locomotion tasks.

Hindsight Experience Replay

A novel technique is presented which allows sample-efficient learning from sparse, binary rewards, avoiding the need for complicated reward engineering, and which may be seen as a form of implicit curriculum.
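The relabeling trick behind Hindsight Experience Replay can be sketched as follows: store each transition both with its original goal and with goals actually achieved later in the episode, so that failed episodes still yield reward signal. The tuple layout and field names are illustrative assumptions, not the paper's data format.

```python
import random

def her_relabel(episode, k=1):
    """Minimal sketch of HER's 'future' relabeling strategy.

    `episode` is a list of (state, action, achieved_goal, desired_goal)
    tuples. Returns (state, action, goal, reward) transitions, where some
    desired goals are replaced by goals achieved later in the episode.
    """
    transitions = []
    for t, (s, a, achieved, desired) in enumerate(episode):
        # Original transition: sparse binary reward on exact goal match.
        transitions.append((s, a, desired, float(achieved == desired)))
        for _ in range(k):
            # Pretend a goal achieved at a later timestep was the target,
            # turning a failed episode into informative successes.
            future = random.randrange(t, len(episode))
            new_goal = episode[future][2]
            transitions.append((s, a, new_goal, float(achieved == new_goal)))
    return transitions

# Two-step episode that never reaches its desired goal 9.
episode = [(0, 1, 1, 9), (1, 1, 2, 9)]
relabeled = her_relabel(episode, k=1)
```

Even though the original episode earns zero reward everywhere, the relabeled transitions include reward-1 examples, which is what makes learning from sparse binary rewards tractable.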

Residual Reinforcement Learning for Robot Control

This paper studies how to solve difficult control problems in the real world by decomposing them into a part that is solved efficiently by conventional feedback control methods and a residual part that is solved with RL.