Residual Policy Learning
@article{Silver2018ResidualPL,
  title   = {Residual Policy Learning},
  author  = {Tom Silver and Kelsey R. Allen and Joshua B. Tenenbaum and Leslie Pack Kaelbling},
  journal = {ArXiv},
  year    = {2018},
  volume  = {abs/1812.06298}
}
We present Residual Policy Learning (RPL): a simple method for improving nondifferentiable policies using model-free deep reinforcement learning. RPL thrives in complex robotic manipulation tasks where good but imperfect controllers are available. In these tasks, reinforcement learning from scratch remains data-inefficient or intractable, but learning a residual on top of the initial controller can yield substantial improvements. We study RPL in six challenging MuJoCo tasks involving partial…
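The mechanism is simple enough to sketch: the agent's action is the sum of the fixed initial controller's output and a learned residual, and only the residual is trained with model-free RL. Below is a minimal, hypothetical illustration; `initial_controller`, the linear residual, and the shared observation/action dimension are all assumptions standing in for the paper's hand-designed controllers and deep residual networks trained with algorithms such as DDPG.

```python
import numpy as np

def initial_controller(obs):
    """Hypothetical hand-designed controller (an assumption for this
    sketch): simple proportional feedback toward the origin."""
    return -0.5 * obs  # obs and action assumed to share a dimension here

class ResidualPolicy:
    """pi(s) = pi_0(s) + f_theta(s): act with the initial controller
    plus a learned residual; only the residual's parameters are trained."""

    def __init__(self, dim):
        # Placeholder linear residual. In RPL this would be a deep
        # network trained with a model-free algorithm such as DDPG.
        # Zero initialization means the combined policy starts out
        # behaving exactly like the initial controller.
        self.W = np.zeros((dim, dim))

    def act(self, obs):
        return initial_controller(obs) + self.W @ obs

# Usage: before any training the residual contributes nothing, so the
# combined policy matches the controller's output exactly.
policy = ResidualPolicy(dim=3)
print(policy.act(np.array([0.2, -0.1, 0.4])))
```

Because the residual is initialized to output zero, learning begins from the controller's existing competence rather than from scratch, which is why a residual can be learned even in tasks where full model-free RL is intractable.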
99 Citations
Residual Policy Learning
- Computer Science
- 2020
It is found that learning an effective policy from residuals is faster than learning from scratch; high success rates are achieved for pushing and pick-and-place, and results are compared across methods.
Improved Exploration with Stochastic Policies in Deep Reinforcement Learning
- Computer Science
- 2020
This thesis proposes to utilize more expressive stochastic policy distributions to enable reinforcement learning agents to learn to explore in a targeted manner, and extends the Soft Actor-Critic algorithm with policy distributions of varying expressiveness.
Handling Sparse Rewards in Reinforcement Learning Using Model Predictive Control
- Computer Science, ArXiv
- 2022
The goal is to find an effective alternative to reward shaping that avoids costly human demonstrations and is applicable to a wide range of domains; the work shows that using MPC as an experience source improves the agent's learning process on a given task in the case of sparse rewards.
Sample-Efficient Learning for Industrial Assembly using Qgraph-bounded DDPG
- Computer Science, 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
- 2020
An in-depth comparison based on a large number of experiments is presented, demonstrating the advantages and performance of Qgraph-bounded DDPG: the learning process can be significantly sped up, made robust against poor hyperparameter choices, and run with lower memory requirements.
Residual Skill Policies: Learning an Adaptable Skill-based Action Space for Reinforcement Learning for Robotics
- Computer Science, CoRL
- 2022
This work proposes accelerating exploration in the skill space by using state-conditioned generative models to bias the high-level agent toward sampling only skills relevant to a given state, based on prior experience, and introduces a low-level residual policy for fine-grained skill adaptation, enabling downstream RL agents to adapt to unseen task variations.
A Comparison of Action Spaces for Learning Manipulation Tasks
- Psychology, 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
- 2019
This paper compares learning performance across three tasks and four action spaces, using two modern reinforcement learning algorithms, lending support to the hypothesis that learning references for a task-space impedance controller significantly reduces the number of samples needed to achieve good performance across all tasks and algorithms.
Soft Action Priors: Towards Robust Policy Transfer
- Computer Science, ArXiv
- 2022
This paper uses the action prior from the Reinforcement Learning as Inference framework (that is, a distribution over actions at each state that resembles a teacher policy, rather than a Bayesian prior) to recover state-of-the-art policy distillation techniques, and proposes a class of adaptive methods that can robustly exploit action priors by combining reward shaping and auxiliary regularization losses.
Off-Policy Deep Reinforcement Learning without Exploration
- Computer Science, ICML
- 2019
This paper introduces a novel class of off-policy algorithms, batch-constrained reinforcement learning, which restricts the action space in order to force the agent towards behaving close to on-policy with respect to a subset of the given data.
Shaping Rewards for Reinforcement Learning with Imperfect Demonstrations using Generative Models
- Computer Science, 2021 IEEE International Conference on Robotics and Automation (ICRA)
- 2021
This work proposes a method that combines reinforcement and imitation learning by shaping the reward function with a state-and-action-dependent potential that is trained from demonstration data, using a generative model.
BRPO: Batch Residual Policy Optimization
- Computer Science, IJCAI
- 2020
This work derives a new RL method, BRPO, which learns both the policy and the allowable deviation that jointly maximize a lower bound on policy performance, and shows that BRPO achieves state-of-the-art performance on a number of tasks.
References
Showing 1-10 of 39 references
Deep reinforcement learning for robotic manipulation with asynchronous off-policy updates
- Computer Science, 2017 IEEE International Conference on Robotics and Automation (ICRA)
- 2017
It is demonstrated that a recent deep reinforcement learning algorithm based on off-policy training of deep Q-functions can scale to complex 3D manipulation tasks and can learn deep neural network policies efficiently enough to train on real physical robots.
Continuous Deep Q-Learning with Model-based Acceleration
- Computer Science, ICML
- 2016
This paper derives a continuous variant of the Q-learning algorithm, called normalized advantage functions (NAF), as an alternative to the more commonly used policy gradient and actor-critic methods, and substantially improves performance on a set of simulated robotic control tasks.
Deep Reinforcement Learning in a Handful of Trials using Probabilistic Dynamics Models
- Computer Science, NeurIPS
- 2018
This paper proposes a new algorithm called probabilistic ensembles with trajectory sampling (PETS) that combines uncertainty-aware deep network dynamics models with sampling-based uncertainty propagation, which matches the asymptotic performance of model-free algorithms on several challenging benchmark tasks, while requiring significantly fewer samples.
Continuous control with deep reinforcement learning
- Computer Science, ICLR
- 2016
This work presents an actor-critic, model-free algorithm based on the deterministic policy gradient that can operate over continuous action spaces, and demonstrates that for many of the tasks the algorithm can learn policies end-to-end: directly from raw pixel inputs.
Temporal Difference Models: Model-Free Deep RL for Model-Based Control
- Computer Science, ICLR
- 2018
Model-free reinforcement learning (RL) is a powerful, general tool for learning complex behaviors. However, its sample complexity is often impractically large for solving challenging real-world…
Gaussian Processes for Data-Efficient Learning in Robotics and Control
- Computer Science, IEEE Transactions on Pattern Analysis and Machine Intelligence
- 2015
This paper learns a probabilistic, non-parametric Gaussian process transition model of the system and applies it to autonomous learning in real robot and control tasks, achieving an unprecedented speed of learning.
A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning
- Computer Science, AISTATS
- 2011
This paper proposes a new iterative algorithm, which trains a stationary deterministic policy and can be seen as a no-regret algorithm in an online learning setting, and demonstrates that this new approach outperforms previous approaches on two challenging imitation learning problems and a benchmark sequence labeling problem.
Neural Network Dynamics for Model-Based Deep Reinforcement Learning with Model-Free Fine-Tuning
- Computer Science, 2018 IEEE International Conference on Robotics and Automation (ICRA)
- 2018
It is demonstrated that neural network dynamics models can in fact be combined with model predictive control (MPC) to achieve excellent sample complexity in a model-based reinforcement learning algorithm, producing stable and plausible gaits that accomplish various complex locomotion tasks.
Hindsight Experience Replay
- Computer Science, NIPS
- 2017
A novel technique is presented that allows sample-efficient learning from rewards that are sparse and binary, avoiding the need for complicated reward engineering; it may be seen as a form of implicit curriculum.
Residual Reinforcement Learning for Robot Control
- Computer Science, 2019 International Conference on Robotics and Automation (ICRA)
- 2019
This paper studies how to solve difficult control problems in the real world by decomposing them into a part that is solved efficiently by conventional feedback control methods and a residual that is solved with RL.