Corpus ID: 204955905

Relay Policy Learning: Solving Long-Horizon Tasks via Imitation and Reinforcement Learning

@inproceedings{Gupta2019RelayPL,
  title={Relay Policy Learning: Solving Long-Horizon Tasks via Imitation and Reinforcement Learning},
  author={Abhishek Gupta and Vikash Kumar and Corey Lynch and Sergey Levine and Karol Hausman},
  booktitle={CoRL},
  year={2019}
}
We present relay policy learning, a method for imitation and reinforcement learning that can solve multi-stage, long-horizon robotic tasks. This general, universally applicable, two-phase approach consists of an imitation learning stage that produces goal-conditioned hierarchical policies, and a reinforcement learning phase that fine-tunes these policies for task performance. Our method, while not necessarily perfect at imitation learning, is very amenable to further improvement via… 
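The two-phase recipe in the abstract (goal-relabeled imitation learning, then RL fine-tuning) can be illustrated with a toy sketch. This is not the paper's implementation: the 1-D chain environment, the relabeling window, and the dictionary-based "policy" are all illustrative assumptions, and the RL fine-tuning phase is only indicated in comments.

```python
def relabel_imitation(demos, window=3):
    """Phase 1 (illustrative): goal-conditioned behavioral cloning with goal
    relabeling. Each state in a demonstration is paired with nearby future
    states as goals, and the demonstrated step is stored as the 'action'
    for that (state, goal) pair."""
    policy = {}
    for traj in demos:
        for start in range(len(traj) - 1):
            last = min(start + window, len(traj) - 1)
            for end in range(start + 1, last + 1):
                goal = traj[end]  # relabel a reached state as the goal
                policy[(traj[start], goal)] = traj[start + 1] - traj[start]
    return policy

def rollout(policy, start, goal, max_steps=20):
    """Greedily follow the learned goal-conditioned policy on a 1-D chain.
    Phase 2 of the paper would fine-tune this policy with RL from here;
    that step is omitted in this sketch."""
    state = start
    for _ in range(max_steps):
        if state == goal:
            return state
        step = policy.get((state, goal), 0)
        if step == 0:  # no demonstrated action for this (state, goal) pair
            break
        state += step
    return state

# One demonstration: walking right along a chain from state 0 to 4.
demos = [[0, 1, 2, 3, 4]]
policy = relabel_imitation(demos)
print(rollout(policy, start=0, goal=2))  # reaches the relabeled goal: 2
```

In the paper this idea is applied hierarchically: a high-level policy proposes subgoals and a low-level policy reaches them, and both are fine-tuned with reinforcement learning after imitation.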


Learning a Skill-sequence-dependent Policy for Long-horizon Manipulation Tasks
TLDR
This paper proposes a skill-sequence-dependent hierarchical policy for solving a typical long-horizon task; it is significantly faster than Proximal Policy Optimization (PPO) and task-schema methods, and makes learning much more sample-efficient.
Learning to Reach Goals via Iterated Supervised Learning
TLDR
This paper proposes a simple algorithm in which an agent continually relabels and imitates the trajectories it generates to progressively learn goal-reaching behaviors from scratch; it formally shows that this iterated supervised learning procedure optimizes a bound on the RL objective, derives performance bounds for the learned policy, and empirically demonstrates improved goal-reaching performance and robustness over current RL algorithms on several benchmark tasks.
Learning To Reach Goals Without Reinforcement Learning
TLDR
A theoretical result linking self-supervised imitation learning and reinforcement learning, and empirical results showing that it performs competitively with more complex reinforcement learning methods on a range of challenging goal reaching problems, while yielding advantages in terms of stability and use of offline data.
Skill-based Meta-Reinforcement Learning
TLDR
Experimental results on continuous control tasks in navigation and manipulation demonstrate that the proposed method can efficiently solve long-horizon novel target tasks by combining the strengths of meta-learning and the usage of offline datasets, while prior approaches in RL, meta-RL, and multi-task RL require substantially more environment interactions to solve the tasks.
Goal-Conditioned Reinforcement Learning with Imagined Subgoals
TLDR
This work proposes to incorporate imagined subgoals into policy learning to facilitate learning of complex tasks and evaluates its approach on complex robotic navigation and manipulation tasks and shows that it outperforms existing methods by a large margin.
Hierarchical Planning Through Goal-Conditioned Offline Reinforcement Learning
TLDR
This paper proposes a hierarchical planning framework, consisting of a low-level goal-conditioned RL policy and a high-level goal planner, and adopts a Conditional Variational Autoencoder to sample meaningful high-dimensional sub-goal candidates and to solve the high-level long-term strategy optimization problem.
Learning from Guided Play: A Scheduled Hierarchical Approach for Improving Exploration in Adversarial Imitation Learning
TLDR
Learning from Guided Play is presented, a framework that leverages expert demonstrations of multiple auxiliary tasks in addition to a main task, and uses a hierarchical model to learn each task's reward and policy through a modified AIL procedure.
FIRL: Fast Imitation and Policy Reuse Learning
TLDR
This work proposes FIRL, Fast (one-shot) Imitation and Policy Reuse Learning, which enables fast learning based on a policy pool and reduces complex task learning to a simple regression problem that can be solved in a few offline iterations.
Demonstration-Bootstrapped Autonomous Practicing via Multi-Task Reinforcement Learning
TLDR
This work proposes a system for reinforcement learning that leverages multi-task reinforcement learning bootstrapped with prior data to enable continuous autonomous practicing, minimizing the number of resets needed while being able to learn temporally extended behaviors.
Example-Driven Model-Based Reinforcement Learning for Solving Long-Horizon Visuomotor Tasks
TLDR
EMBR is introduced, a model-based RL method for learning primitive skills that are suitable for completing long-horizon visuomotor tasks and can be directly combined with off-the-shelf symbolic planners to complete long-horizon tasks.
…

References

Showing 1–10 of 40 references
Overcoming Exploration in Reinforcement Learning with Demonstrations
TLDR
This work uses demonstrations to overcome the exploration problem and successfully learn to perform long-horizon, multi-step robotics tasks with continuous control such as stacking blocks with a robot arm.
Deep reinforcement learning for robotic manipulation with asynchronous off-policy updates
TLDR
It is demonstrated that a recent deep reinforcement learning algorithm based on off-policy training of deep Q-functions can scale to complex 3D manipulation tasks and can learn deep neural network policies efficiently enough to train on real physical robots.
Data-Efficient Hierarchical Reinforcement Learning
TLDR
This paper studies how to develop HRL algorithms that are general, in that they do not make onerous additional assumptions beyond standard RL algorithms, and efficient, in the sense that they can be used with modest numbers of interaction samples, making them suitable for real-world problems such as robotic control.
Hierarchical Imitation and Reinforcement Learning
TLDR
This work proposes an algorithmic framework, called hierarchical guidance, that leverages the hierarchical structure of the underlying problem to integrate different modes of expert interaction and can incorporate different combinations of imitation learning and reinforcement learning at different levels, leading to dramatic reductions in both expert effort and cost of exploration.
QT-Opt: Scalable Deep Reinforcement Learning for Vision-Based Robotic Manipulation
TLDR
QT-Opt is introduced, a scalable self-supervised vision-based reinforcement learning framework that can leverage over 580k real-world grasp attempts to train a deep neural network Q-function with over 1.2M parameters to perform closed-loop, real-world grasping that generalizes to 96% grasp success on unseen objects.
Divide-and-Conquer Reinforcement Learning
TLDR
The results show that divide-and-conquer RL greatly outperforms conventional policy gradient methods on challenging grasping, manipulation, and locomotion tasks, and exceeds the performance of a variety of prior methods.
SWIRL: A sequential windowed inverse reinforcement learning algorithm for robot tasks with delayed rewards
TLDR
Experiments suggest that SWIRL, a hybrid of the exploration and demonstration paradigms for robot learning, requires significantly fewer rollouts than pure reinforcement learning and fewer expert demonstrations than behavioral cloning to learn a policy.
OptionGAN: Learning Joint Reward-Policy Options using Generative Adversarial Inverse Reinforcement Learning
TLDR
This work uses adversarial methods to learn joint reward-policy options using only observed expert states and shows significant performance increases in one-shot transfer learning.
Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation
TLDR
h-DQN is presented, a framework to integrate hierarchical value functions, operating at different temporal scales, with intrinsically motivated deep reinforcement learning, and allows for flexible goal specifications, such as functions over entities and relations.
A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning
TLDR
This paper proposes a new iterative algorithm that trains a stationary deterministic policy and can be seen as a no-regret algorithm in an online learning setting, and demonstrates that this new approach outperforms previous approaches on two challenging imitation learning problems and a benchmark sequence-labeling problem.
…