• Corpus ID: 236154773

Demonstration-Guided Reinforcement Learning with Learned Skills

@article{Pertsch2021DemonstrationGuidedRL,
  title={Demonstration-Guided Reinforcement Learning with Learned Skills},
  author={Karl Pertsch and Youngwoon Lee and Yue Wu and Joseph J. Lim},
  journal={ArXiv},
  year={2021},
  volume={abs/2107.10253}
}
Demonstration-guided reinforcement learning (RL) is a promising approach for learning complex behaviors by leveraging both reward feedback and a set of target task demonstrations. Prior approaches for demonstration-guided RL treat every new task as an independent learning problem and attempt to follow the provided demonstrations step-by-step, akin to a human trying to imitate a completely unseen behavior by following the demonstrator’s exact muscle movements. Naturally, such learning will be… 
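
The core idea, as the abstract frames it, is to guide RL with demonstrations at the level of reusable skills rather than individual actions. Below is a minimal, hypothetical sketch of what such a downstream objective could look like: a policy that outputs latent skills is rewarded by the task while being regularized toward a demonstration-informed skill prior. All names (`in_demo_support`, `alpha`), dimensions, and the exact form of the regularizer are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def kl_diag_gaussians(mu_q, log_std_q, mu_p, log_std_p):
    """KL( N(mu_q, sig_q) || N(mu_p, sig_p) ) for diagonal Gaussians."""
    var_q, var_p = np.exp(2 * log_std_q), np.exp(2 * log_std_p)
    return np.sum(log_std_p - log_std_q
                  + (var_q + (mu_q - mu_p) ** 2) / (2 * var_p) - 0.5)

def skill_rl_objective(reward, policy, demo_prior, task_prior, alpha, in_demo_support):
    """Per-step objective: maximize reward while staying close to a skill prior.

    `policy`, `demo_prior`, `task_prior` are (mu, log_std) tuples over a latent
    skill z.  Near demonstrated states we regularize toward the
    demonstration-informed prior, elsewhere toward the task-agnostic prior.
    """
    prior = demo_prior if in_demo_support else task_prior
    kl = kl_diag_gaussians(policy[0], policy[1], prior[0], prior[1])
    return reward - alpha * kl

# toy usage in a 2-D skill space
policy = (np.array([0.4, -0.2]), np.array([-1.0, -1.0]))
demo_prior = (np.array([0.5, -0.1]), np.array([-1.2, -0.9]))
task_prior = (np.zeros(2), np.zeros(2))
print(skill_rl_objective(1.0, policy, demo_prior, task_prior,
                         alpha=0.1, in_demo_support=True))
```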

Skill-based Meta-Reinforcement Learning

TLDR
Experimental results on continuous control tasks in navigation and manipulation demonstrate that the proposed method can efficiently solve long-horizon novel target tasks by combining the strengths of meta-learning and the use of offline datasets, while prior approaches in RL, meta-RL, and multi-task RL require substantially more environment interactions to solve the tasks.

Skill-based Model-based Reinforcement Learning

TLDR
A Skill-based Model-based RL framework (SkiMo) is proposed that enables planning in the skill space using a skill dynamics model, which directly predicts skill outcomes rather than predicting every small detail of the intermediate states step by step.
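
As a rough illustration of planning with a skill dynamics model that predicts skill outcomes directly, the sketch below runs a random-shooting planner over skill sequences, advancing a latent state once per skill rather than once per environment step. The dynamics, reward model, and dimensions are toy placeholders, not the SkiMo architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def skill_dynamics(h, z):
    """Placeholder skill dynamics: predicts the latent state after executing
    skill z from latent state h (in the paper this is a learned network)."""
    return h + 0.1 * z                       # hypothetical linear stand-in

def skill_reward(h, z):
    """Placeholder reward model over whole skill executions."""
    return -np.linalg.norm(h + 0.1 * z)      # e.g. a distance-to-goal surrogate

def plan_in_skill_space(h0, horizon=5, n_samples=256, skill_dim=4):
    """Random-shooting planner: sample skill sequences, roll them out with the
    skill dynamics model (one step per skill), and pick the best sequence."""
    best_seq, best_ret = None, -np.inf
    for _ in range(n_samples):
        seq = rng.normal(size=(horizon, skill_dim))
        h, ret = h0, 0.0
        for z in seq:
            ret += skill_reward(h, z)
            h = skill_dynamics(h, z)
        if ret > best_ret:
            best_seq, best_ret = seq, ret
    return best_seq[0]                       # execute the first skill, then replan

print(plan_in_skill_space(np.ones(4)))
```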

SAFER: Data-Efficient and Safe Reinforcement Learning via Skill Acquisition

TLDR
This work theoretically characterizes why SAFER can enforce safe policy learning and demonstrates its effectiveness on several complex safety-critical robotic grasping tasks inspired by the game Operation, in which SAFER outperforms state-of-the-art primitive learning methods in success and safety.

TempoRL: Temporal Priors for Exploration in Off-Policy Reinforcement Learning

TLDR
This work introduces state-independent temporal priors, which directly model temporal consistency in demonstrated trajectories, and provides empirical evidence that TempoRL can leverage task-agnostic trajectories to accelerate learning and drive exploration in complex tasks, even when trained on data collected on simpler tasks.
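
To make "state-independent temporal priors" concrete, here is a toy sketch that fits a first-order autoregressive Gaussian over actions from trajectories and samples temporally consistent exploration actions from it. The actual prior in the paper is a learned model; this stand-in is only an assumption about the general shape of the idea.

```python
import numpy as np

class TemporalActionPrior:
    """Toy state-independent temporal prior: an AR(1) Gaussian p(a_t | a_{t-1})
    fit to trajectories, used to propose correlated exploration actions."""

    def fit(self, trajectories):
        prev = np.concatenate([t[:-1] for t in trajectories])
        nxt = np.concatenate([t[1:] for t in trajectories])
        # least-squares fit a_t ~ A a_{t-1}; residual std sets the noise scale
        self.A, *_ = np.linalg.lstsq(prev, nxt, rcond=None)
        self.std = (nxt - prev @ self.A).std(axis=0)
        return self

    def sample(self, a_prev, rng):
        """Propose a temporally consistent exploration action."""
        return a_prev @ self.A + rng.normal(scale=self.std)

rng = np.random.default_rng(0)
demos = [np.cumsum(rng.normal(size=(100, 2)), axis=0) * 0.01 for _ in range(5)]
prior = TemporalActionPrior().fit(demos)
a = np.zeros(2)
for _ in range(3):
    a = prior.sample(a, rng)
    print(a)
```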

Priors, Hierarchy, and Information Asymmetry for Skill Transfer in Reinforcement Learning

TLDR
This paper theoretically and empirically shows the crucial trade-off, controlled by information asymmetry, between the expressivity and transferability of skills across sequential tasks, and applies this approach to a complex robotic block-stacking domain that is unsolvable by baselines.

Learning Transferable Motor Skills with Hierarchical Latent Mixture Policies

TLDR
This work proposes an approach to learn abstract motor skills from data using a hierarchical mixture latent variable model, which exploits a three-level hierarchy of both discrete and continuous latent variables, to capture a set of high-level behaviours while allowing for variance in how they are executed.
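
A minimal sketch of sampling from a policy with both discrete and continuous latent variables follows: a discrete component picks a high-level behaviour and a continuous latent controls how it is executed, before decoding to a low-level action. The decoder and all dimensions are hypothetical stand-ins, not the model from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_action(obs, mixture_logits, component_means, decoder):
    """Toy three-level sample: discrete skill index -> continuous execution
    latent -> low-level action (the real model learns all three from data)."""
    probs = np.exp(mixture_logits) / np.exp(mixture_logits).sum()
    k = rng.choice(len(probs), p=probs)          # discrete: which behaviour
    z = rng.normal(component_means[k], 0.1)      # continuous: how to execute it
    return decoder(obs, z)                       # low-level action

decoder = lambda obs, z: np.tanh(obs[:2] + z)    # hypothetical action decoder
print(sample_action(np.ones(4), np.zeros(3), np.zeros((3, 2)), decoder))
```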

TRAIL: Near-Optimal Imitation Learning with Suboptimal Data

TLDR
The theoretical analysis shows that the learned latent action space can boost the sample-efficiency of downstream imitation learning, effectively reducing the need for large near-optimal expert datasets through the use of auxiliary non-expert data.
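
The two-stage recipe the summary describes can be sketched as follows: first obtain a latent action space from plentiful non-expert data, then behaviour-clone the small expert set in that latent space and decode back to raw actions. The random projection and linear regression below are deliberately crude placeholders for the learned components.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stage 1 (suboptimal data): obtain an action encoder/decoder so that a
# low-dimensional latent action explains transitions.  Here a fixed random
# projection stands in for the learned factorization.
A_DIM, Z_DIM = 8, 2
proj = rng.normal(size=(A_DIM, Z_DIM)) / np.sqrt(A_DIM)
encode = lambda a: a @ proj          # raw action -> latent action
decode = lambda z: z @ proj.T        # latent action -> raw action (stand-in)

# Stage 2 (small expert set): behaviour-clone in the latent action space.
expert_obs = rng.normal(size=(50, 4))
expert_act = rng.normal(size=(50, A_DIM))
latent_targets = encode(expert_act)
W, *_ = np.linalg.lstsq(expert_obs, latent_targets, rcond=None)  # linear BC policy

def policy(obs):
    """Predict a latent action with the cloned policy, then decode it."""
    return decode(obs @ W)

print(policy(rng.normal(size=4)))
```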

FIRL: Fast Imitation and Policy Reuse Learning

TLDR
This work proposes FIRL, Fast (one-shot) Imitation and Policy Reuse Learning, which enables fast learning based on a policy pool and reduces complex task learning to a simple regression problem that can be solved in a few offline iterations.

Spectral Decomposition Representation for Reinforcement Learning

TLDR
This work proposes an alternative spectral method, Spectral Decomposition Representation (SPEDER), that extracts a state-action abstraction from the dynamics without inducing spurious dependence on the data collection policy, while also balancing the exploration-versus-exploitation trade-off during learning.

Robot programming by demonstration with a monocular RGB camera

  • Kaimeng Wang, Te Tang
  • Physics
    Industrial Robot: the international journal of robotics research and application
  • 2022
Purpose: This paper aims to present a new approach for robot programming by demonstration, which generates robot programs by tracking the six-dimensional (6D) pose of the demonstrator's hand using a single RGB camera.

References

SHOWING 1-10 OF 49 REFERENCES

Overcoming Exploration in Reinforcement Learning with Demonstrations

TLDR
This work uses demonstrations to overcome the exploration problem and successfully learn to perform long-horizon, multi-step robotics tasks with continuous control such as stacking blocks with a robot arm.
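
One common way to use demonstrations to overcome exploration, in the spirit of this reference, is to keep them in a separate buffer that is mixed into every training batch and to add an auxiliary behaviour-cloning term to the actor loss. The sketch below shows that pattern with made-up buffer contents and a squared-error BC term; it is an assumption-laden illustration, not the paper's exact algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_mixed_batch(agent_buffer, demo_buffer, batch_size=128, demo_frac=0.25):
    """Sample a training batch that mixes agent experience with demonstration
    transitions, so demo states keep appearing even as exploration wanders."""
    n_demo = int(batch_size * demo_frac)
    demo_idx = rng.integers(len(demo_buffer), size=n_demo)
    agent_idx = rng.integers(len(agent_buffer), size=batch_size - n_demo)
    return [demo_buffer[i] for i in demo_idx] + [agent_buffer[i] for i in agent_idx]

def bc_auxiliary_loss(policy_actions, demo_actions):
    """Auxiliary behaviour-cloning loss on the demonstration part of the batch,
    added to the usual actor loss to pull the policy toward demonstrated actions."""
    return float(np.mean((policy_actions - demo_actions) ** 2))

demo_buffer = [(np.zeros(3), np.ones(2)) for _ in range(100)]          # (obs, action)
agent_buffer = [(rng.normal(size=3), rng.normal(size=2)) for _ in range(1000)]
batch = sample_mixed_batch(agent_buffer, demo_buffer)
print(len(batch), bc_auxiliary_loss(rng.normal(size=(32, 2)), np.ones((32, 2))))
```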

Accelerating Reinforcement Learning with Learned Skill Priors

TLDR
This work proposes a deep latent variable model that jointly learns an embedding space of skills and the skill prior from offline agent experience, and extends common maximum-entropy RL approaches to use skill priors to guide downstream learning.
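
A compact sketch of a deep latent variable model that jointly learns a skill embedding and a state-conditioned skill prior is given below (PyTorch, with placeholder architectures, dimensions, and loss weights): action sequences are encoded into a latent skill, decoded back for reconstruction, and a prior network is trained to match the (detached) encoder from the first state. The design details are assumptions for illustration.

```python
import torch
import torch.nn as nn

class SkillPriorModel(nn.Module):
    """Minimal sketch of a skill VAE with a learned, state-conditioned skill prior."""

    def __init__(self, obs_dim=10, act_dim=4, horizon=10, z_dim=8):
        super().__init__()
        self.encoder = nn.GRU(act_dim, 2 * z_dim, batch_first=True)    # q(z | a_{0:H})
        self.decoder = nn.Sequential(nn.Linear(z_dim, 64), nn.ReLU(),
                                     nn.Linear(64, horizon * act_dim))  # p(a_{0:H} | z)
        self.prior = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(),
                                   nn.Linear(64, 2 * z_dim))            # p(z | s_0)
        self.z_dim, self.horizon, self.act_dim = z_dim, horizon, act_dim

    def loss(self, obs0, actions, beta=1e-3):
        _, h = self.encoder(actions)                       # (1, B, 2*z_dim)
        mu, log_std = h[0].chunk(2, dim=-1)
        q = torch.distributions.Normal(mu, log_std.exp())
        z = q.rsample()
        recon = self.decoder(z).view(-1, self.horizon, self.act_dim)
        rec_loss = ((recon - actions) ** 2).mean()
        # regularize the encoder toward a unit Gaussian, and train the
        # state-conditioned prior to match the detached encoder posterior
        unit = torch.distributions.Normal(torch.zeros_like(mu), torch.ones_like(mu))
        p_mu, p_log_std = self.prior(obs0).chunk(2, dim=-1)
        p = torch.distributions.Normal(p_mu, p_log_std.exp())
        q_detached = torch.distributions.Normal(mu.detach(), log_std.exp().detach())
        kl_reg = torch.distributions.kl_divergence(q, unit).mean()
        prior_loss = torch.distributions.kl_divergence(q_detached, p).mean()
        return rec_loss + beta * kl_reg + prior_loss

model = SkillPriorModel()
loss = model.loss(torch.randn(16, 10), torch.randn(16, 10, 4))
loss.backward()
```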

Deep Q-learning From Demonstrations

TLDR
This paper presents an algorithm, Deep Q-learning from Demonstrations (DQfD), that leverages small sets of demonstration data to massively accelerate the learning process, and is able to automatically assess the necessary ratio of demonstration data while learning thanks to a prioritized replay mechanism.
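
The distinctive supervised component of DQfD is a large-margin term on demonstration transitions that keeps the demonstrated action's Q-value above all other actions. The sketch below shows that term combined with placeholder TD losses; the prioritized replay machinery and L2 regularization mentioned in the paper are omitted for brevity.

```python
import numpy as np

def large_margin_loss(q_values, demo_action, margin=0.8):
    """Supervised term on demo transitions: push the Q-value of the
    demonstrated action above every other action by at least `margin`."""
    margins = np.full_like(q_values, margin)
    margins[demo_action] = 0.0
    return np.max(q_values + margins) - q_values[demo_action]

def dqfd_loss(td_1step, td_nstep, q_values, demo_action, is_demo,
              lambda_n=1.0, lambda_e=1.0):
    """Combined DQfD-style loss (weights here are arbitrary placeholders)."""
    supervised = large_margin_loss(q_values, demo_action) if is_demo else 0.0
    return td_1step + lambda_n * td_nstep + lambda_e * supervised

print(dqfd_loss(0.3, 0.5, np.array([1.0, 2.0, 0.5]), demo_action=1, is_demo=True))
```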

COG: Connecting New Skills to Past Experience with Offline Reinforcement Learning

TLDR
It is shown that even when the prior data does not actually succeed at solving the new task, it can still be utilized for learning a better policy, by providing the agent with a broader understanding of the mechanics of its environment.

Relay Policy Learning: Solving Long-Horizon Tasks via Imitation and Reinforcement Learning

TLDR
This work simplifies the long-horizon policy learning problem by using a novel data-relabeling algorithm for learning goal-conditioned hierarchical policies, where the low-level policy acts only for a fixed number of steps, regardless of the goal achieved.
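
The relabeling idea, reduced to a toy example: cut each demonstration into fixed-length windows, treat the state at the end of a window as the low-level goal, and train the high-level policy to emit that goal. The window length and data layout below are arbitrary choices for illustration, not the paper's exact scheme.

```python
import numpy as np

def relay_relabel(states, actions, low_horizon=30):
    """Slice a long demonstration into fixed-length windows.  Within each
    window the final state becomes the low-level goal; the high-level policy
    is trained to emit that goal given the window's first state."""
    low_data, high_data = [], []
    for start in range(0, len(actions), low_horizon):
        end = min(start + low_horizon, len(actions))
        goal = states[end]                        # goal = state at window end
        high_data.append((states[start], goal))   # high-level: state -> goal
        for t in range(start, end):
            low_data.append((states[t], goal, actions[t]))  # goal-conditioned BC
    return low_data, high_data

states = np.linspace(0, 1, 101)[:, None]   # toy 1-D trajectory with 100 actions
actions = np.diff(states, axis=0)
low, high = relay_relabel(states, actions)
print(len(low), len(high))
```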

Reinforcement and Imitation Learning for Diverse Visuomotor Skills

TLDR
This work proposes a model-free deep reinforcement learning method that leverages a small amount of demonstration data to assist a reinforcement learning agent and trains end-to-end visuomotor policies that map directly from RGB camera inputs to joint velocities.

Parrot: Data-Driven Behavioral Priors for Reinforcement Learning

TLDR
This paper proposes a method for pre-training behavioral priors that can capture complex input-output relationships observed in successful trials from a wide range of previously seen tasks, and shows how this learned prior can be used for rapidly learning new tasks without impeding the RL agent's ability to try out novel behaviors.

Accelerating Online Reinforcement Learning with Offline Datasets

TLDR
A novel algorithm is proposed that combines sample-efficient dynamic programming with maximum likelihood policy updates, providing a simple and effective framework that is able to leverage large amounts of offline data and then quickly perform online fine-tuning of reinforcement learning policies.
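
One standard way to combine dynamic programming with maximum-likelihood policy updates is to weight a behaviour-cloning regression by exponentiated advantages estimated from the learned Q-function. The sketch below shows that weighting with an arbitrary temperature and clipping; it is a generic illustration of the pattern, not the paper's exact update rule.

```python
import numpy as np

def awr_weights(q_values, values, lam=1.0):
    """Per-sample weights exp(A / lam) for an advantage-weighted,
    maximum-likelihood (weighted behaviour-cloning) policy update."""
    adv = q_values - values
    return np.exp(np.clip(adv / lam, -20, 20))

def weighted_bc_loss(pred_actions, data_actions, weights):
    """Maximum-likelihood step: weighted regression toward dataset actions."""
    per_sample = ((pred_actions - data_actions) ** 2).mean(axis=-1)
    return float((weights * per_sample).mean())

rng = np.random.default_rng(0)
q, v = rng.normal(size=8), rng.normal(size=8)
loss = weighted_bc_loss(rng.normal(size=(8, 2)), rng.normal(size=(8, 2)),
                        awr_weights(q, v))
print(loss)
```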

Dynamics-Aware Unsupervised Discovery of Skills

TLDR
This work proposes an unsupervised learning algorithm, Dynamics-Aware Discovery of Skills (DADS), which simultaneously discovers predictable behaviors and learns their dynamics, and demonstrates that zero-shot planning in the learned latent space significantly outperforms standard MBRL and model-free goal-conditioned RL, and substantially improves over prior hierarchical RL methods for unsupervised skill discovery.
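
The mutual-information flavour of this kind of skill discovery can be illustrated with an intrinsic reward of the form commonly presented for DADS: the log-likelihood of the observed transition under the skill dynamics for the executed skill, minus a log-average over skills sampled from the prior. The Gaussian "skill dynamics" below is a toy stand-in, not the learned model.

```python
import numpy as np

def dads_intrinsic_reward(s, s_next, z, skill_dynamics_logprob, prior_samples):
    """Reward skills whose outcomes are predictable given z but distinct from
    other skills: log q(s'|s,z) minus a Monte-Carlo log-average over prior skills."""
    log_q = skill_dynamics_logprob(s, s_next, z)
    others = np.array([skill_dynamics_logprob(s, s_next, zp) for zp in prior_samples])
    log_mean_others = np.logaddexp.reduce(others) - np.log(len(others))
    return log_q - log_mean_others

# toy skill dynamics: a Gaussian centred at s + z
def logprob(s, s_next, z, sigma=0.1):
    diff = s_next - (s + z)
    return float(-0.5 * np.sum(diff ** 2) / sigma ** 2
                 - len(s) * np.log(sigma * np.sqrt(2 * np.pi)))

rng = np.random.default_rng(0)
s, z = np.zeros(2), np.array([0.3, -0.1])
s_next = s + z + rng.normal(scale=0.05, size=2)
print(dads_intrinsic_reward(s, s_next, z, logprob,
                            [rng.normal(size=2) for _ in range(16)]))
```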