• Corpus ID: 3307812

Reinforcement Learning from Imperfect Demonstrations

@article{Gao2018ReinforcementLF,
title={Reinforcement Learning from Imperfect Demonstrations},
author={Yang Gao and Huazhe Xu and Ji Lin and Fisher Yu and Sergey Levine and Trevor Darrell},
journal={ArXiv},
year={2018},
volume={abs/1802.05313}
}
• Published 2 February 2018
• Computer Science
• ArXiv
Robust real-world learning should benefit from both demonstrations and interactions with the environment. Current approaches to learning from demonstration and reward perform supervised learning on expert demonstration data and use reinforcement learning to further improve performance based on the reward received from the environment. These two objectives have divergent losses that are difficult to optimize jointly, and such methods can be very sensitive to noisy demonstrations. We propose a unified…
148 Citations


Reinforcement Learning with Supervision from Noisy Demonstrations
• Computer Science
• ArXiv
• 2020
Experimental results in various environments with multiple popular reinforcement learning algorithms show that the proposed approach can learn robustly with noisy demonstrations, and achieve higher performance in fewer iterations.
Demonstration actor critic
• Computer Science
• Neurocomputing
• 2021
Extrapolating Beyond Suboptimal Demonstrations via Inverse Reinforcement Learning from Observations
• Computer Science
• ICML
• 2019
A novel reward-learning-from-observation algorithm, Trajectory-ranked Reward EXtrapolation (T-REX), that extrapolates beyond a set of (approximately) ranked demonstrations in order to infer high-quality reward functions from a set of potentially poor demonstrations.
Anomaly Guided Policy Learning from Imperfect Demonstrations
• Computer Science
• AAMAS
• 2022
This work bridges the exploration and LfID (learning from imperfect demonstrations) problems through the lens of anomaly detection, proposes the AGPO method to address them, and shows the superiority of AGPO in this scenario.
Reward Relabelling for combined Reinforcement and Imitation Learning on sparse-reward tasks
• Computer Science
• ArXiv
• 2022
This work presents a new method that can leverage demonstrations and episodes collected online in any sparse-reward environment with any off-policy algorithm; it is based on a reward bonus given to demonstrations and successful episodes, encouraging both expert imitation and self-imitation.
Self-Imitation Learning from Demonstrations
• Computer Science
• ArXiv
• 2022
Self-Imitation Learning (SIL), a recent RL algorithm that exploits the agent's past good experience, is extended to the LfD setup by initializing its replay buffer with demonstrations; experiments show the superiority of SIL over existing LfD algorithms in settings with suboptimal demonstrations and sparse rewards.
Interactive Reinforcement Learning from Demonstration and Human Evaluative Feedback
• Computer Science
• 2018 27th IEEE International Symposium on Robot and Human Interactive Communication (RO-MAN)
• 2018
This paper proposes a model-based method, IRL-TAMER, that combines learning from demonstration via inverse reinforcement learning (IRL) with learning from human reward via the TAMER framework. The results suggest that although an agent learning via IRL can learn a useful value function indicating which states are good based on the demonstration, it cannot obtain an effective policy for navigating to the goal state from one demonstration.
SQIL: Imitation Learning via Reinforcement Learning with Sparse Rewards
• Computer Science
• ICLR
• 2020
This work proposes a simple alternative that still uses RL, but does not require learning a reward function, and can be implemented with a handful of minor modifications to any standard Q-learning or off-policy actor-critic algorithm, called soft Q imitation learning (SQIL).
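The modification SQIL makes can be sketched as follows (a minimal illustration, assuming a generic replay-buffer setup; the `Transition` type and function name are hypothetical, not from the paper): demonstrations are stored with reward +1 and the agent's own transitions with reward 0, after which any standard off-policy algorithm runs unchanged.

```python
from collections import namedtuple

Transition = namedtuple("Transition", "state action reward next_state")

def sqil_relabel(demos, agent_experience):
    """SQIL's core trick: replace rewards rather than learn them.
    Demonstration transitions get a constant reward of +1, the
    agent's own transitions get 0; the underlying Q-learning or
    actor-critic algorithm is otherwise unchanged."""
    buffer = [t._replace(reward=1.0) for t in demos]
    buffer += [t._replace(reward=0.0) for t in agent_experience]
    return buffer
```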
Shaping Rewards for Reinforcement Learning with Imperfect Demonstrations using Generative Models
• Computer Science
• 2021 IEEE International Conference on Robotics and Automation (ICRA)
• 2021
This work proposes a method that combines reinforcement and imitation learning by shaping the reward function with a state-and-action-dependent potential that is trained from demonstration data, using a generative model.
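For reference, the classic potential-based shaping rule this approach builds on adds γΦ(s′) − Φ(s) to the environment reward, which leaves the optimal policy unchanged. A minimal sketch with a state-only potential for simplicity (the paper's potential is state-and-action-dependent and trained from demonstration data; the `potential` callable here is a hypothetical stand-in, e.g. a generative model's score):

```python
def shaped_reward(r, s, s_next, potential, gamma=0.99):
    # Potential-based shaping: r' = r + gamma * Phi(s') - Phi(s).
    # `potential` is any scalar function of state; in this paper's
    # setting it would be fit to demonstration data.
    return r + gamma * potential(s_next) - potential(s)
```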
PsiPhi-Learning: Reinforcement Learning with Demonstrations using Successor Features and Inverse Temporal Difference Learning
• Computer Science
• ICML
• 2021
A multi-task inverse reinforcement learning (IRL) algorithm is proposed, called inverse temporal difference learning (ITD), that learns shared state features, alongside per-agent successor features and preference vectors, purely from demonstrations without reward labels.

References

Showing 1–10 of 41 references
Learning from Demonstrations for Real World Reinforcement Learning
• Computer Science
• ArXiv
• 2017
This paper presents an algorithm, Deep Q-learning from Demonstrations (DQfD), that leverages demonstration data to massively accelerate the learning process even from relatively small amounts of such data, and is able to automatically assess the necessary ratio of demonstration data while learning thanks to a prioritized replay mechanism.
Deep Q-learning From Demonstrations
• Computer Science
• AAAI
• 2018
This paper presents an algorithm, Deep Q-learning from Demonstrations (DQfD), that leverages small sets of demonstration data to massively accelerate the learning process, and is able to automatically assess the necessary ratio of demonstration data while learning thanks to a prioritized replay mechanism.
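DQfD also trains on demonstration states with a large-margin supervised loss that pushes the expert action's Q-value above all others. A minimal sketch (assumptions: discrete actions, a plain list of Q-values, and an illustrative margin value):

```python
def dqfd_margin_loss(q_values, expert_action, margin=0.8):
    """Large-margin loss: max_a [Q(s,a) + l(a_E, a)] - Q(s, a_E),
    where l(a_E, a) = `margin` for a != a_E and 0 otherwise.
    The loss is zero once the expert action's value exceeds every
    other action's value by at least `margin`."""
    augmented = [q + (0.0 if a == expert_action else margin)
                 for a, q in enumerate(q_values)]
    return max(augmented) - q_values[expert_action]
```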
Reinforcement Learning with Unsupervised Auxiliary Tasks
• Computer Science
• ICLR
• 2017
This paper significantly outperforms the previous state of the art on Atari, averaging 880% expert human performance, and on a challenging suite of first-person, three-dimensional Labyrinth tasks, achieving a mean speedup in learning of 10× and averaging 87% expert human performance on Labyrinth.
Integrating reinforcement learning with human demonstrations of varying ability
• Education, Computer Science
• AAMAS
• 2011
This work introduces Human-Agent Transfer (HAT), an algorithm that combines transfer learning, learning from demonstration, and reinforcement learning to achieve rapid learning and high performance in…
Exploration from Demonstration for Interactive Reinforcement Learning
• Computer Science
• AAMAS
• 2016
This work presents a model-free policy-based approach called Exploration from Demonstration (EfD) that uses human demonstrations to guide search space exploration and shows how EfD scales to large problems and provides convergence speed-ups over traditional exploration and interactive learning methods.
Robust Imitation of Diverse Behaviors
• Computer Science
• NIPS
• 2017
A new version of GAIL is developed that is much more robust than the purely supervised controller, especially with few demonstrations, and avoids mode collapse, capturing many diverse behaviors where GAIL on its own does not.
Learning from Limited Demonstrations
• Computer Science
• NIPS
• 2013
This work proves an upper bound on the Bellman error of the estimate computed by APID at each iteration, and shows empirically that APID outperforms pure Approximate Policy Iteration, a state-of-the-art LfD algorithm, and supervised learning in a variety of scenarios, including when very few and/or suboptimal demonstrations are available.
Boosted Bellman Residual Minimization Handling Expert Demonstrations
• Computer Science
• ECML/PKDD
• 2014
This paper addresses the problem of batch Reinforcement Learning with Expert Demonstrations (RLED) by proposing algorithms that leverage expert data to find an optimal policy of a Markov Decision Process (MDP), using a data set of fixed sampled transitions of the MDP as well as a data set of fixed expert demonstrations.
Loss is its own Reward: Self-Supervision for Reinforcement Learning
• Computer Science, Psychology
• ICLR
• 2017
This work considers a range of self-supervised tasks that incorporate states, actions, and successors to provide auxiliary losses that offer ubiquitous and instantaneous supervision for representation learning even in the absence of reward.