Imitation Learning by Reinforcement Learning
@article{Ciosek2021ImitationLB,
  title   = {Imitation Learning by Reinforcement Learning},
  author  = {Kamil Ciosek},
  journal = {ArXiv},
  year    = {2021},
  volume  = {abs/2108.04763}
}
Imitation Learning algorithms learn a policy from demonstrations of expert behavior. Somewhat counterintuitively, we show that, for deterministic experts, imitation learning can be done by reduction to reinforcement learning, which is commonly considered more difficult. We conduct experiments which confirm that our reduction works well in practice for a continuous control task.
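A minimal tabular sketch of one way such a reduction can be instantiated (an illustrative assumption, not necessarily the paper's exact construction): the imitation reward is +1 when the agent repeats the expert's action in a demonstrated state and 0 otherwise, and any standard RL algorithm is then run against that reward. The chain MDP, the `expert_action` table, and the Q-learning loop below are all toy stand-ins.

```python
import random
from collections import defaultdict

# Toy deterministic MDP: a 6-state chain; actions move left (-1) or right (+1).
N_STATES, ACTIONS = 6, (-1, +1)

def step(state, action):
    return min(max(state + action, 0), N_STATES - 1)

# Demonstrations from a deterministic expert that always moves right.
expert_action = {s: +1 for s in range(N_STATES)}

# Imitation reward: +1 iff the action matches the expert in a demonstrated
# state, 0 otherwise (hypothetical instantiation of the reduction).
def imitation_reward(state, action):
    return 1.0 if expert_action.get(state) == action else 0.0

# Plain tabular Q-learning run against the imitation reward.
Q = defaultdict(float)
alpha, gamma, eps = 0.5, 0.9, 0.1
state = 0
for _ in range(5000):
    if random.random() < eps:
        action = random.choice(ACTIONS)
    else:
        action = max(ACTIONS, key=lambda a: Q[(state, a)])
    next_state = step(state, action)
    target = imitation_reward(state, action) + gamma * max(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] += alpha * (target - Q[(state, action)])
    state = 0 if random.random() < 0.05 else next_state  # occasional episode reset

print({s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES)})
# Expected to match expert_action on the states the learner visits.
```

In continuous control, exact state matching is unavailable, so the dictionary lookup above would have to be replaced by some estimate of whether a state-action pair lies in the support of the demonstrations.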
2 Citations
Backward Curriculum Reinforcement Learning
- Computer Science
- 2022
This work proposes a novel reverse curriculum reinforcement learning method that starts training the agent on the backward trajectory of the episode rather than the original forward trajectory, so the agent can learn in a more sample-efficient manner.
Accelerated Continuous-Time Approximate Dynamic Programming via Data-Assisted Hybrid Control
- Computer Science, Mathematics, IFAC-PapersOnLine
- 2022
By incorporating dynamic momentum in the algorithm, this work accelerates the convergence of the closed-loop system, achieving superior transient performance compared to traditional gradient-descent-based techniques.
References
Showing 1-10 of 22 references
Random Expert Distillation: Imitation Learning via Expert Policy Support Estimation
- Computer Science, ICML
- 2019
This work proposes a new framework for imitation learning that estimates the support of the expert policy to compute a fixed reward function, which allows imitation learning to be re-framed within the standard reinforcement learning setting.
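A rough sketch of this support-estimation idea under simplifying assumptions: a fixed random function plays the role of the random target network, a ridge-regression predictor (a stand-in for the trained predictor network) is fitted only on expert state-action pairs, and the prediction error defines a fixed reward that is high on the expert's support and low elsewhere. The shapes, toy Gaussian data, and `f_target` function are all illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Fixed random "target" function f(x) over concatenated (state, action) inputs.
W1, W2 = rng.normal(size=(8, 3)), rng.normal(size=(4, 8))
def f_target(x):
    return W2 @ np.tanh(W1 @ x)

# Expert support: 3-D (state, action) vectors clustered around (1, 1, 1).
expert_sa = rng.normal(loc=1.0, scale=0.1, size=(200, 3))

# Fit a linear predictor to f_target on expert data only (ridge regression).
Y = np.array([f_target(x) for x in expert_sa])
W_hat = np.linalg.solve(expert_sa.T @ expert_sa + 1e-3 * np.eye(3), expert_sa.T @ Y)

def reward(x, sigma=1.0):
    err = np.sum((x @ W_hat - f_target(x)) ** 2)
    return np.exp(-sigma * err)   # high where the predictor was trained, low elsewhere

print(reward(expert_sa[0]))                  # on-support: comparatively high
print(reward(np.array([-3.0, 3.0, 0.0])))    # off-support: much lower
```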
Generative Adversarial Imitation Learning
- Computer Science, NIPS
- 2016
A new general framework is proposed for directly extracting a policy from data, as if it were obtained by reinforcement learning following inverse reinforcement learning; a particular instantiation of this framework draws an analogy between imitation learning and generative adversarial networks.
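A sketch of the adversarial half of such a framework, under toy assumptions: a logistic-regression discriminator (standing in for a neural network) is trained to separate expert state-action pairs from the current policy's samples, and its output defines a surrogate reward for the policy improvement step, which the full method would take with a policy-gradient algorithm such as TRPO (omitted here).

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy (state, action) features: expert samples vs. current-policy samples.
expert_sa = rng.normal(loc=1.0, size=(256, 2))
policy_sa = rng.normal(loc=-1.0, size=(256, 2))

# Logistic-regression discriminator D(s, a), trained to output 1 on expert data.
X = np.vstack([expert_sa, policy_sa])
y = np.concatenate([np.ones(256), np.zeros(256)])
w, b = np.zeros(2), 0.0
for _ in range(500):                          # gradient ascent on log-likelihood
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    w += 0.1 * X.T @ (y - p) / len(y)
    b += 0.1 * np.mean(y - p)

# Surrogate reward: large where the discriminator says "expert-like".
def reward(sa):
    d = 1.0 / (1.0 + np.exp(-(sa @ w + b)))
    return -np.log(1.0 - d + 1e-8)

print(reward(expert_sa[0]), reward(policy_sa[0]))  # expert-like sample scores higher
```

In the full adversarial loop, the discriminator and the policy are updated alternately until the policy's state-action distribution is hard to distinguish from the expert's.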
SQIL: Imitation Learning via Reinforcement Learning with Sparse Rewards
- Computer Science, ICLR
- 2020
This work proposes soft Q imitation learning (SQIL), a simple alternative that still uses RL but does not require learning a reward function and can be implemented with a handful of minor modifications to any standard Q-learning or off-policy actor-critic algorithm.
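A toy sketch of the reward bookkeeping this describes, assuming plain tabular Q-learning in place of soft Q-learning or an actor-critic: demonstration transitions carry reward +1, the agent's own transitions carry reward 0, batches are drawn half-and-half from the two buffers, and the update rule is otherwise unchanged. The 4-state chain and all constants are illustrative.

```python
import random
from collections import defaultdict

ACTIONS = (0, 1)
Q = defaultdict(float)
demo_buffer, agent_buffer = [], []

# Expert demonstrations on a 4-state chain: always take action 1 (move right).
for s in range(3):
    demo_buffer.append((s, 1, 1.0, s + 1))        # r = +1 for expert data

def q_update(batch, alpha=0.5, gamma=0.9):
    for s, a, r, s_next in batch:
        target = r + gamma * max(Q[(s_next, b)] for b in ACTIONS)
        Q[(s, a)] += alpha * (target - Q[(s, a)])

for _ in range(300):
    # Collect one random agent transition; it is stored with reward 0.
    s, a = random.randrange(3), random.choice(ACTIONS)
    agent_buffer.append((s, a, 0.0, s + 1 if a == 1 else s))
    # Balanced batch: half demonstrations, half agent experience.
    batch = random.choices(demo_buffer, k=8) + random.choices(agent_buffer, k=8)
    q_update(batch)

print({s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(3)})
# Expected: action 1 everywhere, i.e. the learner imitates the expert.
```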
Imitation Learning via Kernel Mean Embedding
- Computer Science, AAAI
- 2018
This work shows that the kernelization of a classical algorithm naturally reduces imitation learning to a distribution learning problem, in which the imitation policy tries to match the state-action visitation distribution of the expert.
Provably Efficient Imitation Learning from Observation Alone
- Computer Science, ICML
- 2019
FAIL is the first provably efficient algorithm in the ILFO setting; it learns a near-optimal policy with a number of samples that is polynomial in all relevant parameters but independent of the number of unique observations.
Efficient Reductions for Imitation Learning
- Computer Science, AISTATS
- 2010
This work proposes two alternative algorithms for imitation learning where training occurs over several episodes of interaction and shows that this leads to stronger performance guarantees and improved performance on two challenging problems: training a learner to play a 3D racing game and Mario Bros.
Reward learning from human preferences and demonstrations in Atari
- Computer Science, Psychology, NeurIPS
- 2018
This work trains a deep neural network to model the reward function and uses its predicted reward to train a DQN-based deep reinforcement learning agent on 9 Atari games, achieving strictly superhuman performance on 2 games without using game rewards.
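A minimal sketch of the preference-fitting component only, assuming a Bradley-Terry model over clip returns, a linear reward model, and synthetic preference labels in place of human judgments; the demonstration data and the downstream DQN agent described above are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

# Linear reward model r_theta(s) = theta . s over toy 3-D state features.
theta = np.zeros(3)

# Synthetic preferences between pairs of short clips (5 states each); the
# "annotator" prefers the clip whose first feature sums higher.
def make_clip():
    return rng.normal(size=(5, 3))
pairs = [(make_clip(), make_clip()) for _ in range(500)]
labels = [float(a[:, 0].sum() > b[:, 0].sum()) for a, b in pairs]

# Bradley-Terry / cross-entropy fit: P(a preferred) = sigmoid(R(a) - R(b)),
# where R is the summed predicted reward over the clip.
lr = 0.05
for _ in range(200):
    grad = np.zeros(3)
    for (a, b), y in zip(pairs, labels):
        diff = a.sum(axis=0) - b.sum(axis=0)
        p = 1.0 / (1.0 + np.exp(-(theta @ diff)))
        grad += (y - p) * diff
    theta += lr * grad / len(pairs)

print(theta)  # the first component should dominate: the model tracks the preference signal
```

The learned reward r_theta would then replace the game score when training the RL agent.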
A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning
- Computer Science, AISTATS
- 2011
This paper proposes a new iterative algorithm that trains a stationary deterministic policy and can be seen as a no-regret algorithm in an online learning setting, and demonstrates that this new approach outperforms previous approaches on two challenging imitation learning problems and a benchmark sequence labeling problem.
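A toy sketch of such an iterative scheme, assuming an expert that can be queried on newly visited states and a per-state majority-vote learner: the current policy is rolled out, the expert labels the states actually visited, the labels are aggregated, and the policy is refit on all data collected so far.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy task: a 10-state chain; the expert moves right below state 5, left above.
N = 10
def expert(s):
    return +1 if s < 5 else -1

def rollout(policy, start, horizon=15):
    s, visited = start, []
    for _ in range(horizon):
        visited.append(s)
        s = min(max(s + policy(s), 0), N - 1)
    return visited

# Iterative data aggregation: roll out, query the expert on visited states,
# aggregate, refit a stationary deterministic policy, repeat.
data = {}                                      # state -> expert labels seen so far
policy = lambda s: rng.choice((-1, +1))        # initial (random) learner
for _ in range(5):
    for s in rollout(policy, start=int(rng.integers(N))):
        data.setdefault(s, []).append(expert(s))
    table = {s: (+1 if sum(a) >= 0 else -1) for s, a in data.items()}
    policy = lambda s, t=table: t.get(s, +1)   # majority vote; default for unseen states

print({s: policy(s) for s in range(N)})
# Largely matches the expert's rule on the states visited during training.
```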
Deep Imitation Learning for Complex Manipulation Tasks from Virtual Reality Teleoperation
- Computer Science, 2018 IEEE International Conference on Robotics and Automation (ICRA)
- 2018
This work describes how consumer-grade virtual reality headsets and hand-tracking hardware can be used to naturally teleoperate robots to perform complex tasks, and how imitation learning can train deep neural network policies that acquire the demonstrated skills.
A Divergence Minimization Perspective on Imitation Learning Methods
- Computer Science, CoRL
- 2019
A unified probabilistic perspective on IL algorithms based on divergence minimization is presented; it identifies IRL's state-marginal matching objective as the main contributor to IRL's superior performance and applies this new understanding of IL methods to the problem of state-marginal matching.