Imitation Learning by Reinforcement Learning

  • Kamil Ciosek
  • Published 10 August 2021
  • Computer Science
  • ArXiv
  • Corpus ID: 236965836
Imitation Learning algorithms learn a policy from demonstrations of expert behavior. Somewhat counterintuitively, we show that, for deterministic experts, imitation learning can be done by reduction to reinforcement learning, which is commonly considered more difficult. We conduct experiments which confirm that our reduction works well in practice for a continuous control task. 
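The reduction in the abstract can be illustrated with a toy sketch: for a deterministic expert, define the reward as 1 on state-action pairs that appear in the demonstrations and 0 elsewhere, then run any standard RL algorithm. The chain MDP and hyperparameters below are illustrative assumptions, not taken from the paper.

```python
import random

N_STATES, ACTIONS, GAMMA = 5, (0, 1), 0.9  # action 1 moves right, 0 stays

def step(s, a):
    # Deterministic toy dynamics on a chain of states.
    return min(s + 1, N_STATES - 1) if a == 1 else s

# The expert always moves right; its demonstrations define the support set.
demo = {(s, 1) for s in range(N_STATES)}

def reward(s, a):
    # Indicator reward: 1 on expert state-action pairs, 0 otherwise.
    return 1.0 if (s, a) in demo else 0.0

Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
random.seed(0)
for _ in range(2000):  # tabular Q-learning with the indicator reward
    s = random.randrange(N_STATES)
    a = random.choice(ACTIONS)
    s2 = step(s, a)
    target = reward(s, a) + GAMMA * max(Q[(s2, b)] for b in ACTIONS)
    Q[(s, a)] += 0.5 * (target - Q[(s, a)])

# The greedy policy recovers the expert: action 1 in every state.
policy = {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES)}
print(policy)
```

Because the transitions and the indicator reward are deterministic, the Q-values converge and the greedy policy matches the expert everywhere on the demonstrated support.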

Backward Curriculum Reinforcement Learning

This work proposes a novel backward-curriculum reinforcement learning method that trains the agent on the backward trajectory of an episode rather than the original forward trajectory, allowing the agent to learn in a more sample-efficient manner.

Accelerated Continuous-Time Approximate Dynamic Programming via Data-Assisted Hybrid Control

By incorporating dynamic momentum into the algorithm, this work accelerates the convergence of the closed-loop system, achieving superior transient performance compared to traditional gradient-descent-based techniques.

Random Expert Distillation: Imitation Learning via Expert Policy Support Estimation

This work proposes a new framework for imitation learning that estimates the support of the expert policy to compute a fixed reward function, allowing imitation learning to be re-framed within the standard reinforcement learning setting.

Generative Adversarial Imitation Learning

A new general framework is proposed for directly extracting a policy from data, as if it were obtained by reinforcement learning following inverse reinforcement learning; a particular instantiation of this framework draws an analogy between imitation learning and generative adversarial networks.

SQIL: Imitation Learning via Reinforcement Learning with Sparse Rewards

This work proposes a simple alternative, called soft Q imitation learning (SQIL), that still uses RL but does not require learning a reward function, and can be implemented with a handful of minor modifications to any standard Q-learning or off-policy actor-critic algorithm.
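The core modification SQIL makes can be sketched as a reward-relabeling rule on the replay buffer: demonstration transitions are stored with reward 1 and agent-collected transitions with reward 0, after which any off-policy learner trains on the combined buffer unchanged. The buffer layout and function names below are hypothetical, for illustration only.

```python
from collections import deque

# Replay buffer of (state, action, reward, next_state) tuples.
buffer = deque(maxlen=10000)

def add_demo(s, a, s_next):
    # Expert transition: reward fixed to 1, regardless of the task reward.
    buffer.append((s, a, 1.0, s_next))

def add_agent(s, a, s_next):
    # Agent transition: reward fixed to 0.
    buffer.append((s, a, 0.0, s_next))

add_demo("s0", "right", "s1")
add_agent("s0", "left", "s0")
rewards = [r for (_, _, r, _) in buffer]
print(rewards)  # [1.0, 0.0]
```

The constant rewards give the agent an incentive to return to demonstrated state-action pairs without ever fitting a reward model.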

Imitation Learning via Kernel Mean Embedding

This work shows that the kernelization of a classical algorithm naturally reduces the imitation learning to a distribution learning problem, where the imitation policy tries to match the state-action visitation distribution of the expert.

Provably Efficient Imitation Learning from Observation Alone

FAIL is the first provably efficient algorithm in the ILFO (imitation learning from observation) setting: it learns a near-optimal policy with a number of samples that is polynomial in all relevant parameters but independent of the number of unique observations.

Efficient Reductions for Imitation Learning

This work proposes two alternative algorithms for imitation learning in which training occurs over several episodes of interaction, and shows that this leads to stronger performance guarantees and improved results on two challenging problems: training a learner to play a 3D racing game and Mario Bros.

Reward learning from human preferences and demonstrations in Atari

This work trains a deep neural network to model the reward function and uses its predicted reward to train a DQN-based deep reinforcement learning agent on 9 Atari games, achieving strictly superhuman performance on 2 games without using game rewards.

A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning

This paper proposes a new iterative algorithm that trains a stationary deterministic policy and can be viewed as a no-regret algorithm in an online learning setting; the new approach outperforms previous methods on two challenging imitation learning problems and a benchmark sequence-labeling problem.
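The iterative scheme this summary describes (DAgger) can be sketched as a loop: train a policy on the current dataset, roll it out, query the expert for labels on the states the policy actually visits, and aggregate those labels back into the dataset. The environment, expert, and learner below are toy stand-ins, not from the paper.

```python
def expert(s):
    # Toy expert that can label any queried state.
    return s % 2

def train(dataset):
    # Toy "learner": memorize expert labels, default to action 0.
    table = dict(dataset)
    return lambda s: table.get(s, 0)

def rollout(policy, start=0, horizon=5):
    # States visited when executing the current policy.
    s, visited = start, []
    for _ in range(horizon):
        visited.append(s)
        s = s + 1 + policy(s)  # toy dynamics
    return visited

dataset = [(s, expert(s)) for s in (0, 1)]  # initial expert demonstrations
for _ in range(3):                          # DAgger iterations
    policy = train(dataset)
    for s in rollout(policy):               # query the expert on visited states
        dataset.append((s, expert(s)))      # aggregate into the dataset

policy = train(dataset)
print(all(policy(s) == expert(s) for s in rollout(policy)))  # True
```

Training on states the learner itself visits, rather than only on expert trajectories, is what avoids the compounding-error problem of plain behavioral cloning.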

Deep Imitation Learning for Complex Manipulation Tasks from Virtual Reality Teleoperation

It is described how consumer-grade Virtual Reality headsets and hand-tracking hardware can be used to naturally teleoperate robots to perform complex tasks, and how imitation learning can produce deep neural network policies that acquire the demonstrated skills.

A Divergence Minimization Perspective on Imitation Learning Methods

A unified probabilistic perspective on IL algorithms based on divergence minimization is presented, identifying that IRL's state-marginal matching objective contributes most to its superior performance; this understanding of IL methods is then applied to the problem of state-marginal matching.