Corpus ID: 16153365

Generative Adversarial Imitation Learning

@inproceedings{ho2016generative,
  title={Generative Adversarial Imitation Learning},
  author={Jonathan Ho and Stefano Ermon},
  booktitle={Advances in Neural Information Processing Systems},
  year={2016}
}
Consider learning a policy from example expert behavior, without interaction with the expert or access to a reinforcement signal. One approach is to recover the expert's cost function with inverse reinforcement learning, then extract a policy from that cost function with reinforcement learning. This approach is indirect and can be slow. We propose a new general framework for directly extracting a policy from data, as if it were obtained by reinforcement learning following inverse reinforcement learning.
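The adversarial recipe the abstract alludes to alternates between a discriminator that separates expert behavior from policy behavior and a policy trained on the discriminator's confusion signal. Below is a minimal sketch, assuming a hypothetical one-state, two-action task, a softmax policy, a per-action logistic discriminator, and plain REINFORCE in place of the paper's TRPO policy step — an illustration of the idea, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
n_actions = 2
expert_actions = np.zeros(100, dtype=int)  # the "expert" always picks action 0

policy_logits = np.zeros(n_actions)
disc_logits = np.zeros(n_actions)  # D(a) = sigmoid(disc_logits[a]) = P(a came from the policy)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

for step in range(500):
    probs = np.exp(policy_logits - policy_logits.max())
    probs /= probs.sum()
    pi_actions = rng.choice(n_actions, size=100, p=probs)

    # Discriminator ascent: push D up on policy samples, down on expert samples.
    for a in pi_actions:
        disc_logits[a] += 0.01 * (1.0 - sigmoid(disc_logits[a]))
    for a in expert_actions:
        disc_logits[a] -= 0.01 * sigmoid(disc_logits[a])

    # Policy gradient (REINFORCE with a baseline) on the surrogate reward -log D(a).
    r = -np.log(sigmoid(disc_logits) + 1e-8)
    baseline = float(probs @ r)
    grad = np.zeros(n_actions)
    for a in pi_actions:
        onehot = np.zeros(n_actions)
        onehot[a] = 1.0
        grad += (onehot - probs) * (r[a] - baseline)
    policy_logits += 0.05 * grad / len(pi_actions)

probs = np.exp(policy_logits - policy_logits.max())
probs /= probs.sum()
print(probs)  # the imitation policy should concentrate on the expert's action 0
```

The surrogate reward -log D(s, a) is high exactly where the discriminator cannot tell the policy's behavior from the expert's, which is what drives the policy toward the expert's action distribution.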


Random Expert Distillation: Imitation Learning via Expert Policy Support Estimation

This work proposes a new framework for imitation learning that estimates the support of the expert policy to compute a fixed reward function, which allows imitation learning to be re-framed within the standard reinforcement learning setting.

Multi-Agent Generative Adversarial Imitation Learning

This work proposes a new framework for multi-agent imitation learning in general Markov games, building on a generalized notion of inverse reinforcement learning, and introduces a practical multi-agent actor-critic algorithm with good empirical performance.

Wasserstein Adversarial Imitation Learning

A natural connection is shown between inverse reinforcement learning approaches and optimal transport, enabling more general reward functions with desirable properties (e.g., smoothness). The proposed approach, Wasserstein Adversarial Imitation Learning, uses the Kantorovich potentials as a reward function and leverages regularized optimal transport to enable large-scale applications.

A Bayesian Approach to Generative Adversarial Imitation Learning

This work proposes a Bayesian formulation of generative adversarial imitation learning (GAIL), where the imitation policy and the cost function are represented as stochastic neural networks, and shows that it can significantly enhance the sample efficiency of GAIL by leveraging the predictive density of the cost.

Adversarial Imitation via Variational Inverse Reinforcement Learning

The results show that the proposed empowerment-regularized maximum-entropy inverse reinforcement learning method not only learns near-optimal rewards and policies that match expert behavior but also performs significantly better than state-of-the-art inverse reinforcement learning algorithms.

Off-Policy Adversarial Inverse Reinforcement Learning

An Off-Policy Adversarial Inverse Reinforcement Learning (Off-policy-AIRL) algorithm is proposed that is sample efficient and achieves good imitation performance compared to state-of-the-art AIL algorithms on continuous control tasks.

SQIL: Imitation Learning via Reinforcement Learning with Sparse Rewards

This work proposes soft Q imitation learning (SQIL), a simple alternative that still uses RL but does not require learning a reward function, and can be implemented with a handful of minor modifications to any standard Q-learning or off-policy actor-critic algorithm.
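SQIL's core modification can be sketched in a few lines: expert transitions enter the replay buffer with reward +1, the agent's own transitions enter with reward 0, and both are sampled in balanced proportion. The toy below assumes a hypothetical 3-state chain where the expert always moves right, and uses vanilla tabular Q-learning rather than the soft Q-learning and actor-critic variants the paper evaluates.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 3, 2

def step(s, a):
    # action 1 moves right, action 0 moves left; state 2 is absorbing
    if s == 2:
        return 2
    return min(s + 1, 2) if a == 1 else max(s - 1, 0)

expert = [(0, 1, 1), (1, 1, 2)]   # expert demonstrations: always move right
agent_buffer = []
Q = np.zeros((n_states, n_actions))
gamma, lr = 0.5, 0.05

for episode in range(400):
    s = 0
    for t in range(6):                        # collect the agent's own experience
        a = int(rng.integers(n_actions))
        sn = step(s, a)
        agent_buffer.append((s, a, sn))
        s = sn
    for _ in range(20):                       # balanced replay: half expert, half agent
        if rng.random() < 0.5:
            s_, a_, sn_ = expert[rng.integers(len(expert))]
            r = 1.0                           # SQIL: demonstrations get reward +1
        else:
            s_, a_, sn_ = agent_buffer[rng.integers(len(agent_buffer))]
            r = 0.0                           # SQIL: agent experience gets reward 0
        Q[s_, a_] += lr * (r + gamma * Q[sn_].max() - Q[s_, a_])

print(Q.argmax(axis=1))  # greedy actions per state
```

Because only state-action pairs the expert demonstrates ever receive positive reward, the greedy policy at the non-terminal states converges to the expert's "move right" behavior without any learned reward model.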

Domain Adaptation for Imitation Learning Using Generative Adversarial Network

The model aims to learn both domain-shared and domain-specific features and utilizes them to find an optimal policy across domains, and shows the effectiveness of the model on a number of tasks ranging from low-dimensional to complex high-dimensional ones.

Generative Adversarial Self-Imitation Learning

GASIL improves the performance of proximal policy optimization on 2D Point Mass and MuJoCo environments with delayed reward and stochastic dynamics and can be easily combined with any policy gradient objective by using GASIL as a learned shaped reward function.

Learning Robust Rewards with Adversarial Inverse Reinforcement Learning

It is demonstrated that AIRL is able to recover reward functions that are robust to changes in dynamics, enabling us to learn policies even under significant variation in the environment seen during training.

Model-Free Imitation Learning with Policy Optimization

Under the apprenticeship learning formalism, this work develops alternative model-free algorithms for finding a parameterized stochastic policy that performs at least as well as an expert policy on an unknown cost function, based on sample trajectories from the expert.

Efficient Reductions for Imitation Learning

This work proposes two alternative algorithms for imitation learning where training occurs over several episodes of interaction, and shows that this leads to stronger performance guarantees and improved performance on two challenging problems: training a learner to play a 3D racing game and Mario Bros.

A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning

This paper proposes a new iterative algorithm, which trains a stationary deterministic policy, that can be seen as a no-regret algorithm in an online learning setting, and demonstrates that this new approach outperforms previous approaches on two challenging imitation learning problems and a benchmark sequence labeling problem.
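The iterative loop behind this reduction trains on states the learner itself visits: roll out the current policy, have the expert label the visited states, aggregate those labels into one growing dataset, and retrain. A sketch under assumed toy conditions — a hypothetical 1-D random-walk task, an expert that thresholds at 0.3, and a trivial threshold classifier as the policy class — illustrating the data-aggregation loop, not the paper's regret analysis:

```python
import numpy as np

rng = np.random.default_rng(0)

def expert(x):
    return 1 if x >= 0.3 else 0       # hypothetical expert: threshold policy at 0.3

threshold = -1.0                       # the learner starts with a poor threshold

def learner(x):
    return 1 if x >= threshold else 0

data = []                              # aggregated dataset of (visited state, expert label)
for iteration in range(10):
    # roll out the CURRENT learner; its own mistakes determine which states get visited
    for x in rng.uniform(-1, 1, size=5):
        for t in range(10):
            a = learner(x)
            data.append((float(x), expert(x)))   # the expert labels the visited state
            x = np.clip(x + (0.1 if a == 1 else -0.1) + 0.05 * rng.normal(), -1.0, 1.0)
    # retrain on ALL data gathered so far: pick the threshold minimizing training errors
    candidates = sorted(set(s for s, _ in data))
    threshold = min(candidates, key=lambda th: sum((s >= th) != (y == 1) for s, y in data))

print(threshold)  # should settle near the expert's threshold of 0.3
```

The key difference from behavioral cloning is that the training distribution comes from the learner's own rollouts, so states reached through the learner's early mistakes are exactly the ones the expert gets asked about.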

Maximum Entropy Inverse Reinforcement Learning

A probabilistic approach based on the principle of maximum entropy that provides a well-defined, globally normalized distribution over decision sequences, while providing the same performance guarantees as existing methods is developed.
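In problems small enough that every decision sequence can be enumerated, the globally normalized maximum-entropy model and its exact gradient are easy to write down: P(tau) is proportional to exp(theta . f(tau)), and the log-likelihood gradient is the expert's feature expectations minus the model's. The sketch below assumes a hypothetical horizon-3 task with two actions whose features simply count how often each action is taken.

```python
import itertools
import numpy as np

horizon, n_actions = 3, 2
trajectories = list(itertools.product(range(n_actions), repeat=horizon))

def features(tau):
    return np.array([tau.count(a) for a in range(n_actions)], dtype=float)

expert_tau = (1, 1, 1)                 # the expert always takes action 1
f_expert = features(expert_tau)

theta = np.zeros(n_actions)
for step in range(200):
    scores = np.array([theta @ features(t) for t in trajectories])
    p = np.exp(scores - scores.max())
    p /= p.sum()                       # globally normalized distribution over sequences
    f_model = sum(pi * features(t) for pi, t in zip(p, trajectories))
    theta += 0.1 * (f_expert - f_model)   # exact max-ent log-likelihood gradient ascent

p_expert = p[trajectories.index(expert_tau)]
print(p_expert)  # the expert's trajectory should dominate the learned distribution
```

In realistic MDPs the expectation over trajectories is intractable to enumerate and is instead computed with the soft value-iteration / forward-backward procedure; the gradient itself is the same feature-matching difference shown here.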

Apprenticeship learning via inverse reinforcement learning

This work thinks of the expert as trying to maximize a reward function that is expressible as a linear combination of known features, and gives an algorithm for learning the task demonstrated by the expert, based on using "inverse reinforcement learning" to try to recover the unknown reward function.

Nonlinear Inverse Reinforcement Learning with Gaussian Processes

A probabilistic algorithm that allows complex behaviors to be captured from suboptimal stochastic demonstrations, while automatically balancing the simplicity of the learned reward structure against its consistency with the observed actions.

Guided Cost Learning: Deep Inverse Optimal Control via Policy Optimization

This work explores how inverse optimal control (IOC) can be used to learn behaviors from demonstrations, with applications to torque control of high-dimensional robotic systems and an efficient sample-based approximation for MaxEnt IOC.

Generative Adversarial Nets

We propose a new framework for estimating generative models via an adversarial process, in which we simultaneously train two models: a generative model G that captures the data distribution, and a discriminative model D that estimates the probability that a sample came from the training data rather than G.
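The two-player game can be illustrated in one dimension under assumed toy conditions: data drawn from N(3, 1), a generator G(z) = z + mu that shifts standard normal noise by a learned mean, and a logistic discriminator D(x) = sigmoid(w*x + b), with several discriminator steps per generator step to keep D near its optimum. This is a sketch of the minimax objective, not the paper's neural-network instantiation.

```python
import numpy as np

rng = np.random.default_rng(0)
mu = 0.0                  # generator parameter (the true data mean is 3)
w, b = 0.0, 0.0           # discriminator parameters

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

for step in range(2000):
    x_fake = rng.normal(0.0, 1.0, size=64) + mu
    for _ in range(5):    # several discriminator steps keep D near its best response
        x_real = rng.normal(3.0, 1.0, size=64)
        d_real = sigmoid(w * x_real + b)
        d_fake = sigmoid(w * x_fake + b)
        # ascent on E[log D(real)] + E[log(1 - D(fake))]
        w += 0.05 * (np.mean((1 - d_real) * x_real) - np.mean(d_fake * x_fake))
        b += 0.05 * (np.mean(1 - d_real) - np.mean(d_fake))
    # generator ascent on E[log D(fake)] (the non-saturating variant)
    d_fake = sigmoid(w * x_fake + b)
    mu += 0.02 * np.mean((1 - d_fake) * w)

print(mu)  # the generator mean should drift toward the data mean of 3
```

Keeping the discriminator near its optimum matters: for two equal-variance Gaussians the optimal logistic slope is proportional to the gap between the means, so its sign steers the generator toward the data and vanishes as the two distributions coincide.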

Continuous Inverse Optimal Control with Locally Optimal Examples

A probabilistic inverse optimal control algorithm that scales gracefully with task dimensionality, and is suitable for large, continuous domains where even computing a full policy is impractical.

Learning to search: Functional gradient techniques for imitation learning

The work presented extends the Maximum Margin Planning (MMP) framework to admit learning of more powerful, non-linear cost functions, and demonstrates practical real-world performance with three applied case-studies including legged locomotion, grasp planning, and autonomous outdoor unstructured navigation.