Corpus ID: 226290142

f-IRL: Inverse Reinforcement Learning via State Marginal Matching

@inproceedings{Ni2020fIRLIR,
  title={f-IRL: Inverse Reinforcement Learning via State Marginal Matching},
  author={Tianwei Ni and Harshit S. Sikchi and Yufei Wang and Tejus Gupta and Lisa Lee and Benjamin Eysenbach},
  booktitle={CoRL},
  year={2020}
}
Imitation learning is well-suited for robotic tasks where it is difficult to directly program the behavior or specify a cost for optimal control. In this work, we propose a method for learning the reward function (and the corresponding policy) to match the expert state density. Our main result is the analytic gradient of any f-divergence between the agent and expert state distribution w.r.t. reward parameters. Based on the derived gradient, we present an algorithm, f-IRL, that recovers a… 
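To make the state-marginal-matching objective concrete, the block below writes the f-divergence between the expert state marginal and the state marginal induced by the (soft-)optimal policy for the learned reward, in generic notation assumed for illustration rather than quoted from the paper; the paper's main result is an analytic expression for the gradient of such an objective with respect to the reward parameters.

```latex
% Hedged sketch of the state-marginal-matching objective (notation assumed).
% \rho_E(s): expert state marginal; \rho_\theta(s): state marginal of the
% (MaxEnt-)optimal policy for the learned reward r_\theta; f convex, f(1)=0.
\min_{\theta}\; L_f(\theta)
  \;=\; D_f\!\left(\rho_E \,\|\, \rho_\theta\right)
  \;=\; \mathbb{E}_{s \sim \rho_\theta}\!\left[ f\!\left(\frac{\rho_E(s)}{\rho_\theta(s)}\right) \right].
% Familiar special cases of the generator f:
%   f(u) = u \log u   ->  forward KL,  D_{KL}(\rho_E \,\|\, \rho_\theta)
%   f(u) = -\log u    ->  reverse KL,  D_{KL}(\rho_\theta \,\|\, \rho_E)
```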
OPIRL: Sample Efficient Off-Policy Inverse Reinforcement Learning via Distribution Matching
TLDR
Off-Policy Inverse Reinforcement Learning (OPIRL) is presented, which adopts an off-policy data distribution instead of an on-policy one, enabling a significant reduction in the number of environment interactions, and learns a stationary reward function that is transferable, with high generalization capability, to changing dynamics.
General Characterization of Agents by States they Visit
TLDR
This work discusses and demonstrates how behavioural characterizations of decision-making agents can be misleading, especially in stochastic environments, proposes a novel characterization based on which states policies visit, and runs experiments to evaluate the quality of the proposed behavioural characterization against baselines.
Adversarial Intrinsic Motivation for Reinforcement Learning
TLDR
A quasimetric specific to Markov decision processes (MDPs) is introduced, and an approach termed Adversarial Intrinsic Motivation (AIM) estimates the Wasserstein-1 distance induced by this quasimetric through its dual objective and uses it to compute a supplemental reward function.
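The "dual objective" mentioned here refers to the standard Kantorovich-Rubinstein dual form of the Wasserstein-1 distance; the sketch below states it in generic notation (the potential \phi, its parameterization, and AIM's particular quasimetric ground metric are assumptions, not details taken from the paper).

```latex
% Kantorovich--Rubinstein dual of the Wasserstein-1 distance (generic notation).
% AIM-style methods parameterize the potential \phi (e.g., with a network),
% maximize this lower bound to estimate W_1, and reuse \phi's values at visited
% states as a supplemental (intrinsic) reward.
W_1(P, Q) \;=\; \sup_{\|\phi\|_{L} \le 1}
  \;\mathbb{E}_{x \sim P}\big[\phi(x)\big] \;-\; \mathbb{E}_{x \sim Q}\big[\phi(x)\big]
```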
Back to Reality for Imitation Learning
TLDR
This paper proposes that the most appropriate evaluation metric for robot learning is not data efficiency but time efficiency, which reflects the real-world cost much more faithfully.
Inverse Decision Modeling: Learning Interpretable Representations of Behavior
TLDR
This paper develops an expressive, unifying perspective on inverse decision modeling: a framework for learning parameterized representations of sequential decision behavior, which formalizes the forward problem (as a normative standard), subsuming common classes of control behavior.
Learning Embodied Agents with Scalably-Supervised Reinforcement Learning
Lisa Lee, 2021
Reinforcement learning (RL) agents learn to perform a task through trial-and-error interactions with an initially unknown environment. Despite the recent progress in deep RL, it remains a challenge
SS-MAIL: Self-Supervised Multi-Agent Imitation Learning
TLDR
The SS-MAIL framework improves multi-agent imitation by stabilizing policy training, improving reward shaping, and enabling the modeling of multi-modal trajectories.

References

Showing 1–10 of 46 references
SQIL: Imitation Learning via Reinforcement Learning with Sparse Rewards
TLDR
This work proposes soft Q imitation learning (SQIL), a simple alternative that still uses RL but does not require learning a reward function, and that can be implemented with a handful of minor modifications to any standard Q-learning or off-policy actor-critic algorithm.
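A minimal sketch of the SQIL recipe as commonly described: expert transitions are stored with reward +1, the agent's own transitions with reward 0, and an otherwise unmodified off-policy algorithm is trained on the combined buffer. The helper names, the old-style gym env.step signature, and the batch size below are assumptions for illustration, not the authors' code.

```python
# SQIL sketch: relabel rewards, then run a standard off-policy update.
import random

def make_sqil_buffer(demo_transitions):
    """Seed the replay buffer with expert transitions relabeled with reward +1."""
    return [(s, a, 1.0, s_next, done) for (s, a, s_next, done) in demo_transitions]

def sqil_step(buffer, env, policy, update_fn, state, batch_size=256):
    """One environment step: store the agent's transition with reward 0, then update."""
    action = policy(state)
    next_state, _, done, _ = env.step(action)          # the environment reward is discarded
    buffer.append((state, action, 0.0, next_state, done))
    batch = random.sample(buffer, k=min(batch_size, len(buffer)))
    update_fn(batch)                                    # unmodified soft Q-learning / SAC update
    return next_state, done
```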
Guided Cost Learning: Deep Inverse Optimal Control via Policy Optimization
TLDR
This work explores how inverse optimal control (IOC) can be used to learn behaviors from demonstrations, introducing an efficient sample-based approximation for MaxEnt IOC with applications to torque control of high-dimensional robotic systems.
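For context on the "sample-based approximation for MaxEnt IOC", the sketch below writes the maximum-entropy IOC objective and an importance-sampling estimate of its partition function in generic notation; the symbols (cost c_\theta, demonstration set D, sampling distribution q) are assumptions for illustration rather than the paper's exact notation.

```latex
% MaxEnt IOC models demonstrations as exponentially more likely under lower cost:
p_\theta(\tau) \;=\; \frac{1}{Z}\exp\!\big(-c_\theta(\tau)\big),
\qquad
\mathcal{L}(\theta) \;=\; \mathbb{E}_{\tau \sim \mathcal{D}}\big[c_\theta(\tau)\big] \;+\; \log Z .
% The intractable partition function Z is estimated by importance sampling,
% using trajectories \tau_j drawn from a sampling distribution q refined with RL:
Z \;\approx\; \frac{1}{M}\sum_{j=1}^{M} \frac{\exp\!\big(-c_\theta(\tau_j)\big)}{q(\tau_j)} .
```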
Energy-Based Imitation Learning
TLDR
A novel IL framework named Energy-Based Imitation Learning (EBIL) is proposed, which solves the IL problem by estimating the expert energy as a surrogate reward function via score matching; it enjoys high model flexibility for estimating the expert policy distribution and efficient computation that avoids the alternating training scheme of prior methods.
A Divergence Minimization Perspective on Imitation Learning Methods
TLDR
A unified probabilistic perspective on IL algorithms based on divergence minimization is presented, identifying IRL's state-marginal matching objective as the main contributor to its superior performance, and this understanding of IL methods is then applied directly to the problem of state-marginal matching.
Random Expert Distillation: Imitation Learning via Expert Policy Support Estimation
TLDR
This work proposes a new framework for imitation learning that estimates the support of the expert policy to compute a fixed reward function, which allows imitation learning to be re-framed within the standard reinforcement learning setting.
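A rough illustration of the support-estimation idea, assuming a random-network-distillation-style recipe: a predictor network is fit to a fixed random target network on expert state-action pairs, and the fixed reward decays with the prediction error, so it is high only on (or near) the expert's support. All names, architectures, and hyperparameters below are hypothetical, not the authors' code.

```python
# Support-estimation reward sketch (hypothetical; assumes expert_sa is a float tensor [N, dim_in]).
import torch
import torch.nn as nn

def fit_predictor(expert_sa, dim_in, dim_out=64, epochs=100, lr=1e-3):
    """Fit a predictor to a fixed, randomly initialized target net on expert (s, a) pairs."""
    target = nn.Sequential(nn.Linear(dim_in, 128), nn.ReLU(), nn.Linear(128, dim_out))
    predictor = nn.Sequential(nn.Linear(dim_in, 128), nn.ReLU(), nn.Linear(128, dim_out))
    for p in target.parameters():
        p.requires_grad_(False)                        # the target network stays fixed
    opt = torch.optim.Adam(predictor.parameters(), lr=lr)
    for _ in range(epochs):
        loss = ((predictor(expert_sa) - target(expert_sa)) ** 2).mean()
        opt.zero_grad(); loss.backward(); opt.step()
    return target, predictor

def support_reward(sa, target, predictor, sigma=1.0):
    """Fixed reward that decreases with prediction error, i.e. distance from expert support."""
    err = ((predictor(sa) - target(sa)) ** 2).sum(dim=-1)
    return torch.exp(-sigma * err)
```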
Adversarial Imitation via Variational Inverse Reinforcement Learning
TLDR
The results show that the proposed empowerment-regularized maximum-entropy inverse reinforcement learning method not only learns near-optimal rewards and policies that match expert behavior, but also performs significantly better than state-of-the-art inverse reinforcement learning algorithms.
Imitation Learning as f-Divergence Minimization
TLDR
This work proposes a general imitation learning framework for estimating and minimizing any f-divergence, and shows that the approximate I-projection technique is able to imitate multi-modal behaviors more reliably than GAIL and behavior cloning.
Disagreement-Regularized Imitation Learning
TLDR
The algorithm trains an ensemble of policies on the expert demonstration data and uses the variance of their predictions as a cost, which is minimized with RL together with a supervised behavioral cloning cost; the resulting fixed reward function is easy to optimize.
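A minimal sketch of the disagreement cost described in this summary, assuming each ensemble member maps a state to a (mean) action tensor; the helper name and the reduction over action dimensions are illustrative choices, not the paper's implementation.

```python
# Ensemble-disagreement cost sketch: cheap where the BC ensemble agrees (near expert data).
import torch

def disagreement_cost(state, ensemble):
    """Variance of the ensemble's predicted actions at `state`, reduced to a scalar cost."""
    actions = torch.stack([policy(state) for policy in ensemble])  # shape [E, act_dim]
    return actions.var(dim=0).mean()
```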
Efficient Reductions for Imitation Learning
TLDR
This work proposes two alternative algorithms for imitation learning where training occurs over several episodes of interaction and shows that this leads to stronger performance guarantees and improved performance on two challenging problems: training a learner to play a 3D racing game and Mario Bros.
Learning Robust Rewards with Adversarial Inverse Reinforcement Learning
TLDR
It is demonstrated that AIRL is able to recover reward functions that are robust to changes in dynamics, enabling policies to be learned even under significant variation from the environment seen during training.
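For reference, the sketch below writes the discriminator structure commonly associated with AIRL in generic notation: a reward-like term f set against the policy likelihood, with f decomposed into a state-only reward plus a potential-based shaping term; treat the exact notation as an assumption rather than a quotation from the paper.

```latex
% AIRL-style discriminator sketch (notation assumed).
D_{\theta}(s,a) \;=\; \frac{\exp\{f_{\theta}(s,a)\}}{\exp\{f_{\theta}(s,a)\} + \pi(a \mid s)},
% with f decomposed into a state-only reward g and a shaping potential h,
f_{\theta,\phi}(s,a,s') \;=\; g_{\theta}(s) \;+\; \gamma\, h_{\phi}(s') \;-\; h_{\phi}(s),
% which is what yields rewards that remain valid under changes in dynamics.
```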