# Relative Entropy Inverse Reinforcement Learning

```bibtex
@inproceedings{Boularias2011RelativeEI,
  title     = {Relative Entropy Inverse Reinforcement Learning},
  author    = {Abdeslam Boularias and J. Kober and Jan Peters},
  booktitle = {AISTATS},
  year      = {2011}
}
```

We consider the problem of imitation learning where the examples, demonstrated by an expert, cover only a small part of a large state space. Inverse Reinforcement Learning (IRL) provides an efficient tool for generalizing the demonstration, based on the assumption that the expert acts optimally in a Markov Decision Process (MDP). Most of the past work on IRL requires that a (near-)optimal policy can be computed for different reward functions. However, this requirement can hardly be satisfied in…
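
The key construction behind the method, as the abstract suggests, is a trajectory distribution that stays close to a baseline while matching the expert's features. A sketch of the optimization problem as it is commonly stated (the notation is assumed here, not quoted from the paper): $P$ is the learned distribution over trajectories $\tau$, $Q$ a baseline distribution, $f_i$ are trajectory features and $\hat{f}_i$ their empirical means under the expert's demonstrations.

```latex
\min_{P}\; \sum_{\tau} P(\tau)\,\ln\frac{P(\tau)}{Q(\tau)}
\quad \text{s.t.} \quad
\Bigl|\sum_{\tau} P(\tau)\,f_i(\tau) - \hat{f}_i\Bigr| \le \epsilon_i \;\;\forall i,
\qquad \sum_{\tau} P(\tau) = 1
```

The solution has the Boltzmann form $P(\tau) \propto Q(\tau)\, e^{\theta^\top f(\tau)}$; the weights $\theta$ come from the dual problem, and when the dynamics are unknown the dual gradient can be estimated model-free by importance sampling over demonstrated and sampled trajectories.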

#### 258 Citations

Deep Inverse Q-learning with Constraints

- Computer Science, Mathematics
- NeurIPS
- 2020

This work introduces a novel class of algorithms that only need to solve the MDP underlying the demonstrated behavior once to recover the expert policy, and proposes Inverse Action-value Iteration, which is able to fully recover an underlying reward of an external agent in closed form, analytically.
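
A hedged reconstruction of the closed-form idea (the exact formulation is in the paper and may differ): if the demonstrator is modeled as Boltzmann in an optimal action-value function, differences of log action probabilities pin down differences of Q-values, which in turn yield linear equations for the reward.

```latex
\pi_E(a \mid s) = \frac{e^{Q(s,a)}}{\sum_b e^{Q(s,b)}}
\;\Rightarrow\;
\ln \pi_E(a \mid s) - \ln \pi_E(b \mid s) = Q(s,a) - Q(s,b),
\qquad
Q(s,a) = r(s,a) + \gamma\, \mathbb{E}_{s'}[V(s')]
```

For each state, the observed action probabilities thus give a linear system in $r(s,\cdot)$ that can be solved analytically, state by state.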

Structured Classification for Inverse Reinforcement Learning

- Computer Science
- EWRL 2012
- 2012

This paper addresses the Inverse Reinforcement Learning (IRL) problem, a particular case of learning from demonstrations, using a classification approach in which the structure of the underlying Markov Decision Process is implicitly injected, and arrives at an efficient subgradient-descent-based algorithm.

On the Performance of Maximum Likelihood Inverse Reinforcement Learning

- Computer Science
- ArXiv
- 2012

A detailed description of the different methods of inverse reinforcement learning is given to highlight differences in terms of reward estimation, policy similarity, and computational cost, and experimental results are provided to evaluate the differences in performance.

On the Minimization of the Policy Gradient in Inverse Reinforcement Learning

- 2015

Inverse Reinforcement Learning (IRL) deals with the problem of recovering the reward function optimized by an expert, given a set of demonstrations of the expert's policy. Most IRL algorithms need to…

Learning from a Learner

- Computer Science
- ICML
- 2019

A novel setting for Inverse Reinforcement Learning (IRL), namely “Learning from a Learner” (LfL), in which the reward is learned not by observing an optimal agent, but from observations of another learning (and thus suboptimal) agent.

Inverse Reinforcement Learning through Policy Gradient Minimization

- Computer Science
- AAAI
- 2016

This paper proposes a new IRL approach that recovers the reward function without the need to solve any "direct" RL problem, and presents an empirical evaluation of the proposed approach on a multidimensional version of the Linear-Quadratic Regulator.
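
The underlying observation, sketched here in assumed notation: if the expert's parameterized policy $\pi_{\theta_E}$ is optimal for the true reward, the policy gradient vanishes at $\theta_E$, so for a reward linear in its weights, $r_\omega = \omega^\top \phi$, the weights can be recovered by driving that gradient to zero.

```latex
\hat{\omega} \in \arg\min_{\omega} \bigl\| \nabla_{\theta} J(\theta_E; \omega) \bigr\|_2^2,
\qquad
\nabla_{\theta} J(\theta; \omega) =
\mathbb{E}_{\pi_\theta}\!\Bigl[\sum_t \gamma^t\, \nabla_{\theta} \ln \pi_\theta(a_t \mid s_t)\, Q_{\omega}^{\pi_\theta}(s_t, a_t)\Bigr]
```

Because $Q_{\omega}^{\pi_\theta}$ is linear in $\omega$, the objective is a quadratic in $\omega$, and no "direct" RL problem has to be solved.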

A review of inverse reinforcement learning theory and recent advances

- Computer Science
- 2012 IEEE Congress on Evolutionary Computation
- 2012

Inverse Reinforcement Learning (IRL), an extension of RL, introduces a new way of learning policies, by deriving the expert's intentions, in contrast to directly learning policies, which can be redundant and generalize poorly.

Inverse Reinforcement Learning with Simultaneous Estimation of Rewards and Dynamics

- Mathematics, Computer Science
- AISTATS
- 2016

This work presents a gradient-based IRL approach that simultaneously estimates the system's dynamics and solves the combined optimization problem, taking into account the bias of the demonstrations, which stems from the generating policy.

Apprenticeship learning with few examples

- Computer Science
- Neurocomputing
- 2013

The quality of the learned policies is sensitive to the error in estimating the averages of the features when the dynamics of the system are stochastic; two new approaches for bootstrapping the demonstrations are introduced.
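
The quantity whose estimation error is at issue is the empirical feature expectation of the demonstrations; in the usual notation (assumed here), with $m$ trajectories of horizon $T$ and feature map $\phi$,

```latex
\hat{\mu}_E = \frac{1}{m} \sum_{j=1}^{m} \sum_{t=0}^{T} \gamma^{t}\, \phi\bigl(s_t^{(j)}\bigr)
```

In stochastic systems this Monte Carlo estimate has high variance for small $m$, which is the error the bootstrapping approaches target.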

#### References

Showing 1–10 of 23 references.

Maximum Entropy Inverse Reinforcement Learning

- Computer Science
- AAAI
- 2008

A probabilistic approach based on the principle of maximum entropy is developed that provides a well-defined, globally normalized distribution over decision sequences, while providing the same performance guarantees as existing methods.
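
For concreteness, a minimal tabular sketch of the resulting learning rule, assuming a finite horizon and one-hot state features (the function name and the simplified forward/backward passes are illustrative, not the paper's exact algorithm):

```python
import numpy as np

def maxent_irl_step(P, d0, expert_svf, theta, horizon, lr=0.1):
    """One gradient-ascent step on the MaxEnt IRL log-likelihood.

    P          : (S, A, S) transition probabilities
    d0         : (S,) initial-state distribution
    expert_svf : (S,) empirical expert state-visitation frequencies
    theta      : (S,) reward weights; with one-hot state features, r(s) = theta[s]
    """
    S, A, _ = P.shape
    reward = theta

    # Backward pass: soft (log-sum-exp) value iteration gives the MaxEnt policy.
    V = np.zeros(S)
    for _ in range(horizon):
        Q = reward[:, None] + P @ V          # (S, A) soft action values
        V = np.logaddexp.reduce(Q, axis=1)   # soft maximum over actions
    policy = np.exp(Q - V[:, None])          # pi(a|s), rows sum to one

    # Forward pass: expected state-visitation frequencies under pi.
    svf, d_t = np.zeros(S), d0.copy()
    for _ in range(horizon):
        svf += d_t
        d_t = np.einsum("s,sa,sax->x", d_t, policy, P)

    # Likelihood gradient: expert visitations minus the model's.
    return theta + lr * (expert_svf - svf)
```

The gradient is just the difference between the expert's and the model's visitation frequencies, which is what makes the distribution well defined and globally normalized.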

Apprenticeship learning via inverse reinforcement learning

- Computer Science
- ICML
- 2004

This work models the expert as maximizing a reward function expressible as a linear combination of known features, and gives an algorithm for learning the task demonstrated by the expert, based on using "inverse reinforcement learning" to recover the unknown reward function.
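
A sketch of the projection variant of this algorithm, with the RL solver abstracted behind a callback (the names are illustrative; `fe_for_reward` is assumed to return the feature expectations of a policy optimized for the given reward weights):

```python
import numpy as np

def projection_apprenticeship(mu_E, fe_for_reward, n_iters=50, eps=1e-6):
    """Feature matching by the projection method (sketch).

    mu_E          : (k,) expert feature expectations
    fe_for_reward : callable w -> (k,) feature expectations of a policy
                    (approximately) optimal for the reward r(s) = w . phi(s)
    """
    mu_bar = fe_for_reward(mu_E)        # feature expectations of an initial policy
    for _ in range(n_iters):
        w = mu_E - mu_bar               # current reward direction
        if np.linalg.norm(w) <= eps:    # expert matched within tolerance
            break
        mu = fe_for_reward(w)           # best response to the current reward
        d = mu - mu_bar
        # Orthogonally project mu_E onto the line through mu_bar and mu.
        mu_bar = mu_bar + (d @ (mu_E - mu_bar)) / (d @ d) * d
    return w
```

Each iteration shrinks the distance between the expert's feature expectations and those achievable by mixtures of the policies found so far, so a policy matching the expert's features also matches the expert's (unknown, linear) return.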

Bayesian Inverse Reinforcement Learning

- Computer Science
- IJCAI
- 2007

This paper shows how to combine prior knowledge and evidence from the expert's actions to derive a probability distribution over the space of reward functions, and presents efficient algorithms that find solutions for the reward-learning and apprenticeship-learning tasks that generalize well over these distributions.
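
The combination of prior and evidence takes the usual Bayesian form. As commonly presented (notation assumed here), the likelihood of the demonstrated state-action pairs is an exponential of optimal Q-values with a confidence parameter $\alpha$:

```latex
P(R \mid \mathcal{D}) \;\propto\; P(\mathcal{D} \mid R)\, P(R),
\qquad
P(\mathcal{D} \mid R) = \prod_{(s,a) \in \mathcal{D}}
\frac{e^{\alpha\, Q^{*}(s,a;\,R)}}{\sum_{b} e^{\alpha\, Q^{*}(s,b;\,R)}}
```

The posterior over rewards is then explored with Markov chain Monte Carlo, and point estimates for reward learning or apprenticeship learning are read off from it.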

Active Learning for Reward Estimation in Inverse Reinforcement Learning

- Computer Science
- ECML/PKDD
- 2009

An algorithm is proposed that allows the agent to query the demonstrator for samples at specific states, instead of relying only on samples provided at "arbitrary" states; it estimates the reward function with accuracy similar to other methods from the literature while reducing the number of policy samples required from the expert.
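
One plausible reading of the query rule (an assumption here; see the paper for the exact criterion): query the state where the current posterior over reward functions leaves the action choice most uncertain, e.g.

```latex
s^{*} \in \arg\max_{s}\; H\bigl(\bar{\pi}(\cdot \mid s)\bigr)
```

where $\bar{\pi}$ is the mean policy induced by the posterior over rewards and $H$ is the Shannon entropy.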

Apprenticeship Learning using Inverse Reinforcement Learning and Gradient Methods

- Computer Science, Mathematics
- UAI
- 2007

A novel gradient algorithm is proposed for learning a policy from an expert's observed behavior, assuming that the expert behaves optimally with respect to some unknown reward function of a Markovian Decision Problem.

Maximum margin planning

- Computer Science
- ICML
- 2006

This work learns mappings from features to costs so that an optimal policy in an MDP with these costs mimics the expert's behavior, and demonstrates a simple, provably efficient approach to structured maximum-margin learning, based on the subgradient method, that leverages existing fast algorithms for inference.
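
The structured-margin objective behind this, in a commonly used form (notation assumed here: example $i$ has feature matrix $F_i$, expert path $\mu_i$, loss vector $\ell_i$, and feasible set $\mathcal{G}_i$):

```latex
\min_{w}\; \frac{\lambda}{2} \|w\|^2
+ \frac{1}{n} \sum_{i=1}^{n}
\Bigl( \max_{\mu \in \mathcal{G}_i} \bigl(w^\top F_i + \ell_i^\top\bigr)\mu - w^\top F_i\, \mu_i \Bigr)
```

A subgradient at $w$ is obtained from the loss-augmented planning problem inside the max, which is why each update can reuse fast existing inference (planning) algorithms.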

Relative Entropy Policy Search

- Computer Science
- AAAI
- 2010

The Relative Entropy Policy Search (REPS) method is suggested, which differs significantly from previous policy gradient approaches: it yields an exact update step and works well on typical reinforcement learning benchmark problems.
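
The exact update step follows from bounding the information loss between successive state-action distributions. In its simplest form (omitting the stationarity constraints of the full method), maximizing expected reward subject to a relative-entropy bound on the new distribution $p$ against a reference $q$ gives a closed-form reweighting:

```latex
\max_{p}\; \mathbb{E}_{p}[R]
\quad \text{s.t.} \quad
\mathrm{KL}(p \,\|\, q) \le \epsilon
\qquad \Longrightarrow \qquad
p(s,a) \;\propto\; q(s,a)\, e^{R(s,a)/\eta}
```

with the temperature $\eta$ fixed by the dual. The same exponential reweighting of a baseline distribution reappears in the Relative Entropy IRL formulation above.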

A Game-Theoretic Approach to Apprenticeship Learning

- Computer Science
- NIPS
- 2007

A new algorithm is given that is computationally faster, easier to implement, and applicable even in the absence of an expert, and it is shown that this algorithm may produce a policy that is substantially better than the expert's.

Learning to search: Functional gradient techniques for imitation learning

- Computer Science
- Auton. Robots
- 2009

The work presented extends the Maximum Margin Planning (MMP) framework to admit learning of more powerful, non-linear cost functions, and demonstrates practical real-world performance with three applied case studies: legged locomotion, grasp planning, and autonomous outdoor unstructured navigation.

Robot Learning From Demonstration

- Computer Science
- ICML
- 1997

This work has shown that incorporating a task-level direct learning component, which is non-model-based, in addition to the model-based planner, is useful in compensating for structural modeling errors and slow model learning.