Corpus ID: 9640442

Relative Entropy Inverse Reinforcement Learning

@inproceedings{Boularias2011RelativeEI,
  title={Relative Entropy Inverse Reinforcement Learning},
  author={Abdeslam Boularias and J. Kober and Jan Peters},
  booktitle={AISTATS},
  year={2011}
}
We consider the problem of imitation learning where the examples, demonstrated by an expert, cover only a small part of a large state space. Inverse Reinforcement Learning (IRL) provides an efficient tool for generalizing the demonstration, based on the assumption that the expert is optimally acting in a Markov Decision Process (MDP). Most of the past work on IRL requires that a (near)optimal policy can be computed for different reward functions. However, this requirement can hardly be satisfied in…
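To make the flavour of the approach concrete, the following is a minimal sketch of a self-normalised importance-sampling gradient step that matches expert feature expectations under a Boltzmann-like trajectory distribution. It is an illustration under assumptions (linear reward in trajectory features, a uniform baseline distribution folded into the normalisation, illustrative names and step size), not the paper's exact algorithm.

import numpy as np

def reirl_gradient_step(theta, expert_feat_mean, sample_feats, sample_logprobs, lr=0.1):
    # theta            : (d,) current reward weights
    # expert_feat_mean : (d,) empirical feature expectation of the demonstrations
    # sample_feats     : (n, d) feature counts f(tau) of n sampled trajectories
    # sample_logprobs  : (n,) log-probability of each trajectory under the sampling policy
    # Importance weights: w(tau) proportional to exp(theta^T f(tau)) / pi_sample(tau).
    log_w = sample_feats @ theta - sample_logprobs
    log_w -= log_w.max()          # stabilise before exponentiating
    w = np.exp(log_w)
    w /= w.sum()                  # self-normalised importance sampling
    # Feature expectation under the current Boltzmann-like trajectory distribution.
    model_feat_mean = w @ sample_feats
    # Move toward matching the expert's feature expectations (dual gradient ascent).
    return theta + lr * (expert_feat_mean - model_feat_mean)

Iterating such a step with freshly sampled trajectories drives the model's feature expectations toward the expert's without solving the forward RL problem.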

Citations

Deep Inverse Q-learning with Constraints
This work introduces a novel class of algorithms that only needs to solve the MDP underlying the demonstrated behavior once to recover the expert policy, and proposes Inverse Action-value Iteration, which recovers the underlying reward of an external agent analytically in closed form.
Structured Classification for Inverse Reinforcement Learning
This paper addresses the Inverse Reinforcement Learning (IRL) problem, a particular case of learning from demonstrations, with a classification approach in which the structure of the underlying Markov Decision Process is implicitly injected, and arrives at an efficient subgradient descent-based algorithm.
On the Performance of Maximum Likelihood Inverse Reinforcement Learning
Provides a detailed description of different inverse reinforcement learning methods, highlighting their differences in reward estimation, policy similarity, and computational cost, together with experimental results evaluating the differences in performance.
On the Minimization of the Policy Gradient in Inverse Reinforcement Learning
Inverse Reinforcement Learning (IRL) deals with the problem of recovering the reward function optimized by an expert given a set of demonstrations of the expert's policy. Most IRL algorithms need to…
Learning from a Learner
Introduces a novel setting for Inverse Reinforcement Learning (IRL), namely "Learning from a Learner" (LfL), in which the reward is learned not by observing an optimal agent but from observations of another learning (and thus suboptimal) agent.
Inverse Reinforcement Learning through Policy Gradient Minimization
This paper proposes a new IRL approach that recovers the reward function without solving any "direct" RL problem, and presents an empirical evaluation of the proposed approach on a multidimensional version of the Linear-Quadratic Regulator.
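As a rough sketch of the underlying idea (my paraphrase under a linear-reward assumption, not the paper's exact formulation): the expert's policy parameters θ_E should be a stationary point of the expected return under the true reward, so reward weights can be recovered by minimizing the norm of the policy gradient evaluated at the expert's policy:

\[
\hat{\omega} \in \arg\min_{\omega} \Big\| \nabla_{\theta} J(\theta_E;\, \omega) \Big\|_2^2,
\qquad
J(\theta;\, \omega) = \mathbb{E}_{\tau \sim \pi_\theta}\Big[\textstyle\sum_{t} \gamma^t\, \omega^\top \phi(s_t, a_t)\Big].
\]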
A review of inverse reinforcement learning theory and recent advances
  • Shao Zhifei, E. Joo
  • 2012 IEEE Congress on Evolutionary Computation, 2012
IRL introduces a new way of learning policies by deriving the expert's intentions, in contrast to directly learning policies, which can be redundant and have poor generalization ability.
Inverse Reinforcement Learning with Simultaneous Estimation of Rewards and Dynamics
This work presents a gradient-based IRL approach that simultaneously estimates the system's dynamics and solves the combined optimization problem, taking into account the bias of the demonstrations that stems from the generating policy.
Apprenticeship learning with few examples
Shows that the quality of the learned policies is sensitive to the error in estimating the feature averages when the dynamics of the system are stochastic, and introduces two new approaches for bootstrapping the demonstrations.

References

Showing 1–10 of 23 references
Maximum Entropy Inverse Reinforcement Learning
Develops a probabilistic approach based on the principle of maximum entropy that provides a well-defined, globally normalized distribution over decision sequences, while providing the same performance guarantees as existing methods.
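As a pointer to the core construction (a standard presentation assuming deterministic dynamics; with stochastic dynamics the trajectory probability also carries transition terms), the maximum entropy model places a Boltzmann distribution over trajectories with linear features and fits the weights by matching empirical feature counts:

\[
P(\tau \mid \theta) = \frac{\exp(\theta^\top f_\tau)}{Z(\theta)},
\qquad
\nabla_\theta \log \mathcal{L}(\theta) = \tilde{f} - \sum_{\tau} P(\tau \mid \theta)\, f_\tau,
\]

where \(\tilde{f}\) denotes the empirical feature expectation of the demonstrations.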
Apprenticeship learning via inverse reinforcement learning
This work thinks of the expert as trying to maximize a reward function that is expressible as a linear combination of known features, and gives an algorithm for learning the task demonstrated by the expert, based on using "inverse reinforcement learning" to try to recover the unknown reward function.
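To spell out the assumption (standard notation, not quoted from the paper): with \(R(s) = w^\top \phi(s)\) and \(\|w\|_2 \le 1\), expected returns depend on a policy only through its discounted feature expectations, so matching the expert's feature expectations bounds the value gap for every admissible \(w\):

\[
\mu(\pi) = \mathbb{E}\Big[\textstyle\sum_{t=0}^{\infty} \gamma^t \phi(s_t) \,\Big|\, \pi\Big],
\qquad
\|\mu(\tilde\pi) - \mu_E\|_2 \le \epsilon
\;\Rightarrow\;
\big|\, w^\top \mu_E - w^\top \mu(\tilde\pi) \,\big| \le \epsilon.
\]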
Bayesian Inverse Reinforcement Learning
This paper shows how to combine prior knowledge and evidence from the expert's actions to derive a probability distribution over the space of reward functions, and presents efficient algorithms that find solutions for the reward learning and apprenticeship learning tasks that generalize well over these distributions.
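Schematically (standard Bayesian IRL notation, with the rationality temperature \(\alpha\) and the prior being modelling choices), the target is a posterior over rewards built from a Boltzmann-rational likelihood of the observed state–action pairs:

\[
P(R \mid \mathcal{D}) \;\propto\; P(R) \prod_{(s,a) \in \mathcal{D}}
\frac{\exp\!\big(\alpha\, Q^*(s,a;R)\big)}{\sum_{b} \exp\!\big(\alpha\, Q^*(s,b;R)\big)}.
\]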
Active Learning for Reward Estimation in Inverse Reinforcement Learning
Proposes an algorithm that allows the agent to query the demonstrator for samples at specific states, instead of relying only on samples provided at "arbitrary" states, estimating the reward function with accuracy similar to other methods from the literature while reducing the number of policy samples required from the expert.
Apprenticeship Learning using Inverse Reinforcement Learning and Gradient Methods
Proposes a novel gradient algorithm to learn a policy from an expert's observed behavior, assuming that the expert behaves optimally with respect to some unknown reward function of a Markovian Decision Problem.
Maximum margin planning
This work learns mappings from features to costs such that an optimal policy in an MDP with these costs mimics the expert's behavior, and demonstrates a simple, provably efficient approach to structured maximum-margin learning, based on the subgradient method, that leverages existing fast algorithms for inference.
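Roughly, and with notation simplified from the original (\(F_i\) maps state–action frequencies \(\mu\) to feature counts, \(\mu_i\) is the demonstrated path, and \(l_i\) is a loss vector rewarding disagreement with it), the learned cost weights solve a regularized structured-margin problem whose hinge term compares the expert's cost against the best loss-augmented path:

\[
\min_{w}\;\; \frac{\lambda}{2}\|w\|^2
+ \frac{1}{N}\sum_{i=1}^{N}
\Big( w^\top F_i \mu_i \;-\; \min_{\mu \in \mathcal{G}_i} \big( w^\top F_i \mu - l_i^\top \mu \big) \Big),
\]

which can be optimized with the subgradient method, calling a fast planner to find the loss-augmented minimizer at each step.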
Relative Entropy Policy Search
The Relative Entropy Policy Search (REPS) method is suggested, which differs significantly from previous policy gradient approaches, yields an exact update step, and works well on typical reinforcement learning benchmark problems.
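The central idea, stated schematically (a compressed summary, not the paper's full derivation): maximize expected reward while bounding the relative entropy between the new state–action distribution \(p\) and the observed one \(q\), which yields an exponential reweighting as the update:

\[
\max_{p}\; \mathbb{E}_{p}\big[r(s,a)\big]
\quad \text{s.t.} \quad
D_{\mathrm{KL}}\!\big(p \,\|\, q\big) \le \epsilon
\qquad\Longrightarrow\qquad
p(s,a) \;\propto\; q(s,a)\, \exp\!\big(\delta(s,a)/\eta\big),
\]

where \(\delta\) is a Bellman-error-like term arising from the stationarity constraints and \(\eta\) is the Lagrange multiplier of the KL bound.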
A Game-Theoretic Approach to Apprenticeship Learning
A new algorithm is given that is computationally faster, is easier to implement, and can be applied even in the absence of an expert; it is shown that this algorithm may produce a policy that is substantially better than the expert's.
Learning to search: Functional gradient techniques for imitation learning
The work presented extends the Maximum Margin Planning (MMP) framework to admit learning of more powerful, non-linear cost functions, and demonstrates practical real-world performance with three applied case studies including legged locomotion, grasp planning, and autonomous outdoor unstructured navigation.
Robot Learning From Demonstration
This work has shown that incorporating a task-level direct learning component, which is non-model-based, in addition to the model-based planner, is useful in compensating for structural modeling errors and slow model learning.