# Apprenticeship learning via inverse reinforcement learning

```bibtex
@article{Abbeel2004ApprenticeshipLV,
  title   = {Apprenticeship learning via inverse reinforcement learning},
  author  = {P. Abbeel and A. Ng},
  journal = {Proceedings of the twenty-first international conference on Machine learning},
  year    = {2004}
}
```

We consider learning in a Markov decision process where we are not explicitly given a reward function, but where instead we can observe an expert demonstrating the task that we want to learn to perform. This setting is useful in applications (such as the task of driving) where it may be difficult to write down an explicit reward function specifying exactly how different desiderata should be traded off. We think of the expert as trying to maximize a reward function that is expressible as a…
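The abstract alludes to the paper's core algorithmic idea: if the expert's reward is a linear combination w·φ of known features, it suffices to find a policy whose expected discounted feature counts ("feature expectations") match the expert's. Below is a minimal sketch of the projection variant of the algorithm, not the paper's exact pseudocode: `solve_mdp` is an assumed caller-supplied routine that returns the feature expectations of a near-optimal policy for reward R(s) = w·φ(s).

```python
import numpy as np

def apprenticeship_projection(mu_expert, solve_mdp, n_iters=50, eps=1e-6):
    """Projection-method sketch of apprenticeship learning via IRL.

    mu_expert -- expert feature expectations, estimated from demonstrations
    solve_mdp -- maps reward weights w to the feature expectations of an
                 (approximately) optimal policy for reward R(s) = w . phi(s)
    Returns the final weight vector (whose norm bounds the performance gap)
    and the learner feature expectations generated along the way.
    """
    mu_bar = solve_mdp(mu_expert)      # first learner policy; mu_bar tracks
    mus = [mu_bar]                     # the projection of mu_expert so far
    w = mu_expert - mu_bar
    for _ in range(n_iters):
        if np.linalg.norm(w) <= eps:   # feature counts matched: done
            break
        mu = solve_mdp(w)              # RL step: best policy for current w
        mus.append(mu)
        d = mu - mu_bar                # project mu_expert onto [mu_bar, mu]
        if d @ d <= eps:
            break
        lam = np.clip(d @ (mu_expert - mu_bar) / (d @ d), 0.0, 1.0)
        mu_bar = mu_bar + lam * d
        w = mu_expert - mu_bar         # new max-margin direction
    return w, mus
```

As a toy stand-in for the RL step, one can take a fixed set of candidate policies with known feature expectations and let `solve_mdp` pick the one maximizing w·μ; the margin ‖w‖ then shrinks monotonically toward the distance from the expert's feature expectations to the convex hull of the candidates'.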

## 2,423 Citations

Stochastic convex optimization for provably efficient apprenticeship learning

- Computer Science, Mathematics · ArXiv
- 2022

A computationally efficient algorithm is developed and high confidence regret bounds are derived on the quality of the extracted policy, utilizing results from stochastic convex optimization and recent works in approximate linear programming for solving forward MDPs.

Exploration and apprenticeship learning in reinforcement learning

- Computer Science · ICML
- 2005

This paper considers the apprenticeship learning setting in which a teacher demonstration of the task is available, and shows that, given the initial demonstration, no explicit exploration is necessary, and the student can attain near-optimal performance simply by repeatedly executing "exploitation policies" that try to maximize rewards.

Apprenticeship learning via soft local homomorphisms

- Computer Science · 2010 IEEE International Conference on Robotics and Automation
- 2010

This paper proposes to use a transfer method, known as soft homomorphism, to generalize the expert's policy to unvisited regions of the state space; the generalized policy can be used either as the robot's final policy or to calculate the feature frequencies within an IRL algorithm.

Bootstrapping Apprenticeship Learning

- Computer Science · NIPS
- 2010

The quality of the learned policies is highly sensitive to the error in estimating the feature counts; a novel approach is introduced for bootstrapping the demonstration, assuming that the expert is (near-)optimal and that the dynamics of the system are known.

Apprenticeship learning with few examples

- Computer Science · Neurocomputing
- 2013

The quality of the learned policies is sensitive to the error in estimating the feature averages when the dynamics of the system are stochastic, and two new approaches for bootstrapping the demonstrations are introduced.

Compatible Reward Inverse Reinforcement Learning

- Computer Science · NIPS
- 2017

A novel model-free IRL approach that, unlike most existing IRL algorithms, does not require specifying a function space in which to search for the expert's reward function.

Inverse Reinforcement Learning via Matching of Optimality Profiles

- Computer Science · ArXiv
- 2020

This work proposes an algorithm that learns a reward function from demonstrations together with a weak supervision signal in the form of a distribution over rewards collected during the demonstrations, and shows that the method is capable of learning reward functions such that policies trained to optimize them outperform the demonstrations used for fitting the reward functions.

Inverse Reinforcement Learning from a Gradient-based Learner

- Computer Science, Mathematics · NeurIPS
- 2020

This paper proposes a new algorithm for inverse reinforcement learning in which the goal is to recover the reward function being optimized by an agent, given a sequence of policies produced during learning.

Inverse Reinforcement Learning with Multiple Ranked Experts

- Computer Science, Mathematics · ArXiv
- 2019

This work considers the problem of learning to behave optimally in a Markov decision process when a reward function is not specified but a set of demonstrators of varying performance is available, and uses ideas from ordinal regression to find a reward function that maximizes the margin between the different ranks.

Relative Entropy Inverse Reinforcement Learning

- Computer Science · AISTATS
- 2011

This paper proposes a model-free IRL algorithm, where the relative entropy between the empirical distribution of the state-action trajectories under a baseline policy and their distribution under the learned policy is minimized by stochastic gradient descent.

## References

Showing 1-10 of 20 references

Robot Learning From Demonstration

- Computer Science · ICML
- 1997

This work has shown that incorporating a task level direct learning component, which is non-model-based, in addition to the model-based planner, is useful in compensating for structural modeling errors and slow model learning.

Policy Invariance Under Reward Transformations: Theory and Application to Reward Shaping

- Computer Science · ICML
- 1999

Conditions under which modifications to the reward function of a Markov decision process preserve the optimal policy are investigated, to shed light on the practice of reward shaping, a method used in reinforcement learning whereby additional training rewards are used to guide the learning agent.
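The invariance result in this reference is concrete enough to state in a few lines: adding a shaping term of the form F(s, s') = γΦ(s') − Φ(s), for any potential function Φ, changes every trajectory's return by a quantity that depends only on the start state, so the optimal policy is unchanged. A minimal illustration (the potential Φ and the numbers below are arbitrary examples, not from the paper):

```python
gamma = 0.9
Phi = {0: 0.0, 1: 5.0, 2: 10.0}   # arbitrary example potential over 3 states

def shaped_reward(r, s, s_next):
    """Original reward r for transition s -> s_next plus the potential-based
    shaping term F(s, s') = gamma * Phi(s_next) - Phi(s)."""
    return r + gamma * Phi[s_next] - Phi[s]

# Along any trajectory the shaping terms telescope, so the shaped return
# differs from the original return only by a term fixed by the start (and
# discounted end) state; rankings of policies are therefore preserved.
```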

Formation and control of optimal trajectory in human multijoint arm movement

- Engineering, Mathematics · Biological Cybernetics
- 2004

The idea that the human hand trajectory is planned and controlled in accordance with the minimum torque-change criterion is supported by developing an iterative scheme, with which the optimal trajectory and the associated motor command are simultaneously computed.

Learning movement sequences from demonstration

- Computer Science · Proceedings 2nd International Conference on Development and Learning. ICDL 2002
- 2002

Presents a control and learning architecture for humanoid robots designed for acquiring movement skills in the context of imitation learning, and uses the notion of visuo-motor primitives, modules capable of recognizing as well as executing similar movements.

Linear Programming and Sequential Decisions

- Mathematics
- 1960

Using an illustration drawn from the area of inventory control, this paper demonstrates how a typical sequential probabilistic model may be formulated in terms of (a) an initial decision rule and (b) a…

Algorithms for Inverse Reinforcement Learning

- Computer Science · ICML
- 2000

This paper formalizes the problem of inverse reinforcement learning, that of extracting a reward function given observed optimal behavior, and presents linear-programming-based algorithms for finite state spaces, for large state spaces via linear function approximation, and for the case where the policy is known only through sampled trajectories.

An organizing principle for a class of voluntary movements

- Mathematics, Medicine · The Journal of Neuroscience
- 1984

This paper presents a mathematical model which predicts both the major qualitative features and, within experimental error, the quantitative details of a class of perturbed and unperturbed…

Learning by watching: extracting reusable task knowledge from visual observation of human performance

- Computer Science · IEEE Trans. Robotics Autom.
- 1994

A novel task instruction method for future intelligent robots that learns reusable task plans by watching a human perform assembly tasks is presented, which results in a hierarchical task plan describing the higher level structure of the task.

Statistical learning theory

- Computer Science
- 1998

Presenting a method for determining the necessary and sufficient conditions for consistency of the learning process, the author covers function estimation from small data pools, applying these estimates to real-life problems, and much more.

ALVINN: An Autonomous Land Vehicle in a Neural Network

- Engineering, Computer Science · NIPS
- 1988

ALVINN (Autonomous Land Vehicle In a Neural Network) is a 3-layer back-propagation network designed for the task of road following that can effectively follow real roads under certain field conditions.