- Published 2007 in UAI

In this paper we propose a novel gradient algorithm to learn a policy from an expert’s observed behavior assuming that the expert behaves optimally with respect to some unknown reward function of a Markovian Decision Problem. The algorithm’s aim is to find a reward function such that the resulting optimal policy matches well the expert’s observed behavior. The main difficulty is that the mapping from the parameters to policies is both nonsmooth and highly redundant. Resorting to subdifferentials solves the first difficulty, while the second one is overcome by computing natural gradients. We tested the proposed method in two artificial domains and found it to be more reliable and efficient than some previous methods.

Citations per Year

Semantic Scholar estimates that this publication has **166** citations based on the available data.

See our **FAQ** for additional information.

Showing 1-10 of 102 extracted citations

Highly Influenced

20 Excerpts

Highly Influenced

4 Excerpts

Highly Influenced

4 Excerpts

Highly Influenced

5 Excerpts

Highly Influenced

4 Excerpts

Highly Influenced

11 Excerpts

Highly Influenced

9 Excerpts

Highly Influenced

4 Excerpts

Highly Influenced

5 Excerpts

Highly Influenced

5 Excerpts

@inproceedings{Neu2007ApprenticeshipLU,
title={Apprenticeship Learning using Inverse Reinforcement Learning and Gradient Methods},
author={Gergely Neu and Csaba Szepesv{\'a}ri},
booktitle={UAI},
year={2007}
}