Apprenticeship learning via inverse reinforcement learning


We consider learning in a Markov decision process where we are not explicitly given a reward function, but where instead we can observe an expert demonstrating the task that we want to learn to perform. This setting is useful in applications (such as the task of driving) where it may be difficult to write down an explicit reward function specifying exactly how different desiderata should be traded off. We think of the expert as trying to maximize a reward function that is expressible as a linear combination of known features, and give an algorithm for learning the task demonstrated by the expert. Our algorithm is based on using "inverse reinforcement learning" to try to recover the unknown reward function. We show that our algorithm terminates in a small number of iterations, and that even though we may never recover the expert's reward function, the policy output by the algorithm will attain performance close to that of the expert, where here performance is measured with respect to the expert's <i>unknown</i> reward function.

DOI: 10.1145/1015330.1015430

Extracted Key Phrases

Unfortunately, ACM prohibits us from displaying non-influential references for this paper.

To see the full reference list, please visit

Showing 1-10 of 635 extracted citations
Citations per Year

1,038 Citations

Semantic Scholar estimates that this publication has received between 912 and 1,184 citations based on the available data.

See our FAQ for additional information.