# Generalized Inverse Planning: Learning Lifted non-Markovian Utility for Generalizable Task Representation

@article{Xie2020GeneralizedIP, title={Generalized Inverse Planning: Learning Lifted non-Markovian Utility for Generalizable Task Representation}, author={Sirui Xie and Feng Gao and Song-Chun Zhu}, journal={ArXiv}, year={2020}, volume={abs/2011.09854} }

In searching for a generalizable representation of temporally extended tasks, we spot two necessary constituents: the utility needs to be non-Markovian to transfer temporal relations invariant to a probability shift, and the utility also needs to be lifted to abstract out specific grounding objects. In this work, we study learning such utility from human demonstrations. While inverse reinforcement learning (IRL) has been accepted as a general framework of utility learning, its fundamental…

## References

Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning

- Computer Science, Artif. Intell.
- 1999

Deep reinforcement learning with relational inductive biases

- Computer Science, ICLR
- 2019

The main contribution of this work is to introduce techniques for representing and reasoning about states in model-free deep reinforcement learning agents via relational inductive biases, which can offer advantages in efficiency, generalization, and interpretability.

Inverse Reinforcement Learning in Relational Domains

- Computer Science, IJCAI
- 2015

This work introduces the first approach to the Inverse Reinforcement Learning (IRL) problem in relational domains and shows how different formalisms allow one to create a new IRL algorithm for relational domains that can recover with great efficiency rewards from expert data that have strong generalization and transfer properties.

Learning Task Specifications from Demonstrations

- Computer Science, NeurIPS
- 2018

The specification inference task is formulated as a maximum a posteriori (MAP) probability inference problem; the principle of maximum entropy is applied to derive an analytic demonstration likelihood model; and an efficient approach is given for searching a large candidate pool for the most likely specification.

Reinforcement learning with temporal logic rewards

- Computer Science, 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
- 2017

It is shown in simulated trials that learning is faster and policies obtained using the proposed approach outperform the ones learned using heuristic rewards in terms of the robustness degree, i.e., how well the tasks are satisfied.

Learning Generalized Policies from Planning Examples Using Concept Languages

- Computer Science, Applied Intelligence
- 2004

This paper follows Khardon's approach but represents generalized policies in a different way using a concept language and shows that the concept language yields a better policy using a smaller set of examples and no background knowledge.

Relative Entropy Inverse Reinforcement Learning

- Computer Science, AISTATS
- 2011

This paper proposes a model-free IRL algorithm, where the relative entropy between the empirical distribution of the state-action trajectories under a baseline policy and their distribution under the learned policy is minimized by stochastic gradient descent.

Few-Shot Bayesian Imitation Learning with Logical Program Policies

- Computer Science, AAAI
- 2020

This work proposes an expressive class of policies, a strong but general prior, and a learning algorithm that, together, can learn interesting policies from very few examples, and argues that the proposed method is an apt choice for tasks that have scarce training data and feature significant, structured variation between task instances.

Apprenticeship learning via inverse reinforcement learning

- Computer Science, ICML
- 2004

This work thinks of the expert as trying to maximize a reward function that is expressible as a linear combination of known features, and gives an algorithm for learning the task demonstrated by the expert, based on using "inverse reinforcement learning" to try to recover the unknown reward function.
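
The linear-reward assumption in this summary can be illustrated with a minimal sketch. It assumes a reward of the form R(s) = w · φ(s) and shows that the average discounted return of a set of trajectories equals w · μ, where μ is the discounted feature expectation. The function names, the toy feature vectors, the weights `w`, and the discount `gamma` are all hypothetical choices made for illustration; they are not taken from the cited paper.

```python
from typing import List, Sequence

Vector = List[float]


def discounted_feature_expectations(
    trajectories: Sequence[Sequence[Vector]], gamma: float
) -> Vector:
    """Average, over trajectories, of the discounted sum of feature vectors."""
    dim = len(trajectories[0][0])
    mu = [0.0] * dim
    for traj in trajectories:
        for t, phi in enumerate(traj):
            for i in range(dim):
                mu[i] += (gamma ** t) * phi[i]
    return [m / len(trajectories) for m in mu]


def linear_reward(w: Vector, phi: Vector) -> float:
    """Reward as a linear combination of known features: w . phi."""
    return sum(wi * pi for wi, pi in zip(w, phi))


def average_return(
    trajectories: Sequence[Sequence[Vector]], w: Vector, gamma: float
) -> float:
    """Average discounted return, computed step by step from the rewards."""
    total = 0.0
    for traj in trajectories:
        for t, phi in enumerate(traj):
            total += (gamma ** t) * linear_reward(w, phi)
    return total / len(trajectories)


# Hypothetical toy demonstrations: each step is a 2-dimensional feature vector.
demos = [
    [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]],
    [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]],
]
w = [0.5, 2.0]   # assumed reward weights
gamma = 0.9      # assumed discount factor

mu = discounted_feature_expectations(demos, gamma)
value_from_mu = linear_reward(w, mu)            # w . mu
value_direct = average_return(demos, w, gamma)  # same quantity, computed directly
```

Because the return is linear in the features, comparing feature expectations between expert and learner is enough to compare their returns under any weight vector, which is the property that feature-matching approaches to IRL rely on.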

Maximum margin planning

- Computer Science, ICML
- 2006

This work learns mappings from features to costs so that an optimal policy in an MDP with these costs mimics the expert's behavior, and demonstrates a simple, provably efficient approach to structured maximum margin learning, based on the subgradient method, that leverages existing fast algorithms for inference.