Corpus ID: 227054172

Generalized Inverse Planning: Learning Lifted non-Markovian Utility for Generalizable Task Representation

@article{Xie2020GeneralizedIP,
  title={Generalized Inverse Planning: Learning Lifted non-Markovian Utility for Generalizable Task Representation},
  author={Sirui Xie and Feng Gao and Song-Chun Zhu},
  journal={ArXiv},
  year={2020},
  volume={abs/2011.09854}
}
In searching for a generalizable representation of temporally extended tasks, we spot two necessary constituents: the utility needs to be non-Markovian to transfer temporal relations invariant to a probability shift, and it needs to be lifted to abstract out specific grounding objects. In this work, we study learning such utility from human demonstrations. While inverse reinforcement learning (IRL) has been accepted as a general framework of utility learning, its fundamental…
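As a rough illustration of the distinction drawn in the abstract (notation chosen here for exposition, not taken from the paper): a Markovian reward scores each transition independently, a non-Markovian utility scores whole trajectories, and a lifted utility is written over quantified predicates rather than grounded objects.

  \[
    \underbrace{\sum_{t} r(s_t, a_t)}_{\text{Markovian return}}
    \quad \text{vs.} \quad
    \underbrace{U(\tau) = U(s_0, a_0, \ldots, s_T)}_{\text{non-Markovian utility}},
    \qquad
    U_{\text{lifted}}(\tau) \;=\; \sum_{k} w_k \,\mathbb{1}\!\left[\exists x.\ \varphi_k(x) \text{ holds along } \tau\right],
  \]

where the weights \(w_k\) and predicates \(\varphi_k\) are purely illustrative placeholders.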

References

Showing 1–10 of 47 references
Deep reinforcement learning with relational inductive biases
TLDR
The main contribution of this work is to introduce techniques for representing and reasoning about states in model-free deep reinforcement learning agents via relational inductive biases, which can offer advantages in efficiency, generalization, and interpretability.
Inverse Reinforcement Learning in Relational Domains
TLDR
This work introduces the first approach to the Inverse Reinforcement Learning (IRL) problem in relational domains and shows how relational formalisms enable a new IRL algorithm that efficiently recovers, from expert data, rewards with strong generalization and transfer properties.
Learning Task Specifications from Demonstrations
TLDR
The specification inference task is formulated as a maximum a posteriori (MAP) probability inference problem; the principle of maximum entropy is applied to derive an analytic demonstration-likelihood model, and an efficient approach is given for searching a large candidate pool for the most likely specification.
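A minimal sketch of the kind of objective described in the entry above, with notation assumed here rather than quoted from the cited paper: the specification \(\varphi\) is chosen to maximize the posterior given demonstrations \(D\), under a maximum-entropy demonstration likelihood.

  \[
    \varphi^{*} \;=\; \arg\max_{\varphi \in \Phi} \; P(\varphi \mid D)
    \;\propto\; P(\varphi)\, P(D \mid \varphi),
  \]

where \(P(D \mid \varphi)\) is derived from the principle of maximum entropy and \(\Phi\) is a large, finite candidate pool of specifications.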
Reinforcement learning with temporal logic rewards
  • Xiao Li, C. Vasile, C. Belta
  • Computer Science
    2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
  • 2017
TLDR
It is shown in simulated trials that learning is faster and that policies obtained with the proposed approach outperform those learned with heuristic rewards in terms of the robustness degree, i.e., how well the tasks are satisfied.
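As an illustration of the robustness degree mentioned above (standard quantitative-semantics clauses, not quoted from the cited paper): robustness maps a trajectory and a temporal-logic formula to a real number whose sign indicates satisfaction, for example

  \[
    \rho(\tau, \varphi_1 \wedge \varphi_2) = \min\!\big(\rho(\tau,\varphi_1),\, \rho(\tau,\varphi_2)\big),
    \qquad
    \rho(\tau, \neg\varphi) = -\rho(\tau,\varphi),
  \]

and learning can then maximize expected robustness \(\mathbb{E}[\rho(\tau,\varphi)]\) instead of a hand-crafted heuristic reward.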
Learning Generalized Policies from Planning Examples Using Concept Languages
TLDR
This paper follows Khardon's approach but represents generalized policies differently, using a concept language, and shows that this representation yields better policies from a smaller set of examples and without background knowledge.
Relative Entropy Inverse Reinforcement Learning
TLDR
This paper proposes a model-free IRL algorithm, where the relative entropy between the empirical distribution of the state-action trajectories under a baseline policy and their distribution under the learned policy is minimized by stochastic gradient descent.
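One common way to write such an objective (a sketch under assumed notation, not lifted verbatim from the cited paper): minimize the KL divergence between the learned trajectory distribution \(P\) and the baseline distribution \(Q\), subject to matching the demonstrations' feature expectations.

  \[
    \min_{P} \; D_{\mathrm{KL}}\!\big(P(\tau)\,\|\,Q(\tau)\big)
    \quad \text{s.t.} \quad
    \big|\,\mathbb{E}_{\tau \sim P}[f_i(\tau)] - \hat{f}_i\,\big| \le \epsilon_i \;\; \forall i,
  \]

whose dual can be optimized by stochastic gradient descent without requiring a model of the dynamics.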
Few-Shot Bayesian Imitation Learning with Logical Program Policies
TLDR
This work proposes an expressive class of policies, a strong but general prior, and a learning algorithm that, together, can learn interesting policies from very few examples, and argues that the proposed method is an apt choice for tasks that have scarce training data and feature significant, structured variation between task instances.
Apprenticeship learning via inverse reinforcement learning
TLDR
This work models the expert as maximizing a reward function expressible as a linear combination of known features, and gives an algorithm for learning the demonstrated task that uses "inverse reinforcement learning" to recover the unknown reward function.
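A minimal sketch of the linear-reward assumption described above (symbols chosen here for illustration): if the reward is a weighted sum of known features, then matching feature expectations suffices to match expected reward.

  \[
    R(s) = w^{\top}\phi(s), \qquad
    \mathbb{E}\Big[\sum_{t} \gamma^{t}\, \phi(s_t)\Big] \approx \hat{\mu}_{E}
    \;\;\Longrightarrow\;\;
    \mathbb{E}\Big[\sum_{t} \gamma^{t} R(s_t)\Big] \approx w^{\top}\hat{\mu}_{E},
  \]

where \(\hat{\mu}_{E}\) denotes the expert's empirical feature expectations.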
Maximum margin planning
TLDR
This work learns mappings from features to costs so that an optimal policy in an MDP with those costs mimics the expert's behavior, and demonstrates a simple, provably efficient approach to structured maximum-margin learning, based on the subgradient method, that leverages existing fast algorithms for inference.
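A condensed sketch of a structured max-margin objective of the kind described above (regularized and loss-augmented; notation assumed here): the learned cost weights \(w\) should make each expert path cheaper than every alternative by a margin that grows with how much the alternative deviates.

  \[
    \min_{w} \;\; \frac{\lambda}{2}\|w\|^{2}
    \;+\; \frac{1}{N}\sum_{i=1}^{N}
    \Big( w^{\top} F_i \mu_i \;-\; \min_{\mu} \big( w^{\top} F_i \mu - \ell_i^{\top}\mu \big) \Big),
  \]

where \(F_i \mu\) collects the features of path \(\mu\), \(\mu_i\) is the expert's path, and \(\ell_i\) penalizes deviation from it; the objective is convex in \(w\) and admits a subgradient method whose inner minimization reuses fast planning/inference algorithms.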