Corpus ID: 18476493

Inverse Reinforcement Learning with PI²

Mrinal Kalakrishnan, Evangelos A. Theodorou, Stefan Schaal
We present an algorithm that recovers an unknown cost function from expert-demonstrated trajectories in continuous space. We assume that the cost function is a weighted linear combination of features, and we learn weights that result in a cost function under which the expert-demonstrated trajectories are optimal. Unlike previous approaches [1], [2], our algorithm does not require repeated solving of the forward problem (i.e., finding optimal trajectories under a candidate cost…
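The cost model the abstract assumes can be sketched as follows. This is an illustrative simplification, not the paper's implementation; `features` and `w` are placeholder names for the per-timestep feature values and the learned weight vector:

```python
import numpy as np

# Sketch of the assumed cost model: the trajectory cost is a weighted linear
# combination of features, so learning the cost function reduces to learning
# the weight vector w.

def trajectory_cost(features: np.ndarray, w: np.ndarray) -> float:
    """features: (T, d) per-timestep feature values; w: (d,) feature weights."""
    return float(np.sum(features @ w))  # sum of per-timestep linear costs
```

Under a successfully learned `w`, the expert demonstration should score no worse than perturbed alternative trajectories.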

Citations

Maximum entropy inverse reinforcement learning in continuous state spaces with path integrals
  • N. Aghasadeghi, T. Bretl
  • Mathematics, Computer Science
  • 2011 IEEE/RSJ International Conference on Intelligent Robots and Systems
  • 2011
In this paper, we consider the problem of inverse reinforcement learning for a particular class of continuous-time stochastic systems with continuous state and action spaces, under the assumption…
Learning objective functions for manipulation
An approach to learning objective functions for robotic manipulation based on inverse reinforcement learning that can deal with high-dimensional continuous state-action spaces, and only requires local optimality of demonstrated trajectories, is presented.
A review of inverse reinforcement learning theory and recent advances
Inverse Reinforcement Learning (IRL), an extension of RL, introduces a new way of learning policies by deriving the expert's intentions, in contrast to directly learning policies, which can be redundant and have poor generalization ability.
A survey of inverse reinforcement learning techniques
The original IRL algorithms and their close variants, as well as their recent advances, are reviewed and compared.
Robust Bayesian Inverse Reinforcement Learning with Sparse Behavior Noise
This paper develops a robust IRL framework that can accurately estimate the reward function in the presence of behavior noise; it introduces a novel latent variable characterizing the reliability of each expert action and uses a Laplace distribution as its prior.
CWAE-IRL: Formulating a supervised approach to Inverse Reinforcement Learning problem
Experimental results on standard benchmarks such as objectworld and pendulum show that the proposed algorithm can effectively learn the latent reward function in complex, high-dimensional environments.
Reward function design and exploration time are arguably the biggest obstacles to the deployment of reinforcement learning (RL) agents in the real world. In many real-world tasks, designing a…
Learning How Pedestrians Navigate: A Deep Inverse Reinforcement Learning Approach
  • M. Fahad, Zhuo Chen, Yi Guo
  • Computer Science
  • 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
  • 2018
The evaluation results show that the proposed method has acceptable prediction accuracy compared to other state-of-the-art methods, and it can generate pedestrian trajectories similar to real human trajectories with natural social navigation behaviors such as collision avoidance, leader-follower, and split-and-rejoin.
Reinforcement Learning: Recent Threads
Detailed comparisons and discussion of six reinforcement learning algorithms, their exploration and exploitation strategies, and their weaknesses and strengths are presented in the paper.


References

Reinforcement learning of motor skills in high dimensions: A path integral approach
This paper derives a novel approach to RL for parameterized control policies based on the framework of stochastic optimal control with path integrals, and argues that this new algorithm, Policy Improvement with Path Integrals (PI²), offers currently one of the most efficient, numerically robust, and easy-to-implement algorithms for RL in robotics.
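The core of a PI²-style update can be sketched as below. This is an assumed simplification of the cited algorithm, not its full form: sample noisy rollouts of the policy parameters, score each rollout with a cost, and average the exploration noise weighted by a softmax over costs. `rollout_cost`, `sigma`, and `lam` are illustrative placeholders:

```python
import numpy as np

# Minimal PI^2-style parameter update sketch: probability-weighted averaging
# of exploration noise, where low-cost rollouts receive exponentially more
# weight. No gradient of the cost is needed.

def pi2_update(theta, rollout_cost, K=32, sigma=0.1, lam=0.5, rng=None):
    rng = np.random.default_rng(rng)
    eps = rng.normal(0.0, sigma, size=(K, theta.size))      # exploration noise
    costs = np.array([rollout_cost(theta + e) for e in eps])
    s = (costs - costs.min()) / max(np.ptp(costs), 1e-12)   # normalize to [0, 1]
    w = np.exp(-s / lam)
    w /= w.sum()                                            # softmax over rollouts
    return theta + w @ eps  # probability-weighted average of the noise
```

Because the update is gradient-free and derived from rollout costs alone, it pairs naturally with a learned cost function such as the one recovered by the inverse method above.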
Apprenticeship learning via inverse reinforcement learning
This work thinks of the expert as trying to maximize a reward function that is expressible as a linear combination of known features, and gives an algorithm for learning the task demonstrated by the expert, based on using "inverse reinforcement learning" to try to recover the unknown reward function.
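The feature-expectation idea behind this approach can be sketched as follows, under the stated assumption that the reward is linear in known features, so matching a policy's discounted feature expectations to the expert's also matches its value. `phi` and `gamma` are placeholder names for the feature map and discount factor:

```python
import numpy as np

# Hedged sketch of apprenticeship-learning quantities (not the cited paper's
# full max-margin/projection algorithm).

def feature_expectations(trajectories, phi, gamma=0.9):
    """Average discounted feature counts over a set of state trajectories."""
    mus = [sum((gamma ** t) * phi(s) for t, s in enumerate(traj))
           for traj in trajectories]
    return np.mean(mus, axis=0)

def reward_weights(mu_expert, mu_policy):
    """Point the reward toward features the expert visits more than the policy."""
    w = mu_expert - mu_policy
    n = np.linalg.norm(w)
    return w / n if n > 0 else w
```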
Learning to search: Functional gradient techniques for imitation learning
The work presented extends the Maximum Margin Planning (MMP) framework to admit learning of more powerful, non-linear cost functions, and demonstrates practical real-world performance with three applied case studies including legged locomotion, grasp planning, and autonomous outdoor unstructured navigation.
Learning Attractor Landscapes for Learning Motor Primitives
By nonlinearly transforming the canonical attractor dynamics using techniques from nonparametric regression, almost arbitrary new nonlinear policies can be generated without losing the stability properties of the canonical system.
Reinforcement learning of motor skills with policy gradients
This paper examines learning of complex motor skills with human-like limbs, and combines the idea of modular motor control by means of motor primitives as a suitable way to generate parameterized control policies for reinforcement learning with the theory of stochastic policy gradient learning. Expand