Learning Navigation Costs from Demonstration in Partially Observable Environments

Tianyu Wang, Vikas Dhiman, Nikolay A. Atanasov
2020 IEEE International Conference on Robotics and Automation (ICRA)
This paper focuses on inverse reinforcement learning (IRL) to enable safe and efficient autonomous navigation in unknown partially observable environments. The objective is to infer a cost function that explains expert-demonstrated navigation behavior while relying only on the observations and state-control trajectory used by the expert. We develop a cost function representation composed of two parts: a probabilistic occupancy encoder, with recurrent dependence on the observation sequence, and… 

Learning Navigation Costs from Demonstration with Semantic Observations

The objective is to infer a cost function that explains demonstrated behavior while relying only on the expert's observations and state-control trajectory. The approach develops a map encoder and a cost encoder, defined as a deep neural network over the semantic features.

Inverse reinforcement learning for autonomous navigation via differentiable semantic mapping and planning

A new model of expert behavior is proposed that enables error minimization using a closed-form subgradient computed only over a subset of promising states via a motion planning algorithm and allows generalizing the learned behavior to new environments with new spatial configurations of the semantic categories.

Inverse Reinforcement Learning of Autonomous Behaviors Encoded as Weighted Finite Automata

A spectral learning approach is employed to extract a weighted finite automaton that approximates the unknown task logic and is capable of generalizing the execution of the inferred task specification in a suite of MiniGrid environments.

Modeling Human Behavior Part I - Learning and Belief Approaches

The main objective of this paper is to provide a succinct yet systematic review of the most important approaches in two areas dealing with quantitative models of human behaviors, including approaches that directly model mechanisms of human reasoning, such as beliefs and bias, without necessarily learning via trial and error.



Inverse Reinforcement Learning in Partially Observable Environments

This paper presents IRL algorithms for partially observable environments that can be modeled as a partially observable Markov decision process (POMDP) and deals with two cases according to the representation of the given expert's behavior.

Cognitive Mapping and Planning for Visual Navigation

The Cognitive Mapper and Planner is based on a unified joint architecture for mapping and planning, such that the mapping is driven by the needs of the task, and a spatial memory with the ability to plan given an incomplete set of observations about the world.

SARSOP: Efficient Point-Based POMDP Planning by Approximating Optimally Reachable Belief Spaces

This work develops a new point-based POMDP algorithm that exploits the notion of optimally reachable belief spaces to improve computational efficiency and substantially outperforms one of the fastest existing point-based algorithms.

Reinforcement Learning via Recurrent Convolutional Neural Networks

This work presents a more natural representation of the solutions to Reinforcement Learning (RL) problems within three Recurrent Convolutional Neural Network (RCNN) architectures, to better exploit this inherent structure.

QMDP-Net: Deep Learning for Planning under Partial Observability

While QMDP-net encodes the QMDP algorithm, it sometimes outperforms the QMDP algorithm in the experiments, as a result of end-to-end learning.
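
For readers unfamiliar with the QMDP algorithm that the summary above refers to, the following is a minimal sketch of the classical (non-learned) version: solve the fully observable MDP for Q(s, a), then pick the action that maximizes the belief-weighted Q-value. It is not the QMDP-net architecture. The function names `value_iteration` and `qmdp_action` and the array layouts are assumptions made for illustration.

```python
import numpy as np

def value_iteration(T, R, gamma=0.95, iters=200):
    """Solve the underlying fully observable MDP.

    T: (A, S, S) array, T[a, s, t] = P(next state t | state s, action a).
    R: (S, A) array of immediate rewards.
    Returns Q with shape (S, A).
    (Hypothetical helper; layouts are assumptions of this sketch.)
    """
    n_actions, n_states, _ = T.shape
    Q = np.zeros((n_states, n_actions))
    for _ in range(iters):
        V = Q.max(axis=1)  # greedy state values, shape (S,)
        # Expected next-state value for every (state, action) pair.
        Q = R + gamma * np.einsum("ast,t->sa", T, V)
    return Q

def qmdp_action(belief, Q):
    """QMDP rule: argmax over a of sum_s belief(s) * Q(s, a)."""
    return int(np.argmax(belief @ Q))
```

The QMDP rule treats all state uncertainty as if it vanished after one step, which is why it is cheap to compute but can be suboptimal when information-gathering actions matter.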

Memory Augmented Control Networks

The Memory Augmented Control Network is shown to learn to plan and to generalize to new environments; it is evaluated in discrete grid world environments for path planning in the presence of simple and complex obstacles.

Watch this: Scalable cost-function learning for path planning in urban environments

This work deploys a Maximum Entropy based, non-linear IRL framework which uses Fully Convolutional Neural Networks (FCNs) to represent the cost model underlying expert driving behaviour and demonstrates scalability and performance on an ambitious dataset collected over the course of one year.

Maximum Entropy Inverse Reinforcement Learning

This work develops a probabilistic approach based on the principle of maximum entropy that provides a well-defined, globally normalized distribution over decision sequences, while providing the same performance guarantees as existing methods.
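
As an illustration of what a globally normalized distribution over decision sequences can look like, here is a minimal sketch assuming a linear reward over trajectory features and a small, explicitly enumerated set of candidate trajectories. Enumerating candidates is a simplification made for this sketch, and the function names are hypothetical.

```python
import numpy as np

def maxent_trajectory_probs(theta, features):
    """P(trajectory i) proportional to exp(theta . features[i]).

    theta: (D,) reward weights.
    features: (N, D) feature counts of N candidate trajectories
              (a finite candidate set is an assumption of this sketch).
    """
    scores = features @ theta
    scores = scores - scores.max()  # shift for numerical stability
    weights = np.exp(scores)
    return weights / weights.sum()  # one normalizer over all trajectories

def log_likelihood_gradient(theta, features, demo_features):
    """Gradient of the demonstration log-likelihood with respect to theta:
    demonstrated feature counts minus expected feature counts."""
    probs = maxent_trajectory_probs(theta, features)
    return demo_features - probs @ features
```

Following this gradient raises the reward of features the demonstration visits more often than the model expects, and the gradient is zero when the expected feature counts match the demonstrated ones.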

Path Integral Networks: End-to-End Differentiable Optimal Control

Preliminary experimental results show that PI-Net, trained by imitation learning, can mimic control demonstrations for two simulated problems: a linear system and a pendulum swing-up problem.