Adaptive Information Gathering via Imitation Learning

@article{Choudhury2017AdaptiveIG,
  title={Adaptive Information Gathering via Imitation Learning},
  author={Sanjiban Choudhury and Ashish Kapoor and Gireeja Ranade and Sebastian A. Scherer and Debadeepta Dey},
  journal={ArXiv},
  year={2017},
  volume={abs/1705.07834}
}
In the adaptive information gathering problem, a policy is required to select an informative sensing location using the history of measurements acquired thus far. While there is an extensive amount of prior work investigating effective practical approximations using variants of Shannon’s entropy, the efficacy of such policies heavily depends on the geometric distribution of objects in the world. On the other hand, the principled approach of employing online POMDP solvers is rendered impractical… 
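
As a point of reference for the entropy-based heuristics the abstract mentions, the following is a minimal sketch, not the paper's method, of a greedy policy that selects the next sensing location by expected reduction in the Shannon entropy of a discrete belief over world hypotheses; all function and variable names are illustrative placeholders.

import numpy as np

def entropy(p):
    # Shannon entropy of a discrete belief (zero-probability entries contribute nothing).
    p = p[p > 0]
    return -np.sum(p * np.log(p))

def greedy_entropy_policy(belief, candidate_locations, likelihood):
    # Pick the sensing location with the largest expected entropy reduction.
    # belief: array of shape (H,) over world hypotheses.
    # likelihood(loc, h): array of shape (M,) over measurement outcomes at loc given hypothesis h.
    best_loc, best_gain = None, -np.inf
    for loc in candidate_locations:
        obs_given_h = np.stack([likelihood(loc, h) for h in range(len(belief))])  # (H, M)
        obs_marginal = belief @ obs_given_h                                        # (M,)
        expected_posterior_entropy = 0.0
        for m, p_m in enumerate(obs_marginal):
            if p_m > 0:
                posterior = belief * obs_given_h[:, m] / p_m
                expected_posterior_entropy += p_m * entropy(posterior)
        gain = entropy(belief) - expected_posterior_entropy
        if gain > best_gain:
            best_loc, best_gain = loc, gain
    return best_loc

As the abstract notes, how well such a myopic, entropy-driven rule performs depends on the geometric distribution of objects in the world, which is what motivates learning the sensing policy instead.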

Citations

Improving Imitation Learning through Efficient Expert Querying

TLDR
This work proposes a modified version of the DAgger algorithm aimed at reducing expert queries while maintaining learner performance, and implements several supervised active learning approaches as part of the query selection, allowing policy uncertainty to inform expert label queries.
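
A minimal sketch of the uncertainty-gated querying idea described above, assuming the learner exposes some per-state uncertainty estimate; the threshold and all helper names are hypothetical, not the paper's interface.

def collect_labels(visited_states, learner_uncertainty, expert_action, threshold=0.5):
    # Query the expert only on states where the learner is uncertain, so that
    # expensive expert labels are spent where they are most informative.
    labeled = []
    for s in visited_states:
        if learner_uncertainty(s) > threshold:
            labeled.append((s, expert_action(s)))
    return labeled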

Learning Heuristic Search via Imitation

TLDR
SaIL is presented, an efficient algorithm that trains heuristic policies by imitating "clairvoyant oracles" - oracles that have full information about the world and demonstrate decisions that minimize search effort - and is validated on a spectrum of environments, showing that SaIL consistently outperforms state-of-the-art algorithms.

Deeply AggreVaTeD: Differentiable Imitation Learning for Sequential Prediction

TLDR
This work presents two gradient procedures that can learn neural network policies for several problems, including a sequential prediction task and several high-dimensional robotics control problems, and provides a comprehensive theoretical study of imitation learning (IL).

Data-driven planning via imitation learning

TLDR
A novel data-driven imitation learning framework to efficiently train planning policies by imitating a clairvoyant oracle: an oracle that at train time has full knowledge about the world map and can compute optimal decisions.

Towards Generalization and Efficiency of Reinforcement Learning

TLDR
This work proposes a general framework, named Dual Policy Iteration (DPI), which maintains two policies, an apprentice policy and an expert policy, and alternately updates the apprentice policy by imitating the expert policy while updating the expert policy by leveraging the apprentice policy and a learned model.

Graph Neural Networks for Decentralized Multi-Robot Submodular Action Selection

TLDR
A general-purpose learning architecture for submodular maximization at scale with decentralized communication, which leverages a graph neural network (GNN) to capture local interactions among the robots and to learn decentralized decision-making for the robots.

Learning Neural Parsers with Deterministic Differentiable Imitation Learning

TLDR
The problem of decomposing objects into segments is explored as a parsing approach, with the insight that deriving a parse tree that decomposes an object into segments closely resembles constructing a decision tree with ID3, which can be done when the ground truth is available.

MPC-Net: A First Principles Guided Policy Search

TLDR
An Imitation Learning approach for the control of dynamical systems with a known model, which trains a mixture-of-expert neural network architecture for controlling a quadrupedal robot and shows that this policy structure is well suited for such multimodal systems.

Learning Deep Policies for Robot Bin Picking by Simulating Robust Grasping Sequences

TLDR
The task of bin picking, where multiple objects are randomly arranged in a heap and the objective is to sequentially grasp and transport each into a packing box, is considered and formalized as a discrete-time Partially Observable Markov Decision Process that specifies states of the heap, point-cloud observations, and rewards.

Adaptive Motion Planning

TLDR
The approach leads to the synthesis of a robust real-time planning module that allows a UAV to navigate seamlessly across environments and speed regimes, and it establishes novel connections between the disparate fields of motion planning and active learning, imitation learning, and online paging, which opens doors to several new research problems.

References

Showing 1-10 of 42 references

Learning to gather information via imitation

TLDR
This paper presents an efficient algorithm, EXPLORE, that trains a policy on the target distribution to imitate a clairvoyant oracle — an oracle that has full information about the world and computes non-myopic solutions to maximize information gathered.
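
The asymmetry in the clairvoyant-oracle idea above is that, at training time, the oracle plans with the full world while the learner only sees the measurement history. A minimal sketch of generating imitation data under that asymmetry, with all helper names (oracle_action, featurize_history, world.measure, and the rest) as illustrative assumptions rather than the paper's implementation:

def collect_oracle_demonstrations(training_worlds, rollout_policy, oracle_action,
                                  featurize_history, horizon=10):
    # Roll out the learner's policy in each sampled training world, but label
    # every step with the action a clairvoyant oracle would take: the oracle is
    # given the full world, while the learner's features come only from the
    # history of measurements gathered so far.
    data = []
    for world in training_worlds:
        history = []
        for _ in range(horizon):
            features = featurize_history(history)                   # learner input: past measurements only
            data.append((features, oracle_action(world, history)))  # oracle label: full world knowledge
            location = rollout_policy(features)                     # continue the rollout with the learner
            history.append((location, world.measure(location)))
    return data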

Near-Optimal Bayesian Active Learning with Noisy Observations

TLDR
EC2, a novel greedy active learning algorithm, is developed and proved to be competitive with the optimal policy, thus obtaining the first competitiveness guarantees for Bayesian active learning with noisy observations.

Efficient touch based localization through submodularity

TLDR
This work develops new methods for selecting a sequence of information-gathering actions online by drawing an explicit connection to adaptive submodularity, and demonstrates the effectiveness of these methods in simulation and on a robot.

A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning

TLDR
This paper proposes a new iterative algorithm that trains a stationary deterministic policy, which can be seen as a no-regret algorithm in an online learning setting, and demonstrates that this new approach outperforms previous approaches on two challenging imitation learning problems and a benchmark sequence labeling problem.
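
For context, a minimal sketch of the DAgger-style data-aggregation loop this reference describes, assuming generic rollout, expert_action, and fit helpers (hypothetical placeholders, not an interface from the paper):

def dagger(initial_policy, rollout, expert_action, fit, n_iterations=10):
    # Repeatedly execute the current learner, label the states it actually
    # visits with the expert's actions, aggregate them into a single dataset,
    # and retrain; this is the no-regret reduction analyzed in the reference.
    dataset = []
    policy = initial_policy
    for _ in range(n_iterations):
        visited_states = rollout(policy)            # states induced by the learner itself
        dataset.extend((s, expert_action(s)) for s in visited_states)
        policy = fit(dataset)                       # supervised learning on the aggregated data
    return policy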

Near-optimal Bayesian Active Learning with Correlated and Noisy Tests

TLDR
This paper proposes ECED, a novel, computationally efficient active learning algorithm, and proves strong theoretical guarantees that hold with correlated, noisy tests, and demonstrates strong empirical performance of ECED on two problem instances, including a Bayesian experimental design task intended to distinguish among economic theories of how people make risky decisions, and an active preference learning task via pairwise comparisons.

Submodular Surrogates for Value of Information

TLDR
The utility of the DiRECt approach is demonstrated on four diverse case-studies: touch-based robotic localization, comparison-based preference learning, wild-life conservation management, and preference elicitation in behavioral economics.

Efficient Reductions for Imitation Learning

TLDR
This work proposes two alternative algorithms for imitation learning where training occurs over several episodes of interaction and shows that this leads to stronger performance guarantees and improved performance on two challenging problems: training a learner to play a 3D racing game and Mario Bros.

DESPOT: Online POMDP Planning with Regularization

TLDR
This paper presents an online POMDP algorithm that alleviates the difficulties of online planning by focusing the search on a set of randomly sampled scenarios, gives an output-sensitive performance bound for all policies derived from a DESPOT, and shows that R-DESPOT works well if a small optimal policy exists.

Deeply AggreVaTeD: Differentiable Imitation Learning for Sequential Prediction

TLDR
This work presents two gradient procedures that can learn neural network policies for several problems, including a sequential prediction task and several high-dimensional robotics control problems, and provides a comprehensive theoretical study of imitation learning (IL).

Nonmyopic Adaptive Informative Path Planning for Multiple Robots

TLDR
This paper presents a novel approach to adaptive informative path planning that plans ahead for possible observations that can be made in the future, and develops an algorithm that performs provably near-optimally in settings where the adaptivity gap is small.