• Corpus ID: 220713273

# Bridging the Imitation Gap by Adaptive Insubordination

@inproceedings{Weihs2021BridgingTI,
  title={Bridging the Imitation Gap by Adaptive Insubordination},
  author={Luca Weihs and Unnat Jain and Jordi Salvador and Svetlana Lazebnik and Aniruddha Kembhavi and Alexander G. Schwing},
  booktitle={NeurIPS},
  year={2021}
}
• Published in NeurIPS 23 July 2020
• Computer Science
Why do agents often obtain better reinforcement learning policies when imitating a worse expert? We show that privileged information used by the expert is marginalized in the learned agent policy, resulting in an "imitation gap." Prior work bridges this gap via a progression from imitation learning to reinforcement learning. While often successful, gradual progression fails for tasks that require frequent switches between exploration and memorization skills. To better address these tasks and…
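The abstract's core idea, adaptively weighting imitation against reinforcement learning per state rather than on a fixed schedule, can be illustrated with a minimal sketch. This is not the paper's actual ADVISOR objective; `mixed_loss`, the loss values, and the weights below are illustrative assumptions only.

```python
import numpy as np

def mixed_loss(il_loss, rl_loss, weight):
    """Blend per-state imitation and RL losses.

    weight near 1: the expert's action is recoverable from the
    agent's own observation, so imitation is informative.
    weight near 0: the expert relies on privileged information,
    so its action looks arbitrary and RL should dominate.
    """
    weight = np.clip(weight, 0.0, 1.0)
    return weight * il_loss + (1.0 - weight) * rl_loss

# Toy example: two states, one where imitation is reliable (w=0.9)
# and one where the expert uses privileged information (w=0.1).
il = np.array([0.2, 1.5])   # e.g. cross-entropy to the expert action
rl = np.array([0.8, 0.4])   # e.g. a policy-gradient surrogate loss
w = np.array([0.9, 0.1])
losses = mixed_loss(il, rl, w)
```

The sketch shows only the mixing step; estimating the weight (where the expert is imitable) is the substantive problem the paper addresses.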
## 6 Citations

AllenAct: A Framework for Embodied AI Research
• Computer Science
ArXiv
• 2020
AllenAct is introduced, a modular and flexible learning framework designed with a focus on the unique requirements of Embodied AI research that provides first-class support for a growing collection of embodied environments, tasks and algorithms.
Computer Vision – ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V
• Computer Science
ECCV
• 2020
BLSM first sets bone lengths and joint angles to specify the skeleton, then specifies identity-specific surface variation, and finally bundles them together through linear blend skinning, allowing for out-of-box integration with standard graphics packages like Unity, facilitating full-body AR effects and image-driven character animation.
GridToPix: Training Embodied Agents with Minimal Supervision
• Computer Science
2021 IEEE/CVF International Conference on Computer Vision (ICCV)
• 2021
GRIDTOPIX is proposed to 1) train agents with terminal rewards in gridworlds that generically mirror Embodied AI environments, i.e., are independent of the task, and 2) distill the learned policy into agents that reside in complex visual worlds.
Robust Asymmetric Learning in POMDPs
• Computer Science
ICML
• 2021
A computationally efficient algorithm, adaptive asymmetric DAgger (A2D), is constructed that allows the trainee to safely imitate the modified expert, and outperforms policies learned either by imitating a fixed expert or direct reinforcement learning.
Simple but Effective: CLIP Embeddings for Embodied AI
• Computer Science
ArXiv
• 2021
One of the baselines is extended, producing an agent capable of zero-shot object navigation: it can navigate to objects that were not used as targets during training. This agent beats the winners of the 2021 Habitat ObjectNav Challenge, which employ auxiliary tasks, depth maps, and human demonstrations, as well as the winners of the 2019 Habitat PointNav Challenge.
Supplementary Materials: Robust Asymmetric Learning in POMDPs
• Psychology
• 2021
Notation excerpt from the supplementary glossary: t, discrete time step; z, discrete time step used in integration (indexes other values); s_t, full state; S, state space of the MDP…

## References

SHOWING 1-10 OF 83 REFERENCES
Hierarchical Imitation and Reinforcement Learning
• Computer Science
ICML
• 2018
This work proposes an algorithmic framework, called hierarchical guidance, that leverages the hierarchical structure of the underlying problem to integrate different modes of expert interaction and can incorporate different combinations of imitation learning and reinforcement learning at different levels, leading to dramatic reductions in both expert effort and cost of exploration.
Generative Adversarial Imitation Learning
• Computer Science
NIPS
• 2016
A new general framework for directly extracting a policy from data, as if it were obtained by reinforcement learning following inverse reinforcement learning, is proposed and a certain instantiation of this framework draws an analogy between imitation learning and generative adversarial networks.
Efficient Reductions for Imitation Learning
• Computer Science
AISTATS
• 2010
This work proposes two alternative algorithms for imitation learning where training occurs over several episodes of interaction and shows that this leads to stronger performance guarantees and improved performance on two challenging problems: training a learner to play a 3D racing game and Mario Bros.
Overcoming Exploration in Reinforcement Learning with Demonstrations
• Computer Science
2018 IEEE International Conference on Robotics and Automation (ICRA)
• 2018
This work uses demonstrations to overcome the exploration problem and successfully learn to perform long-horizon, multi-step robotics tasks with continuous control such as stacking blocks with a robot arm.
Reinforcement Learning with Unsupervised Auxiliary Tasks
• Computer Science
ICLR
• 2017
This paper significantly outperforms the previous state of the art on Atari, averaging 880% expert human performance, and on a challenging suite of first-person, three-dimensional Labyrinth tasks, leading to a mean speedup in learning of 10× and averaging 87% expert human performance on Labyrinth.
State-only Imitation with Transition Dynamics Mismatch
• Computer Science
ICLR
• 2020
This paper presents a new state-only IL algorithm that divides the overall optimization objective into two subproblems by introducing an indirection step, solves the subproblems iteratively, and shows that it is particularly effective when there is a transition dynamics mismatch between the expert and imitator MDPs.
Policy Optimization with Demonstrations
• Computer Science
ICML
• 2018
It is shown that POfD induces implicit dynamic reward shaping and brings provable benefits for policy improvement, and can be combined with policy gradient methods to produce state-of-the-art results, as demonstrated experimentally on a range of popular benchmark sparse-reward tasks.
A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning
• Computer Science
AISTATS
• 2011
This paper proposes a new iterative algorithm, which trains a stationary deterministic policy and can be seen as a no-regret algorithm in an online learning setting, and demonstrates that this new approach outperforms previous approaches on two challenging imitation learning problems and a benchmark sequence labeling problem.
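The iterative algorithm above is DAgger; its loop can be sketched as follows. The `rollout`, `expert`, and `fit` helpers, and the toy parity task, are illustrative assumptions, not the paper's experimental setup.

```python
def dagger(rollout, expert, fit, n_iters=3):
    """DAgger loop: roll out the current policy, have the expert
    relabel the visited states, and retrain on the aggregate."""
    dataset = []           # aggregated (state, expert_action) pairs
    policy = fit(dataset)  # initial policy trained on nothing
    for _ in range(n_iters):
        states = rollout(policy)                     # states visited under the learner
        dataset += [(s, expert(s)) for s in states]  # expert relabels them
        policy = fit(dataset)                        # retrain on everything seen so far
    return policy

# Toy instantiation: states are ints 0..9, the expert labels parity,
# and the "policy" is a lookup table over the aggregated dataset.
expert = lambda s: s % 2
rollout = lambda policy: list(range(10))             # toy: visit every state
fit = lambda data: (lambda s: dict(data).get(s, 0))
policy = dagger(rollout, expert, fit)
```

Training on states the learner itself visits, rather than only on expert trajectories, is what gives the reduction its no-regret guarantee.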
Continuous control with deep reinforcement learning
• Computer Science
ICLR
• 2016
This work presents an actor-critic, model-free algorithm based on the deterministic policy gradient that can operate over continuous action spaces, and demonstrates that for many of the tasks the algorithm can learn policies end-to-end: directly from raw pixel inputs.