Corpus ID: 220713273

Bridging the Imitation Gap by Adaptive Insubordination

@inproceedings{weihs2021bridging,
  title={Bridging the Imitation Gap by Adaptive Insubordination},
  author={Luca Weihs and Unnat Jain and Jordi Salvador and Svetlana Lazebnik and Aniruddha Kembhavi and Alexander G. Schwing},
  booktitle={Neural Information Processing Systems},
}

Why do agents often obtain better reinforcement learning policies when imitating a worse expert? We show that privileged information used by the expert is marginalized in the learned agent policy, resulting in an "imitation gap." Prior work bridges this gap via a progression from imitation learning to reinforcement learning. While often successful, gradual progression fails for tasks that require frequent switches between exploration and memorization skills. To better address these tasks and… 
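The progression from imitation learning to reinforcement learning mentioned above can be sketched as a weighted combination of the two losses whose weight is annealed over training; the function names and the linear schedule below are illustrative assumptions, not the paper's exact method (which adapts the weight per state rather than globally).

```python
def mixed_loss(il_loss, rl_loss, step, total_steps):
    """Blend an imitation-learning loss with an RL loss.

    A fixed progression anneals the imitation weight from 1 to 0 over
    training; an adaptive scheme would instead choose the weight per
    state, e.g. from the disagreement between expert and learner.
    """
    w = max(0.0, 1.0 - step / total_steps)  # linear anneal (illustrative)
    return w * il_loss + (1.0 - w) * rl_loss

# Early in training the objective is dominated by imitation ...
early = mixed_loss(il_loss=2.0, rl_loss=0.5, step=0, total_steps=100)
# ... and late in training by reinforcement learning.
late = mixed_loss(il_loss=2.0, rl_loss=0.5, step=100, total_steps=100)
```

A fixed schedule like this is exactly what fails when a task needs frequent switches between exploration and memorization, motivating the per-state adaptive weighting.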

AllenAct: A Framework for Embodied AI Research

AllenAct is introduced, a modular and flexible learning framework designed with a focus on the unique requirements of Embodied AI research that provides first-class support for a growing collection of embodied environments, tasks and algorithms.

Computer Vision – ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V

BLSM first sets bone lengths and joint angles to specify the skeleton, then specifies identity-specific surface variation, and finally bundles them together through linear blend skinning, allowing for out-of-the-box integration with standard graphics packages like Unity and facilitating full-body AR effects and image-driven character animation.

A Survey of Meta-Reinforcement Learning

This survey describes the meta-RL problem setting in detail as well as its major variations, and discusses how meta-RL research can be clustered based on the presence of a task distribution and the learning budget available for each individual task.

PIRLNav: Pretraining with Imitation and RL Finetuning for ObjectNav

This work presents PIRLNav, a two-stage learning scheme for BC pretraining on human demonstrations followed by RL-finetuning, and investigates whether human demonstrations can be replaced with 'free' sources of demonstrations, e.g. shortest paths or task-agnostic frontier exploration trajectories.

Last-Mile Embodied Visual Navigation

This work focuses on last-mile navigation and leverages the underlying geometric structure of the problem with neural descriptors; the resulting module, SLING, can easily be connected with heuristic, reinforcement learning, and neural modular policies.

Leveraging Fully Observable Policies for Learning under Partial Observability

This work proposes a method for partially observable reinforcement learning that uses a fully observable policy during offline training to improve online performance and outperforms pure imitation, pure reinforcement learning, the sequential or parallel combination of both types, and a recent state-of-the-art method in the same setting.

Deep Whole-Body Control: Learning a Unified Policy for Manipulation and Locomotion

This work proposes to learn a unified policy for whole-body control of a legged manipulator using reinforcement learning; it introduces Regularized Online Adaptation to bridge the Sim2Real gap for high-DoF control, and Advantage Mixing, which exploits the causal dependency in the action space to overcome local minima when training the whole-body system.
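The advantage-mixing idea can be sketched as crediting each action group mostly with the advantage of the reward it causally affects; the function name, the variable names, and the annealing parameter `beta` below are illustrative assumptions rather than the paper's exact formulation.

```python
def advantage_mixing(adv_manip, adv_loco, beta):
    """Per-action-group advantages for a whole-body policy.

    Arm actions are credited mostly with the manipulation advantage and
    leg actions mostly with the locomotion advantage; beta in [0, 1]
    anneals both groups toward the full (summed) advantage, easing
    credit assignment early in training.
    """
    adv_arm = adv_manip + beta * adv_loco  # weights gradients of arm actions
    adv_leg = adv_loco + beta * adv_manip  # weights gradients of leg actions
    return adv_arm, adv_leg
```

At `beta = 0` the two action groups are optimized against disjoint rewards; at `beta = 1` both see the full task advantage.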

Simple but Effective: CLIP Embeddings for Embodied AI

One of the baselines is extended, producing an agent capable of zero-shot object navigation that can navigate to objects not used as targets during training; it beats the winners of the 2021 Habitat ObjectNav Challenge, which employ auxiliary tasks, depth maps, and human demonstrations, as well as those of the 2019 Habitat PointNav Challenge.

Supplementary Materials: Robust Asymmetric Learning in POMDPs

Notation: t (Time): discrete time step. Z: discrete time step used in integration; indexes other values. s_t (State): full state, compact. S (State space): state space of the MDP; sufficient to fully define the state of the …

GridToPix: Training Embodied Agents with Minimal Supervision

GRIDTOPIX is proposed to 1) train agents with terminal rewards in gridworlds that generically mirror Embodied AI environments, i.e., are independent of the task; and 2) distill the learned policy into agents that reside in complex visual worlds.

Hierarchical Imitation and Reinforcement Learning

This work proposes an algorithmic framework, called hierarchical guidance, that leverages the hierarchical structure of the underlying problem to integrate different modes of expert interaction and can incorporate different combinations of imitation learning and reinforcement learning at different levels, leading to dramatic reductions in both expert effort and cost of exploration.

Generative Adversarial Imitation Learning

A new general framework for directly extracting a policy from data, as if it were obtained by reinforcement learning following inverse reinforcement learning, is proposed and a certain instantiation of this framework draws an analogy between imitation learning and generative adversarial networks.
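The adversarial analogy above amounts to training a discriminator to separate expert from policy state-action pairs and rewarding the policy for fooling it; the sketch below shows the resulting surrogate reward for one common GAIL variant, with the function name and the logit interface being illustrative assumptions.

```python
import math

def gail_reward(d_logit):
    """Surrogate RL reward derived from a discriminator logit D(s, a).

    sigmoid(d_logit) estimates P(expert | s, a); the policy is rewarded
    with -log(1 - P), which grows as its behavior becomes harder to
    distinguish from the expert's.
    """
    p = 1.0 / (1.0 + math.exp(-d_logit))   # P(expert | s, a)
    return -math.log(max(1e-8, 1.0 - p))   # clip to avoid log(0)
```

The policy is then optimized with any standard RL algorithm against this reward while the discriminator is retrained on fresh rollouts.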

Efficient Reductions for Imitation Learning

This work proposes two alternative algorithms for imitation learning where training occurs over several episodes of interaction and shows that this leads to stronger performance guarantees and improved performance on two challenging problems: training a learner to play a 3D racing game and Mario Bros.

Overcoming Exploration in Reinforcement Learning with Demonstrations

This work uses demonstrations to overcome the exploration problem and successfully learn to perform long-horizon, multi-step robotics tasks with continuous control such as stacking blocks with a robot arm.
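One common way to use demonstrations this way is a behavior-cloning auxiliary loss gated by a "Q-filter", applied only where the critic still scores the demonstrated action above the policy's own; the 1-D sketch below is a simplified illustration, and its function and argument names are assumptions, not the paper's API.

```python
def bc_loss_with_q_filter(pi_actions, demo_actions, q_pi, q_demo):
    """Behavior-cloning auxiliary loss over demonstration states.

    The squared error to the demonstrated action is counted only where
    the critic scores the demo action above the policy's own action, so
    demonstrations stop constraining the policy once it surpasses them.
    """
    loss = 0.0
    for a_pi, a_demo, qp, qd in zip(pi_actions, demo_actions, q_pi, q_demo):
        if qd > qp:  # demo action still better than current policy here
            loss += (a_pi - a_demo) ** 2
    return loss
```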

Reinforcement Learning with Unsupervised Auxiliary Tasks

This paper significantly outperforms the previous state of the art on Atari, averaging 880% expert human performance, and on a challenging suite of first-person, three-dimensional Labyrinth tasks, achieving a mean speedup in learning of 10x and averaging 87% expert human performance on Labyrinth.

State-only Imitation with Transition Dynamics Mismatch

This paper presents a new state-only IL algorithm that divides the overall optimization objective into two subproblems by introducing an indirection step, solves the subproblems iteratively, and shows that it is particularly effective when there is a transition dynamics mismatch between the expert and imitator MDPs.

Policy Optimization with Demonstrations

It is shown that POfD induces implicit dynamic reward shaping and brings provable benefits for policy improvement, and can be combined with policy gradient methods to produce state-of-the-art results, as demonstrated experimentally on a range of popular benchmark sparse-reward tasks.

A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning

This paper proposes a new iterative algorithm that trains a stationary deterministic policy, can be seen as a no-regret algorithm in an online learning setting, and demonstrably outperforms previous approaches on two challenging imitation learning problems and a benchmark sequence labeling problem.
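The iterative scheme (DAgger-style) can be sketched as: roll out the current policy, label the visited states with the expert's actions, aggregate everything into one dataset, and retrain; all function names below are illustrative assumptions.

```python
def dagger(expert, train, env_rollout, rounds=3, init_data=None):
    """Schematic dataset-aggregation imitation loop.

    expert(state) -> action label; train(dataset) -> policy;
    env_rollout(policy) -> list of states visited by that policy.
    Retraining on the aggregate of all rounds is what yields the
    no-regret reduction to online learning.
    """
    data = list(init_data or [])
    policy = train(data)
    for _ in range(rounds):
        states = env_rollout(policy)               # states the learner visits
        data += [(s, expert(s)) for s in states]   # expert labels, aggregated
        policy = train(data)                       # retrain on all data so far
    return policy
```

Unlike plain behavior cloning, the training distribution here comes from the learner's own rollouts, which is what controls compounding errors.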

Continuous control with deep reinforcement learning

This work presents an actor-critic, model-free algorithm based on the deterministic policy gradient that can operate over continuous action spaces, and demonstrates that for many of the tasks the algorithm can learn policies end-to-end: directly from raw pixel inputs.
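The deterministic policy gradient underlying this actor-critic method chains the critic's gradient with respect to the action through the actor's parameters; the toy below uses a linear actor and a known quadratic critic in place of learned networks, and every function form here is an illustrative assumption.

```python
def actor_update(theta, states, lr=0.05):
    """One deterministic-policy-gradient step for a linear actor.

    Actor: pi(s) = theta * s. A fixed toy critic Q(s, a) = -(a - 2s)^2
    stands in for a learned critic network; the chain rule
    dQ/da * dpi/dtheta gives the actor gradient.
    """
    grad = 0.0
    for s in states:
        a = theta * s                  # deterministic action
        dq_da = -2.0 * (a - 2.0 * s)   # critic gradient w.r.t. the action
        grad += dq_da * s              # dpi/dtheta = s
    return theta + lr * grad / len(states)

theta = 0.0
for _ in range(200):
    theta = actor_update(theta, [0.5, 1.0, 1.5])
# theta converges toward 2, the action scale this toy critic prefers
```

The real algorithm additionally learns Q from transitions, uses target networks, and adds exploration noise, none of which the sketch attempts.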