Corpus ID: 49869551

Backplay: "Man muss immer umkehren"

@article{Resnick2018BackplayM,
  title={Backplay: "Man muss immer umkehren"},
  author={Cinjon Resnick and Roberta Raileanu and Sanyam Kapoor and Alex Peysakhovich and Kyunghyun Cho and Joan Bruna},
  journal={ArXiv},
  year={2018},
  volume={abs/1807.06919}
}
Model-free reinforcement learning (RL) requires a large number of trials to learn a good policy, especially in environments with sparse rewards. We explore a method to improve the sample efficiency when we have access to demonstrations. Our approach, Backplay, uses a single demonstration to construct a curriculum for a given task. Rather than starting each training episode in the environment's fixed initial state, we start the agent near the end of the demonstration and move the starting point… 
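
Below is a minimal, illustrative sketch of the curriculum the abstract describes: sample episode start states from a window near the end of a single demonstration and slide that window backward as training progresses. It assumes a hypothetical `env.reset_to(state)` simulator API and hypothetical `policy.act` / `policy.observe` hooks for any model-free RL learner; the linear window schedule below is an assumption for illustration, not the paper's exact mechanism.

```python
import random

def backplay_start_state(demo_states, progress):
    """Pick a starting state from a single demonstration.

    demo_states: list of environment states from one successful demonstration,
                 ordered from initial state to goal.
    progress:    float in [0, 1]; 0 = start of training, 1 = end of training.
    """
    n = len(demo_states)
    # Early in training, start near the end of the demo (close to the goal);
    # as training progresses, slide the sampling window back toward the start.
    window_hi = n - 1 - int(progress * (n - 1))      # upper index of the window
    window_lo = max(0, window_hi - max(1, n // 10))  # small window behind it
    return demo_states[random.randint(window_lo, window_hi)]

def train(env, policy, demo_states, total_episodes=10_000):
    for episode in range(total_episodes):
        progress = episode / total_episodes
        start = backplay_start_state(demo_states, progress)
        obs = env.reset_to(start)   # hypothetical: reset the simulator to a demo state
        done = False
        while not done:
            action = policy.act(obs)
            obs, reward, done, _ = env.step(action)
            policy.observe(obs, reward, done)  # any model-free RL update (e.g., PPO)
```

The point of the backward schedule is that the agent first learns from states where the sparse reward is only a few steps away, then gradually extends what it has learned toward the environment's true initial state.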

Learning Montezuma's Revenge from a Single Demonstration

A new method for learning from a single demonstration to solve hard-exploration tasks such as the Atari game Montezuma's Revenge is presented, with a trained agent achieving a high score of 74,500, better than any previously published result.

Accelerating Training in Pommerman with Imitation and Reinforcement Learning

The Pommerman simulation was recently developed to mimic the classic Japanese game Bomberman, and focuses on competitive gameplay in a multi-agent setting, providing insights into several real-world problems with characteristics such as partial observability, decentralized execution (without communication), and very sparse and delayed rewards.

Leveraging Sequentiality in Reinforcement Learning from a Single Demonstration

This paper proposes to leverage a sequential bias to learn control policies for complex robotic tasks using a single demonstration and presents a new algorithm called DCIL-II, which learns a goal-conditioned policy to control a system between successive low-dimensional goals.

PBCS: Efficient Exploration and Exploitation Using a Synergy between Reinforcement Learning and Motion Planning

This paper proposes a new algorithm that combines motion planning and reinforcement learning to solve hard exploration environments, and shows that this method outperforms state-of-the-art RL algorithms in 2D maze environments of various sizes, and is able to improve on the trajectory obtained by the motion planning phase.

Jump-Start Reinforcement Learning

This paper proposes Jump-Start Reinforcement Learning (JSRL), an algorithm that employs two policies to solve tasks, a guide-policy and an exploration-policy, is compatible with any RL approach, and is able to outperform existing imitation and reinforcement learning algorithms, particularly in the small-data regime.

A Performance-Based Start State Curriculum Framework for Reinforcement Learning

This work proposes a unifying framework for performance-based start state curricula in RL, which makes it possible to analyze and compare the performance influence of the two key components: performance measure estimation and a start selection policy.

Divide & Conquer Imitation Learning

A novel algorithm designed to imitate complex robotic tasks from the states of an expert trajectory is presented, based on a sequential inductive bias, that scales to a complex simulated robotic manipulation task with very high sample efficiency.

Competitive Experience Replay

This work proposes a novel method called competitive experience replay, which efficiently supplements a sparse reward by placing learning in the context of an exploration competition between a pair of agents, creating a competitive game designed to drive exploration.

Go-Explore: a New Approach for Hard-Exploration Problems

A new algorithm called Go-Explore, which exploits the following principles: remember previously visited states, solve simulated environments through any available means, and robustify via imitation learning; this results in a dramatic performance improvement on hard-exploration problems.

Continual Match Based Training in Pommerman: Technical Report

This work proposes a COntinual Match BAsed Training (COMBAT) framework for training a population of advantage-actor-critic agents in Pommerman, a partially observable multi-agent environment with no communication, and trains an agent, namely Navocado, that won the title of top-1 learning agent in the NeurIPS 2018 Pommerman Competition.

References

Showing 1-10 of 73 references

Learning Montezuma's Revenge from a Single Demonstration

A new method for learning from a single demonstration to solve hard-exploration tasks such as the Atari game Montezuma's Revenge is presented, with a trained agent achieving a high score of 74,500, better than any previously published result.

Overcoming Exploration in Reinforcement Learning with Demonstrations

This work uses demonstrations to overcome the exploration problem and successfully learn to perform long-horizon, multi-step robotics tasks with continuous control such as stacking blocks with a robot arm.

Kickstarting Deep Reinforcement Learning

It is shown that, on a challenging and computationally-intensive multi-task benchmark (DMLab-30), kickstarted training improves the data efficiency of new agents, making it significantly easier to iterate on their design.

Playing hard exploration games by watching YouTube

A two-stage method of one-shot imitation that allows an agent to convincingly exceed human-level performance on the infamously hard exploration games Montezuma's Revenge, Pitfall! and Private Eye for the first time, even if the agent is not presented with any environment rewards.

BaRC: Backward Reachability Curriculum for Robotic Reinforcement Learning

The Backward Reachability Curriculum (BaRC) begins policy training from states that require a small number of actions to accomplish the task, and expands the initial state distribution backwards in a dynamically-consistent manner once the policy optimization algorithm demonstrates sufficient performance.

Reverse Curriculum Generation for Reinforcement Learning

This work proposes a method to learn goal-oriented tasks without requiring any prior knowledge other than obtaining a single state in which the task is achieved, and generates a curriculum of start states that adapts to the agent's performance, leading to efficient training on goal-oriented tasks.

A Unified Game-Theoretic Approach to Multiagent Reinforcement Learning

An algorithm is described, based on approximate best responses to mixtures of policies generated using deep reinforcement learning, and on empirical game-theoretic analysis to compute meta-strategies for policy selection, which generalizes previous algorithms such as InRL.

Reinforcement and Imitation Learning for Diverse Visuomotor Skills

This work proposes a model-free deep reinforcement learning method that leverages a small amount of demonstration data to assist a reinforcement learning agent and trains end-to-end visuomotor policies that map directly from RGB camera inputs to joint velocities.

DeepMimic: Example-Guided Deep Reinforcement Learning of Physics-Based Character Skills

This work shows that well-known reinforcement learning methods can be adapted to learn robust control policies capable of imitating a broad range of example motion clips, while also learning complex recoveries, adapting to changes in morphology, and accomplishing user-specified goals.

Reinforcement Learning from Imperfect Demonstrations

This work proposes a unified reinforcement learning algorithm, Normalized Actor-Critic (NAC), that effectively normalizes the Q-function, reducing the Q-values of actions unseen in the demonstration data and making NAC robust to suboptimal demonstration data.
...