Corpus ID: 49869551

Backplay: "Man muss immer umkehren"

@article{Resnick2018BackplayM,
  title={Backplay: "Man muss immer umkehren"},
  author={Cinjon Resnick and Roberta Raileanu and Sanyam Kapoor and Alex Peysakhovich and Kyunghyun Cho and Joan Bruna},
  journal={ArXiv},
  year={2018},
  volume={abs/1807.06919}
}
Model-free reinforcement learning (RL) requires a large number of trials to learn a good policy, especially in environments with sparse rewards. We explore a method to improve the sample efficiency when we have access to demonstrations. Our approach, Backplay, uses a single demonstration to construct a curriculum for a given task. Rather than starting each training episode in the environment's fixed initial state, we start the agent near the end of the demonstration and move the starting point… 
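The core mechanic described above can be summarized as: sample each episode's start state from a window near the end of a single demonstration, and slide that window backwards as training progresses. Below is a minimal Python sketch of that idea under stated assumptions; the window schedule, the demonstration format, and the `env.set_state` hook are illustrative placeholders, not the authors' implementation.

```python
import random

def backplay_start_state(demo_states, training_step, schedule):
    """Pick a training start state from a window near the end of a demonstration.

    demo_states:   list of environment states [s_0, ..., s_T] from one demo
    training_step: current training iteration
    schedule:      list of (step_threshold, (lo, hi)) pairs; (lo, hi) counts
                   how many steps back from the end of the demo to sample from
    """
    # Choose the sampling window for the current stage of training.
    lo, hi = schedule[-1][1]
    for threshold, window in schedule:
        if training_step < threshold:
            lo, hi = window
            break

    # Sample an offset from the end of the demo and clamp to a valid index.
    offset = random.randint(lo, hi)
    idx = max(0, len(demo_states) - 1 - offset)
    return demo_states[idx]

# Hypothetical usage (assumes a simulator that can be reset to an arbitrary state):
# schedule = [(1e5, (0, 4)), (2e5, (4, 8)), (3e5, (8, 16)), (float("inf"), (16, 64))]
# env.set_state(backplay_start_state(demo_states, step, schedule))
```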
Learning Montezuma's Revenge from a Single Demonstration
TLDR
A new method for learning from a single demonstration to solve hard exploration tasks like the Atari game Montezuma's Revenge is presented; the trained agent achieves a high score of 74,500, better than any previously published result.
Accelerating Training in Pommerman with Imitation and Reinforcement Learning
TLDR
The Pommerman simulation was recently developed to mimic the classic Japanese game Bomberman, and focuses on competitive gameplay in a multi-agent setting, providing insights into several real-world problems with characteristics such as partial observability, decentralized execution (without communication), and very sparse and delayed rewards.
PBCS: Efficient Exploration and Exploitation Using a Synergy between Reinforcement Learning and Motion Planning
TLDR
This paper proposes a new algorithm that combines motion planning and reinforcement learning to solve hard exploration environments, shows that this method outperforms state-of-the-art RL algorithms in 2D maze environments of various sizes, and improves on the trajectory obtained by the motion planning phase.
A Performance-Based Start State Curriculum Framework for Reinforcement Learning
TLDR
This work proposes a unifying framework for performance-based start state curricula in RL, which makes it possible to analyze and compare the performance influence of the two key components: performance measure estimation and the start selection policy.
Competitive Experience Replay
TLDR
This work proposes a novel method called competitive experience replay, which efficiently supplements a sparse reward by placing learning in the context of an exploration competition between a pair of agents, creating a competitive game designed to drive exploration.
Go-Explore: a New Approach for Hard-Exploration Problems
TLDR
A new algorithm called Go-Explore, which exploits the following principles: remember previously visited states, solve simulated environments through any available means, and robustify via imitation learning; this results in a dramatic performance improvement on hard-exploration problems.
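As a rough illustration of the "remember, return, then explore" loop behind Go-Explore's first phase, here is a toy Python sketch. It assumes a deterministic simulator exposing hypothetical `get_state`/`set_state` hooks and a user-supplied `downscale` function for hashing observations into cells; it is a simplified sketch, not the published implementation.

```python
import random

def go_explore_phase1(env, downscale, iterations=1000, explore_steps=100):
    """Toy archive-based exploration: remember cells, return to one, explore from it."""
    obs = env.reset()
    archive = {downscale(obs): {"state": env.get_state(), "reward": 0.0}}

    for _ in range(iterations):
        # Go: pick a previously visited cell and restore the simulator to it.
        cell = random.choice(list(archive.keys()))
        env.set_state(archive[cell]["state"])
        total = archive[cell]["reward"]

        # Explore: take random actions and record new or higher-reward cells.
        for _ in range(explore_steps):
            obs, reward, done, _ = env.step(env.action_space.sample())
            total += reward
            key = downscale(obs)
            if key not in archive or total > archive[key]["reward"]:
                archive[key] = {"state": env.get_state(), "reward": total}
            if done:
                break
    return archive
```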
Continual Match Based Training in Pommerman: Technical Report
TLDR
This work proposes a COntinual Match BAsed Training (COMBAT) framework for training a population of advantage actor-critic agents in Pommerman, a partially observable multi-agent environment with no communication, and trains an agent, namely Navocado, that won the title of top learning agent in the NeurIPS 2018 Pommerman Competition.
Competitive Experience Replay
Deep learning has achieved remarkable successes in solving challenging reinforcement learning (RL) problems. However, it still often suffers from the need to engineer a reward function that not only…
Safer Deep RL with Shallow MCTS: A Case Study in Pommerman
TLDR
This paper exemplifies and analyzes the high rate of catastrophic events that happen under random exploration in a domain with sparse, delayed, and deceptive rewards: the recently proposed multi-agent benchmark of Pommerman. It also proposes a new framework where even a non-expert simulated demonstrator can be integrated into asynchronous distributed deep reinforcement learning methods.
Terminal Prediction as an Auxiliary Task for Deep Reinforcement Learning
TLDR
This paper contributes a novel self-supervised auxiliary task, Terminal Prediction (TP), in which the agent estimates its temporal closeness to the terminal state of an episodic task while learning its control policy, thereby aiding representation learning.
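To make the auxiliary task concrete, a minimal sketch follows of how per-step terminal-prediction targets could be computed, assuming the target is simply the step's normalized position within the episode (the exact normalization used in the paper may differ).

```python
def terminal_prediction_targets(episode_length):
    """Per-step auxiliary targets in [0, 1]: closeness to the terminal state.

    Step t of an episode with episode_length steps gets target t / (T - 1),
    so the auxiliary head learns to output values approaching 1 as the
    episode nears termination.
    """
    if episode_length == 1:
        return [1.0]
    return [t / (episode_length - 1) for t in range(episode_length)]

# The auxiliary loss would then be, e.g., a mean-squared error between the
# terminal-prediction head's outputs and these targets, added with a small
# weight to the usual actor-critic loss.
```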

References

Showing 1-10 of 84 references
Learning Montezuma's Revenge from a Single Demonstration
TLDR
A new method for learning from a single demonstration to solve hard exploration tasks like the Atari game Montezuma's Revenge is presented; the trained agent achieves a high score of 74,500, better than any previously published result.
Overcoming Exploration in Reinforcement Learning with Demonstrations
TLDR
This work uses demonstrations to overcome the exploration problem and successfully learn to perform long-horizon, multi-step robotics tasks with continuous control such as stacking blocks with a robot arm.
Kickstarting Deep Reinforcement Learning
TLDR
It is shown that, on a challenging and computationally-intensive multi-task benchmark (DMLab-30), kickstarted training improves the data efficiency of new agents, making it significantly easier to iterate on their design.
Playing hard exploration games by watching YouTube
TLDR
A two-stage method of one-shot imitation that allows an agent to convincingly exceed human-level performance on the infamously hard exploration games Montezuma's Revenge, Pitfall! and Private Eye for the first time, even if the agent is not presented with any environment rewards.
Forward-Backward Reinforcement Learning
TLDR
This work proposes training a model to learn to take imagined reversal steps from known goal states and empirically demonstrates that it yields better performance than standard DDQN.
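The idea of imagined reversal steps can be sketched as follows, assuming a learned (here, hypothetical) `backward_model` that maps a state to a predicted predecessor state, action, and reward; the sketch only shows how such a model could be used to synthesize reverse trajectories for a replay buffer, not the paper's training procedure.

```python
def imagined_reverse_rollout(backward_model, goal_state, horizon=20):
    """Generate an imagined trajectory backwards from a known goal state.

    backward_model(state) is assumed to return (prev_state, action, reward),
    predicting how the agent could have arrived at `state`. The resulting
    transitions can be added to a replay buffer as if experienced forward.
    """
    transitions = []
    state = goal_state
    for _ in range(horizon):
        prev_state, action, reward = backward_model(state)
        transitions.append((prev_state, action, reward, state))
        state = prev_state
    # Reverse so the trajectory reads forward in time: earliest step first.
    return list(reversed(transitions))
```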
BaRC: Backward Reachability Curriculum for Robotic Reinforcement Learning
TLDR
The Backward Reachability Curriculum (BaRC) begins policy training from states that require a small number of actions to accomplish the task, and expands the initial state distribution backwards in a dynamically-consistent manner once the policy optimization algorithm demonstrates sufficient performance.
Reverse Curriculum Generation for Reinforcement Learning
TLDR
This work proposes a method to learn goal-oriented tasks without requiring any prior knowledge other than obtaining a single state in which the task is achieved, and generates a curriculum of start states that adapts to the agent's performance, leading to efficient training on goal-oriented tasks.
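A minimal sketch of performance-adaptive start-state expansion is given below; it assumes a simulator with hypothetical `get_state`/`set_state` hooks, a per-state `success_rate` estimator, and simple random-walk expansion, which simplifies the selection and rollout details of the published method.

```python
import random

def expand_starts(env, starts, success_rate, n_new=50, walk_len=10,
                  low=0.1, high=0.9):
    """Grow a reverse curriculum of start states outward from the goal.

    Keep start states of intermediate difficulty (success rate in [low, high])
    and expand them with short random walks, so the frontier of start states
    moves further from the goal as the agent improves.
    """
    # Keep only "goldilocks" starts: neither mastered nor hopeless.
    frontier = [s for s in starts if low <= success_rate(s) <= high] or starts

    new_starts = []
    for _ in range(n_new):
        env.set_state(random.choice(frontier))   # assumes a settable simulator state
        for _ in range(walk_len):
            env.step(env.action_space.sample())
        new_starts.append(env.get_state())
    return frontier + new_starts
```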
A Unified Game-Theoretic Approach to Multiagent Reinforcement Learning
TLDR
An algorithm is described, based on approximate best responses to mixtures of policies generated using deep reinforcement learning, and empirical game-theoretic analysis to compute meta-strategies for policy selection, which generalizes previous ones such as InRL.
Reinforcement and Imitation Learning for Diverse Visuomotor Skills
TLDR
This work proposes a model-free deep reinforcement learning method that leverages a small amount of demonstration data to assist a reinforcement learning agent and trains end-to-end visuomotor policies that map directly from RGB camera inputs to joint velocities.
Learning Cooperative Visual Dialog Agents with Deep Reinforcement Learning
TLDR
This work poses a cooperative ‘image guessing’ game between two agents who communicate in natural language dialog so that Q-BOT can select an unseen image from a lineup of images and shows the emergence of grounded language and communication among ‘visual’ dialog agents with no human supervision.