Obstacle Tower Without Human Demonstrations: How Far a Deep Feed-Forward Network Goes with Reinforcement Learning

  @inproceedings{pleines2020obstacle,
    title={Obstacle Tower Without Human Demonstrations: How Far a Deep Feed-Forward Network Goes with Reinforcement Learning},
    author={Marco Pleines and Jenia Jitsev and Mike Preuss and Frank Zimmer},
    booktitle={2020 IEEE Conference on Games (CoG)},
    year={2020}
  }
The Obstacle Tower Challenge is the task of mastering a procedurally generated chain of levels that progressively become harder to complete. Whereas the top-performing entries of last year’s competition used human demonstrations or reward shaping to learn how to cope with the challenge, we present an approach that performed competitively (placed 7th) yet starts completely from scratch, using Deep Reinforcement Learning with a relatively simple feed-forward deep network structure. We…


Learning sparse and meaningful representations through embodiment


J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov, “Proximal Policy Optimization Algorithms,” arXiv:1707.06347, 2017
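The cited PPO paper's central idea is the clipped surrogate objective, which limits how far a policy update can move the probability ratio from 1. A minimal NumPy sketch (the function name and array-based interface are illustrative, not from the paper):

```python
import numpy as np

def ppo_clip_loss(ratio, advantage, eps=0.2):
    """Clipped surrogate loss from Schulman et al. (2017).

    ratio: pi_new(a|s) / pi_old(a|s) per sampled action.
    advantage: estimated advantage per sampled action.
    Returns the negated objective (a loss to minimize).
    """
    unclipped = ratio * advantage
    # Clipping the ratio to [1 - eps, 1 + eps] removes the incentive
    # to move the policy far from the old one in a single update.
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    # Taking the element-wise minimum yields a pessimistic bound
    # on the unclipped objective.
    return -np.minimum(unclipped, clipped).mean()

# Ratio of 1.5 with eps=0.2 gets clipped to 1.2 for a positive advantage.
loss = ppo_clip_loss(np.array([1.5]), np.array([1.0]))
```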
Dota 2 with Large Scale Deep Reinforcement Learning
By defeating the Dota 2 world champion (Team OG), OpenAI Five demonstrates that self-play reinforcement learning can achieve superhuman performance on a difficult task.
Leveraging Procedural Generation to Benchmark Reinforcement Learning
This work empirically demonstrates that diverse environment distributions are essential to adequately train and evaluate RL agents, motivating the extensive use of procedural content generation, and uses the benchmark to investigate the effects of scaling model size.
Grandmaster level in StarCraft II using multi-agent reinforcement learning
The agent, AlphaStar, is evaluated, which uses a multi-agent reinforcement learning algorithm and has reached Grandmaster level, ranking among the top 0.2% of human players for the real-time strategy game StarCraft II.
SEED RL: Scalable and Efficient Deep-RL with Accelerated Central Inference
SEED (Scalable, Efficient Deep-RL) is a modern, scalable reinforcement learning agent with a simple architecture that can train on millions of frames per second and lowers the cost of experiments compared to current methods.
Network Randomization: A Simple Technique for Generalization in Deep Reinforcement Learning
A simple technique to improve the generalization ability of deep RL agents by introducing a randomized (convolutional) neural network that randomly perturbs input observations, enabling trained agents to adapt to new domains by learning robust features that are invariant across varied and randomized environments.
Generalization of Reinforcement Learners with Working and Episodic Memory
This paper develops a comprehensive methodology to test different kinds of memory in an agent and assess how well the agent can apply what it learns in training to a holdout set that differs from the training set along dimensions that are relevant for evaluating memory-specific generalization.
Generalization in Reinforcement Learning with Selective Noise Injection and Information Bottleneck
This work proposes Selective Noise Injection (SNI), which preserves the regularizing effect of injected noise while mitigating its adverse effects on gradient quality, and demonstrates that the Information Bottleneck is a particularly well-suited regularization technique for RL, as it is effective in the low-data regime encountered early in training.
Layer-Wise Relevance Propagation: An Overview
This chapter gives a concise introduction to LRP with a discussion of how to implement propagation rules easily and efficiently, how the propagation procedure can be theoretically justified as a ‘deep Taylor decomposition’, how to choose the propagation rules at each layer to deliver high explanation quality, and how LRP can be extended to handle a variety of machine learning scenarios beyond deep neural networks.
Emergent Tool Use From Multi-Agent Autocurricula
This work finds clear evidence of six emergent phases of agent strategy in a hide-and-seek environment, each of which creates a new pressure for the opposing team to adapt, and compares hide-and-seek agents to both intrinsic-motivation and random-initialization baselines in a suite of domain-specific intelligence tests.