Corpus ID: 54457643

ToyBox: Better Atari Environments for Testing Reinforcement Learning Agents

John Foley, Emma Tosch, Kaleigh Clary, David D. Jensen
It is a widely accepted principle that software without tests has bugs. Testing reinforcement learning agents is especially difficult because of the stochastic nature of both agents and environments, the complexity of state-of-the-art models, and the sequential nature of their predictions. Recently, the Arcade Learning Environment (ALE) has become one of the most widely used benchmark suites for deep learning research, and state-of-the-art Reinforcement Learning (RL) agents have been shown to… 
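The testing difficulty described above can be made concrete: even a stochastic environment should be reproducible under a fixed seed, which is the kind of property intervenable test environments are designed to support. A minimal sketch with a hypothetical toy environment (not the actual ToyBox API):

```python
import random

class ToyGrid:
    """Hypothetical stand-in environment; not the actual ToyBox API."""
    def __init__(self, seed):
        self.rng = random.Random(seed)  # all stochasticity flows through one seeded RNG
        self.pos = 0

    def step(self, action):
        # Stochastic drift, but fully reproducible given the seed.
        self.pos += action + self.rng.choice([-1, 0, 1])
        return self.pos

def rollout(seed, actions):
    env = ToyGrid(seed)
    return [env.step(a) for a in actions]

# Regression test: same seed and action sequence must yield the same trajectory.
assert rollout(7, [1, 0, 1]) == rollout(7, [1, 0, 1])
```

Routing every source of randomness through a single seeded generator is what makes such regression tests possible at all; without it, the stochasticity of agent and environment makes failures unreproducible.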

Figures from this paper

Citations
RIDE: Rewarding Impact-Driven Exploration for Procedurally-Generated Environments
This work proposes a novel intrinsic reward that encourages the agent to take actions leading to significant changes in its learned state representation, rewarding it substantially more for interacting with objects it can control; the bonus is more sample efficient than existing exploration methods, particularly in procedurally-generated MiniGrid environments.
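The impact-driven bonus can be sketched as the distance between successive learned state embeddings, scaled down by an episodic visit count. This is a paraphrase of the idea, not the paper's implementation; the embeddings `phi_s`, `phi_next` and the count are assumed to be supplied by the training loop:

```python
import math

def ride_bonus(phi_s, phi_next, visit_count):
    # L2 distance between successive learned state embeddings:
    # large when the action significantly changed the agent's representation.
    impact = math.dist(phi_s, phi_next)
    # Discount by episodic visitation so the agent cannot farm the bonus
    # by bouncing between the same pair of states.
    return impact / math.sqrt(visit_count)
```

For example, `ride_bonus([0, 0], [3, 4], 1)` yields `5.0`, while revisiting the same transition with `visit_count=4` halves the bonus to `2.5`.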
Exploratory Not Explanatory: Counterfactual Analysis of Saliency Maps for Deep Reinforcement Learning
This work uses Atari games, a common benchmark for deep RL, to evaluate three types of saliency maps; it introduces an empirical approach grounded in counterfactual reasoning to test the hypotheses generated from saliency maps, and shows that the explanations suggested by saliency maps are often not supported by experiments.
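The counterfactual approach can be sketched as a simple intervention check. The helper names here are illustrative assumptions; the paper's actual interventions operate on intervenable game state:

```python
def saliency_supported(agent, observe, intervene):
    """Counterfactual check: if a region is truly salient to the agent's
    decision, intervening on it should change the chosen action."""
    before = agent(observe())
    intervene()                 # e.g. relocate or delete the "salient" object
    after = agent(observe())
    return before != after
```

A saliency-map hypothesis that survives many such interventions has some behavioral support; one that never changes the agent's action is merely exploratory.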
Learning with AMIGo: Adversarially Motivated Intrinsic Goals
AMIGo is proposed, a novel agent incorporating a goal-generating teacher that proposes Adversarially Motivated Intrinsic Goals to train a goal-conditioned "student" policy in the absence of (or alongside) environment reward, in order to solve challenging procedurally-generated tasks.
Evaluating the Performance of Reinforcement Learning Algorithms
This work argues that the inconsistency of performance stems from the use of flawed evaluation metrics, and proposes a new comprehensive evaluation methodology for reinforcement learning algorithms that produces reliable measurements of performance both on a single environment and when aggregated across environments.

References

The Arcade Learning Environment: An Evaluation Platform for General Agents (Extended Abstract)
The promise of ALE is illustrated by developing and benchmarking domain-independent agents designed using well-established AI techniques for both reinforcement learning and planning, and an evaluation methodology made possible by ALE is proposed.
Can Deep Reinforcement Learning Solve Erdos-Selfridge-Spencer Games?
This work considers a family of combinatorial games, arising from work of Erdos, Selfridge, and Spencer, and proposes their use as environments for evaluating and comparing different approaches to reinforcement learning.
Let's Play Again: Variability of Deep Reinforcement Learning Agents in Atari Environments
This work makes the case for reporting post-training agent performance as a distribution, rather than a point estimate, and demonstrates the variability of common agents used in the popular OpenAI Baselines repository.
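Reporting a distribution rather than a point estimate might look like the following sketch: a generic bootstrap confidence interval over per-run returns. The exact protocol in the paper may differ; this is one standard way to realize the recommendation:

```python
import random
import statistics

def summarize_returns(returns, n_boot=1000, alpha=0.05, seed=0):
    """Mean plus a bootstrap confidence interval over per-seed returns."""
    rng = random.Random(seed)
    # Resample the per-run returns with replacement and record each mean.
    boot_means = sorted(
        statistics.fmean(rng.choices(returns, k=len(returns)))
        for _ in range(n_boot)
    )
    lo = boot_means[int(n_boot * alpha / 2)]
    hi = boot_means[int(n_boot * (1 - alpha / 2)) - 1]
    return statistics.fmean(returns), (lo, hi)
```

Reporting `(mean, (lo, hi))` across independently seeded training runs exposes the run-to-run variability that a single best-run score hides.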
StarCraft II: A New Challenge for Reinforcement Learning
This paper introduces SC2LE (StarCraft II Learning Environment), a reinforcement learning environment based on the StarCraft II game that offers a new and challenging setting for exploring deep reinforcement learning algorithms and architectures, and gives initial baseline results for neural networks trained on game replays to predict game outcomes and player actions.
Revisiting the Arcade Learning Environment: Evaluation Protocols and Open Problems for General Agents
This paper takes a big picture look at how the ALE is being used by the research community and focuses on how diverse the evaluation methodologies in the ALE have become and highlights some key concerns when evaluating agents in this platform.
Human-level control through deep reinforcement learning
This work bridges the divide between high-dimensional sensory inputs and actions, resulting in the first artificial agent that is capable of learning to excel at a diverse array of challenging tasks.
A Brief Survey of Deep Reinforcement Learning
This survey will cover central algorithms in deep reinforcement learning, including the deep Q-network, trust region policy optimisation, and asynchronous advantage actor-critic, and highlight the unique advantages of deep neural networks, focusing on visual understanding via reinforcement learning.
Reinforcement Learning: An Introduction
This book provides a clear and simple account of the key ideas and algorithms of reinforcement learning, which ranges from the history of the field's intellectual foundations to the most recent developments and applications.
Protecting against evaluation overfitting in empirical reinforcement learning
It is argued that reinforcement learning is particularly vulnerable to environment overfitting and generalized methodologies, in which evaluations are based on multiple environments sampled from a distribution, are proposed as a remedy.
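The proposed generalized methodology, evaluating on multiple environments sampled from a distribution, can be sketched as follows. The environment stub and its parameterisation are illustrative assumptions, not from the paper:

```python
import random

class LineEnv:
    """Minimal parameterised environment stub (illustrative only)."""
    def __init__(self, speed):
        self.speed = speed

    def episode_return(self, policy, horizon=10):
        # Total reward of one episode under the given policy.
        return sum(policy(t) * self.speed for t in range(horizon))

def evaluate_generalized(policy, n_envs=50, seed=0):
    # Score the policy on many environments drawn from a distribution over
    # parameters, rather than one fixed environment, to guard against
    # overfitting to a single evaluation environment.
    rng = random.Random(seed)
    scores = [LineEnv(rng.uniform(0.5, 1.5)).episode_return(policy)
              for _ in range(n_envs)]
    return sum(scores) / len(scores)
```

A policy tuned to one fixed `speed` can score well on that environment while degrading under the sampled distribution, which is exactly the overfitting the authors argue against.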
ELF: An Extensive, Lightweight and Flexible Research Platform for Real-time Strategy Games
ELF, an Extensive, Lightweight and Flexible platform for fundamental reinforcement learning research, is proposed, and it is shown that a network with Leaky ReLU and Batch Normalization coupled with long-horizon training and a progressive curriculum beats the rule-based built-in AI more than 70% of the time in the full game of Mini-RTS.