• Corpus ID: 222133157

Mastering Atari with Discrete World Models

Danijar Hafner, Timothy P. Lillicrap, Mohammad Norouzi, Jimmy Ba
Intelligent agents need to generalize from past experience to achieve goals in complex environments. World models facilitate such generalization and allow learning behaviors from imagined outcomes, increasing sample efficiency. While learning world models from image inputs has recently become feasible for some tasks, modeling Atari games accurately enough to derive successful behaviors has remained an open challenge for many years. We introduce DreamerV2, a reinforcement learning agent that… 
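The abstract is cut off above, but the paper's central idea, per its title, is a world model with discrete (categorical) latent states rather than Gaussian ones. A minimal sketch of sampling such latents is given below; the sizes (32 categorical variables with 32 classes each) follow the paper's reported configuration, while the function name and plain-numpy style are illustrative, not the authors' code:

```python
import numpy as np

def sample_discrete_latent(logits, rng):
    """Sample a batch of categorical latents as one-hot vectors.

    logits: (batch, n_vars, n_classes), e.g. 32 variables x 32 classes.
    Returns one-hot samples of the same shape. In a differentiable
    implementation, straight-through gradients would return
    one_hot + probs - stop_gradient(probs), so the forward pass stays
    discrete while gradients flow through the probabilities.
    """
    # numerically stable softmax over the class dimension
    z = logits - logits.max(axis=-1, keepdims=True)
    probs = np.exp(z) / np.exp(z).sum(axis=-1, keepdims=True)
    batch, n_vars, n_classes = probs.shape
    flat = probs.reshape(-1, n_classes)
    idx = np.array([rng.choice(n_classes, p=p) for p in flat])
    one_hot = np.eye(n_classes)[idx].reshape(batch, n_vars, n_classes)
    return one_hot

rng = np.random.default_rng(0)
logits = rng.normal(size=(2, 32, 32))
sample = sample_discrete_latent(logits, rng)
```

Each of the 32 variables yields exactly one active class per sample, so the full latent is a stack of one-hot vectors rather than a continuous code.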


Discovering and Achieving Goals with World Models

  • Computer Science
  • 2021
The Explore Achieve Network (ExaNet) is introduced: a unified approach to globally exploring the environment and reliably reaching situations found during exploration. It learns a world model from high-dimensional images and uses it to train an explorer and an achiever policy from imagined trajectories.

Discovering and Achieving Goals via World Models

The proposed agent, Latent Explorer Achiever (LEXA), addresses both challenges by learning a world model from image inputs and using it to train an explorer and an achiever policy via imagined rollouts; it substantially outperforms previous approaches to unsupervised goal reaching.

Learning Generalizable Behavior via Visual Rewrite Rules

This paper proposes a novel representation and learning approach that captures environment dynamics without neural networks, and presents preliminary results from a visual rewrite rule (VRR) agent that can explore, expand its rule set, and solve a game by planning with its learned VRR world model.

Cycle-Consistent World Models for Domain Independent Latent Imagination

A novel model-based reinforcement learning approach, Cycle-Consistent World Models (CCWM), can embed two modalities in a shared latent space, learning from samples in one modality and performing inference in a different domain; this enables CCWM to outperform state-of-the-art domain adaptation approaches.

DreamerPro: Reconstruction-Free Model-Based Reinforcement Learning with Prototypical Representations

This work proposes to learn the prototypes from the recurrent states of the world model, thereby distilling temporal structures from past observations and actions into the prototypes, and develops a model that successfully combines Dreamer with prototypes, achieving large performance gains on the DeepMind Control suite under complex background distractions while maintaining similar performance in the standard setting.

Discrete Latent Space World Models for Reinforcement Learning

A new neural network architecture for world models based on a vector quantized-variational autoencoder (VQ-VAE) to encode observations and a convolutional LSTM to predict the next embedding indices is proposed.
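The nearest-neighbor quantization step that a VQ-VAE world model relies on can be sketched in a few lines. The array shapes and names below are illustrative, not the paper's actual code; a convolutional LSTM would then be trained to predict the next frame's indices:

```python
import numpy as np

def vector_quantize(z_e, codebook):
    """Map continuous encoder outputs to their nearest codebook entries.

    z_e: (n, d) encoder outputs; codebook: (k, d) learned embeddings.
    Returns (indices, z_q): discrete indices and the quantized vectors.
    """
    # squared distance between every encoding and every code
    d2 = ((z_e[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    indices = d2.argmin(axis=1)
    return indices, codebook[indices]

codebook = np.array([[0.0, 0.0], [1.0, 1.0]])
z_e = np.array([[0.1, 0.1], [0.9, 0.8]])
indices, z_q = vector_quantize(z_e, codebook)
```

Because the output is a grid of integer indices rather than real-valued vectors, the dynamics model can treat next-state prediction as a classification problem.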

Fractional Transfer Learning for Deep Model-Based Reinforcement Learning

Fractional transfer learning is presented: the idea is to transfer fractions of knowledge, as opposed to discarding potentially useful knowledge through random initialization, using the world-model-based Dreamer algorithm.

Dream to Explore: Adaptive Simulations for Autonomous Systems

This work tackles the problem of learning to control dynamical systems with Bayesian nonparametric methods, applied to visual servoing tasks by first learning a state-space representation, then inferring environmental dynamics and improving policies through imagined future trajectories.

World Model as a Graph: Learning Latent Landmarks for Planning

This work proposes learning graph-structured world models composed of sparse, multi-step transitions and devises a novel algorithm to learn latent landmarks, scattered across the goal space, as the nodes of the graph, an important step towards scalable planning in reinforcement learning.
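As an illustration of planning over such a landmark graph (not the paper's algorithm), a shortest-path search over model-estimated edge costs might look like the following; the edge dictionary and node names are hypothetical:

```python
import heapq

def plan_over_landmarks(edges, start, goal):
    """Shortest path over a graph of latent landmarks.

    edges: {node: [(neighbor, cost), ...]} where each cost is the
    model's estimated multi-step transition distance (hypothetical).
    Returns the landmark sequence a low-level policy would follow.
    """
    frontier = [(0.0, start, [start])]   # (cost so far, node, path)
    seen = set()
    while frontier:
        cost, node, path = heapq.heappop(frontier)
        if node == goal:
            return path
        if node in seen:
            continue
        seen.add(node)
        for nxt, c in edges.get(node, []):
            if nxt not in seen:
                heapq.heappush(frontier, (cost + c, nxt, path + [nxt]))
    return None

edges = {"s": [("a", 1.0), ("b", 4.0)], "a": [("g", 2.0)], "b": [("g", 1.0)]}
path = plan_over_landmarks(edges, "s", "g")
```

The sparse, multi-step edges keep the graph small, so classical search remains cheap even when the underlying state space is high-dimensional.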

Human-Level Reinforcement Learning through Theory-Based Modeling, Exploration, and Planning

This work proposes a new approach to reinforcement learning called Theory-Based Reinforcement Learning, which uses human-like intuitive theories — rich, abstract, causal models of physical objects, intentional agents, and their interactions — to explore and model an environment, and plan effectively to achieve task goals.

Dream to Control: Learning Behaviors by Latent Imagination

Dreamer is presented, a reinforcement learning agent that solves long-horizon tasks purely by latent imagination and efficiently learns behaviors by backpropagating analytic gradients of learned state values through trajectories imagined in the compact state space of a learned world model.
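The value targets Dreamer computes along imagined trajectories are TD(λ) returns; the gradients of these targets flow back through the learned dynamics to update the actor. A simplified numpy version of just the return computation (not the authors' implementation) is:

```python
import numpy as np

def lambda_returns(rewards, values, gamma=0.99, lam=0.95):
    """TD(lambda) targets over an imagined trajectory.

    rewards: (H,) predicted rewards; values: (H+1,) predicted state
    values, with values[-1] bootstrapping beyond the horizon.
    """
    H = len(rewards)
    returns = np.zeros(H)
    last = values[-1]
    for t in reversed(range(H)):
        # blend the one-step bootstrap with the lambda-weighted tail
        last = rewards[t] + gamma * ((1 - lam) * values[t + 1] + lam * last)
        returns[t] = last
    return returns
```

With lam=1 this reduces to the discounted Monte Carlo return plus a bootstrap; with lam=0 it becomes the one-step TD target.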

Agent57: Outperforming the Atari Human Benchmark

This work proposes Agent57, the first deep RL agent that outperforms the standard human benchmark on all 57 Atari games and trains a neural network which parameterizes a family of policies ranging from very exploratory to purely exploitative.

Model-Based Reinforcement Learning for Atari

Simulated Policy Learning (SimPLe), a complete model-based deep RL algorithm based on video prediction models, is described and a comparison of several model architectures is presented, including a novel architecture that yields the best results in the authors' setting.

Imagined Value Gradients: Model-Based Policy Optimization with Transferable Latent Dynamics Models

An algorithm is developed that learns an action-conditional, predictive model of expected future observations, rewards and values from which a policy can be derived by following the gradient of the estimated value along imagined trajectories.

Learning and Querying Fast Generative Models for Reinforcement Learning

It is demonstrated that agents which query these models for decision making outperform strong model-free baselines on the game MSPACMAN, showing the potential of using learned environment models for planning.

Planning to Explore via Self-Supervised World Models

Without any training supervision or task-specific interaction, Plan2Explore outperforms prior self-supervised exploration methods and, in fact, almost matches the performance of an oracle that has access to rewards.

World Models

This work explores building generative neural network models of popular reinforcement learning environments; by using features extracted from the world model as inputs to an agent, it can train a very compact and simple policy that solves the required task.

Learning Latent Dynamics for Planning from Pixels

The Deep Planning Network (PlaNet) is proposed, a purely model-based agent that learns the environment dynamics from images and chooses actions through fast online planning in latent space using a latent dynamics model with both deterministic and stochastic transition components.
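PlaNet's combination of deterministic and stochastic transition components (its recurrent state-space model) can be caricatured in a few lines. The weight matrices and the tanh/softplus choices below are placeholders for the learned networks (PlaNet uses a GRU for the deterministic path), so treat this as a shape-level sketch only:

```python
import numpy as np

def rssm_step(h, z, a, params, rng):
    """One latent transition with deterministic and stochastic parts.

    h: deterministic recurrent state, z: stochastic state, a: action.
    params holds hypothetical weight matrices standing in for the
    learned networks of the real model.
    """
    W, Wmu, Wstd = params
    x = np.concatenate([h, z, a])
    h_next = np.tanh(W @ x)                         # deterministic path
    mu = Wmu @ h_next                               # prior mean
    std = np.log1p(np.exp(Wstd @ h_next)) + 1e-3    # softplus std, kept positive
    z_next = mu + std * rng.normal(size=mu.shape)   # stochastic sample
    return h_next, z_next

rng = np.random.default_rng(1)
d, s, a_dim = 8, 4, 2
params = (rng.normal(size=(d, d + s + a_dim)) * 0.1,
          rng.normal(size=(s, d)) * 0.1,
          rng.normal(size=(s, d)) * 0.1)
h, z = rssm_step(np.zeros(d), np.zeros(s), np.ones(a_dim), params, rng)
```

The deterministic path lets information propagate reliably over many steps, while the stochastic path lets the model represent multiple possible futures, which is why PlaNet uses both.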

Temporal Difference Variational Auto-Encoder

TD-VAE is proposed, a generative sequence model that learns representations containing explicit beliefs about states several steps into the future, and that can be rolled out directly without single-step transitions.

Deep Variational Reinforcement Learning for POMDPs

Deep variational reinforcement learning (DVRL) is proposed, which introduces an inductive bias that allows an agent to learn a generative model of the environment and perform inference in that model to effectively aggregate the available information.