Corpus ID: 240419913

Procedural Generalization by Planning with Self-Supervised World Models

  title={Procedural Generalization by Planning with Self-Supervised World Models},
  author={Ankesh Anand and Jacob Walker and Yazhe Li and Eszter V{\'e}rtes and Julian Schrittwieser and Sherjil Ozair and Th{\'e}ophane Weber and Jessica B. Hamrick},
One of the key promises of model-based reinforcement learning is the ability to generalize using an internal model of the world to make predictions in novel environments and tasks. However, the generalization ability of model-based agents is not well understood because existing work has focused on model-free agents when benchmarking generalization. Here, we explicitly measure the generalization ability of model-based agents in comparison to their model-free counterparts. We focus our analysis… 
A Survey of Generalisation in Deep Reinforcement Learning
It is argued that taking a purely procedural content generation approach to benchmark design is not conducive to progress in generalisation, and fast online adaptation and tackling RL-specific problems are suggested as areas for future work on methods for generalisation.


Planning to Explore via Self-Supervised World Models
Without any training supervision or task-specific interaction, Plan2Explore outperforms prior self-supervised exploration methods and, in fact, almost matches the performance of an oracle agent that has access to rewards.
An investigation of model-free planning
It is demonstrated empirically that an entirely model-free approach, without special structure beyond standard neural network components such as convolutional networks and LSTMs, can learn to exhibit many of the characteristics typically associated with a model-based planner.
Relational Deep Reinforcement Learning
We introduce an approach for deep reinforcement learning (RL) that improves upon the efficiency, generalization capacity, and interpretability of conventional approaches through structured perception and relational reasoning.
The Value Equivalence Principle for Model-Based Reinforcement Learning
It is argued that the limited representational resources of model-based RL agents are better used to build models that are directly useful for value-based planning, and the principle of value equivalence is shown to underlie a number of recent empirical successes in RL.
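The value-equivalence principle summarized above can be stated formally; the following is a sketch of the definition (notation such as $\Pi$, $\mathcal{V}$, and the Bellman operator $T_\pi^m$ is assumed here, not quoted from the paper):

```latex
% Two models $m$ and $\tilde m$ are value-equivalent with respect to a set of
% policies $\Pi$ and a set of functions $\mathcal{V}$ if their Bellman operators
% agree on those policies and functions:
T_\pi^{m}\, v = T_\pi^{\tilde m}\, v
\quad \text{for all } \pi \in \Pi,\ v \in \mathcal{V},
% where $T_\pi^{m} v = r_\pi + \gamma P_\pi^{m} v$ is the Bellman operator
% induced by model $m$ under policy $\pi$.
```

The intuition is that a model need not predict observations accurately; it only has to be value-equivalent to the environment on the policies and value functions that planning actually uses.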
Shaping Belief States with Generative Environment Models for RL
It is found that predicting multiple steps into the future (overshooting) is critical for stable representations to emerge, and a scheme to reduce this computational burden is proposed, yielding agents that are competitive with model-free baselines.
Data-Efficient Reinforcement Learning with Self-Predictive Representations
The method, Self-Predictive Representations (SPR), trains an agent to predict its own latent state representations multiple steps into the future using an encoder which is an exponential moving average of the agent’s parameters and a learned transition model.
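The SPR recipe described above (an online encoder, a target encoder maintained as an exponential moving average of the online one, and a learned latent transition model that predicts future target representations) can be sketched as follows. The module shapes, names, and cosine-similarity loss are illustrative assumptions, not the paper's exact architecture:

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Maps observations to latent representations (toy MLP, not SPR's CNN)."""
    def __init__(self, obs_dim=16, latent_dim=8):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 32), nn.ReLU(), nn.Linear(32, latent_dim))

    def forward(self, x):
        return self.net(x)

class TransitionModel(nn.Module):
    """Rolls a latent state forward one step, conditioned on the action."""
    def __init__(self, latent_dim=8, action_dim=4):
        super().__init__()
        self.net = nn.Linear(latent_dim + action_dim, latent_dim)

    def forward(self, z, a):
        return self.net(torch.cat([z, a], dim=-1))

@torch.no_grad()
def ema_update(target, online, tau=0.99):
    # target <- tau * target + (1 - tau) * online
    for tp, op in zip(target.parameters(), online.parameters()):
        tp.mul_(tau).add_((1 - tau) * op)

def spr_loss(online_enc, target_enc, trans, obs_seq, act_seq):
    """Multi-step latent self-prediction loss.

    obs_seq: (K+1, B, obs_dim) observations; act_seq: (K, B, action_dim).
    The online latent is rolled forward K steps and compared against the
    EMA target encoder's representation of the actually observed future.
    """
    z = online_enc(obs_seq[0])
    loss = 0.0
    for k in range(act_seq.shape[0]):
        z = trans(z, act_seq[k])                 # predicted future latent
        with torch.no_grad():
            z_tgt = target_enc(obs_seq[k + 1])   # EMA target representation
        # negative cosine similarity as the prediction loss
        loss = loss - nn.functional.cosine_similarity(z, z_tgt, dim=-1).mean()
    return loss / act_seq.shape[0]
```

The EMA target encoder is what keeps the objective from collapsing to a trivial constant representation, a design shared with self-supervised methods such as BYOL.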
Illuminating Generalization in Deep Reinforcement Learning through Procedural Level Generation
It is shown that for some games procedural level generation enables generalization to new levels within the same distribution, and that it is possible to achieve better performance with less data by adapting the difficulty of the levels to the performance of the agent.
Measuring and Characterizing Generalization in Deep Reinforcement Learning
The extent to which deep Q-networks learn generalized representations is called into question, and it is suggested that more experimentation and analysis is necessary before claims of representation learning can be supported.
Composable Planning with Attributes
This work considers a setup in which an environment is augmented with a set of user-defined attributes that parameterize the features of interest, and proposes a method that learns a policy for transitioning between "nearby" sets of attributes while maintaining a graph of possible transitions.
Structured agents for physical construction
A suite of challenging physical construction tasks inspired by how children play with blocks are introduced, such as matching a target configuration, stacking blocks to connect objects together, and creating shelter-like structures over target objects.