Corpus ID: 240419913

Procedural Generalization by Planning with Self-Supervised World Models

@article{Anand2021ProceduralGB,
  title={Procedural Generalization by Planning with Self-Supervised World Models},
  author={Ankesh Anand and Jacob Walker and Yazhe Li and Eszter V{\'e}rtes and Julian Schrittwieser and Sherjil Ozair and Th{\'e}ophane Weber and Jessica B. Hamrick},
  journal={ArXiv},
  year={2021},
  volume={abs/2111.01587}
}
One of the key promises of model-based reinforcement learning is the ability to generalize using an internal model of the world to make predictions in novel environments and tasks. However, the generalization ability of model-based agents is not well understood because existing work has focused on model-free agents when benchmarking generalization. Here, we explicitly measure the generalization ability of model-based agents in comparison to their model-free counterparts. We focus our analysis… 
Citations

A Survey of Generalisation in Deep Reinforcement Learning
TLDR
It is argued that a purely procedural content generation approach to benchmark design is not conducive to progress in generalisation, and fast online adaptation and tackling RL-specific problems are suggested as areas for future work on methods for generalisation.
On the link between conscious function and general intelligence in humans and machines
TLDR
This work examines the cognitive abilities associated with three contemporary theories of conscious function: Global Workspace Theory (GWT), Information Generation Theory (IGT), and Attention Schema Theory (AST), and proposes ways in which insights from each of the three theories may be combined into a unified model.
Learning Robust Real-Time Cultural Transmission without Human Data
[Snippet: Figure 1, freeze-frames from a single episode of test-time evaluation, in chronological order; the cultural transmission agent (blue avatar) is spawned in a held-out task.]

References

Showing 1-10 of 80 references
Planning to Explore via Self-Supervised World Models
TLDR
Without any training supervision or task-specific interaction, Plan2Explore outperforms prior self-supervised exploration methods and, in fact, almost matches the performance of an oracle that has access to rewards.
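
As a rough sketch of the exploration signal behind Plan2Explore (our own minimal reading, not the authors' code: we assume the intrinsic reward is the disagreement of an ensemble of one-step latent-dynamics predictors):

import numpy as np

def disagreement_reward(ensemble, latent, action):
    """Intrinsic reward as ensemble disagreement (hypothetical sketch).

    `ensemble` is any list of one-step predictors mapping
    (latent, action) -> next latent; high variance across their
    predictions marks novel, informative states worth exploring.
    """
    preds = np.stack([f(latent, action) for f in ensemble])  # (K, d)
    return preds.var(axis=0).mean()  # mean per-dimension variance

# Toy usage: random linear maps stand in for learned predictors.
rng = np.random.default_rng(0)
ensemble = [(lambda W: lambda z, a: z @ W + a)(rng.normal(size=(4, 4)))
            for _ in range(5)]
print(disagreement_reward(ensemble, rng.normal(size=4), rng.normal(size=4)))
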
An investigation of model-free planning
TLDR
It is demonstrated empirically that an entirely model-free approach, without special structure beyond standard neural network components such as convolutional networks and LSTMs, can learn to exhibit many of the characteristics typically associated with a model-based planner.
Relational Deep Reinforcement Learning
We introduce an approach for deep reinforcement learning (RL) that improves upon the efficiency, generalization capacity, and interpretability of conventional approaches through structured perception and relational reasoning.
The Value Equivalence Principle for Model-Based Reinforcement Learning
TLDR
It is argued that the limited representational resources of model-based RL agents are better used to build models that are directly useful for value-based planning, and that the principle of value equivalence underlies a number of recent empirical successes in RL.
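
For context, the principle itself can be stated in one line (notation ours, following the paper's setup): a learned model \tilde{m} is value-equivalent to the true model m with respect to a policy set \Pi and a function set \mathcal{V} when both induce identical Bellman backups on those sets:

\forall \pi \in \Pi,\ \forall v \in \mathcal{V}:\quad
  \mathcal{T}^{\pi}_{\tilde{m}}\, v = \mathcal{T}^{\pi}_{m}\, v,
\qquad \text{where}\quad
  (\mathcal{T}^{\pi}_{m}\, v)(s) =
  \mathbb{E}_{a \sim \pi(\cdot \mid s),\; s' \sim m(\cdot \mid s, a)}
  \left[ r(s, a) + \gamma\, v(s') \right].
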
Shaping Belief States with Generative Environment Models for RL
TLDR
It is found that predicting multiple steps into the future (overshooting) is critical for stable representations to emerge, and a scheme to reduce the computational burden of overshooting is proposed, making it possible to build agents that are competitive with model-free baselines.
Illuminating Generalization in Deep Reinforcement Learning through Procedural Level Generation
TLDR
It is shown that for some games procedural level generation enables generalization to new levels within the same distribution, and that it is possible to achieve better performance with less data by manipulating the difficulty of the levels in response to the performance of the agent.
Measuring and Characterizing Generalization in Deep Reinforcement Learning
TLDR
The extent to which deep Q-networks learn generalized representations is called into question, and it is suggested that more experimentation and analysis are necessary before claims of representation learning can be supported.
Composable Planning with Attributes
TLDR
This work considers a setup in which an environment is augmented with a set of user-defined attributes that parameterize the features of interest, and proposes a method that learns a policy for transitioning between "nearby" sets of attributes and maintains a graph of possible transitions.
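
To make the graph idea concrete, a minimal sketch (all names hypothetical, not the paper's implementation): once low-level transition policies exist, high-level planning reduces to graph search over attribute sets, e.g. breadth-first search:

from collections import deque

def plan_attribute_path(graph, start, goal):
    """BFS over a graph of attribute sets (hypothetical sketch).

    `graph` maps a frozenset of attributes to the "nearby" sets
    reachable by a learned low-level policy; the returned path is the
    sequence of attribute sets to hand to those policies in order.
    """
    frontier, parent = deque([start]), {start: None}
    while frontier:
        node = frontier.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in graph.get(node, ()):
            if nxt not in parent:
                parent[nxt] = node
                frontier.append(nxt)
    return None  # goal unreachable with current transition policies

# Toy usage: blocks-world-style attributes.
g = {
    frozenset({"red_on_table"}): [frozenset({"red_on_blue"})],
    frozenset({"red_on_blue"}): [frozenset({"red_on_blue", "tower"})],
}
print(plan_attribute_path(g, frozenset({"red_on_table"}),
                          frozenset({"red_on_blue", "tower"})))
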
Structured agents for physical construction
TLDR
A suite of challenging physical construction tasks inspired by how children play with blocks is introduced, including matching a target configuration, stacking blocks to connect objects together, and creating shelter-like structures over target objects.
Model-Based Reinforcement Learning for Atari
TLDR
Simulated Policy Learning (SimPLe), a complete model-based deep RL algorithm based on video prediction models, is described, and a comparison of several model architectures is presented, including a novel architecture that yields the best results in the authors' setting.
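
The overall SimPLe loop alternates three steps; a skeletal sketch under our own naming (the callables are placeholders, not the released code):

def simple_loop(collect_episodes, fit_model, train_in_model, n_iters=15):
    """SimPLe-style alternation (hypothetical sketch).

    Each iteration: (1) gather real experience with the current policy,
    (2) fit the video-prediction world model on all real data so far,
    (3) improve the policy entirely inside the learned model.
    """
    real_data = []
    for _ in range(n_iters):
        real_data += collect_episodes()  # (1) real rollouts
        model = fit_model(real_data)     # (2) world-model update
        train_in_model(model)            # (3) policy update in simulation

# Toy stand-ins so the skeleton runs end to end.
simple_loop(
    collect_episodes=lambda: [("obs", "act", 0.0)],
    fit_model=lambda data: {"transitions": len(data)},
    train_in_model=lambda model: None,
    n_iters=3,
)
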