Corpus ID: 244346227

A Survey of Generalisation in Deep Reinforcement Learning

@article{Kirk2021ASO,
  title={A Survey of Generalisation in Deep Reinforcement Learning},
  author={Robert Kirk and Amy Zhang and Edward Grefenstette and Tim Rockt{\"a}schel},
  journal={ArXiv},
  year={2021},
  volume={abs/2111.09794}
}
The study of generalisation in deep Reinforcement Learning (RL) aims to produce RL algorithms whose policies generalise well to novel unseen situations at deployment time, avoiding overfitting to their training environments. Tackling this is vital if we are to deploy reinforcement learning algorithms in real-world scenarios, where the environment will be diverse, dynamic and unpredictable. This survey is an overview of this nascent field. We provide a unifying formalism and terminology for…
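
As a quick orientation, the survey's framing can be rendered in a few lines. The sketch below uses the contextual-MDP view common in this literature; the notation is illustrative rather than the survey's exact formalism.

```latex
% A contextual MDP: a family of MDPs M_c = (S, A, T_c, R_c) indexed by a
% context c \in C, with distinct training and testing context distributions.
\[
  \mathrm{GenGap}(\pi)
  = \mathbb{E}_{c \sim p_{\mathrm{train}}}\big[ J_{M_c}(\pi) \big]
  - \mathbb{E}_{c \sim p_{\mathrm{test}}}\big[ J_{M_c}(\pi) \big]
\]
% J_{M_c}(\pi) is the expected return of policy \pi in M_c. Generalising well
% means keeping this gap small while test-context return stays high.
```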

Citations

Contextualize Me -- The Case for Context in Reinforcement Learning
TLDR
This work shows that theoretically optimal behavior in contextual Markov Decision Processes requires explicit context information, and introduces CARL, the first benchmark library designed for generalization, built from cRL extensions of popular benchmarks.
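
To make "explicit context information" concrete, here is a minimal PyTorch sketch of a context-conditioned policy; all class and dimension names are hypothetical, and CARL's actual API differs.

```python
import torch
import torch.nn as nn

class ContextConditionedPolicy(nn.Module):
    """Toy policy that sees the context (e.g. gravity, pole length) alongside
    the observation; a context-free policy would simply drop `ctx`."""

    def __init__(self, obs_dim: int, ctx_dim: int, n_actions: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + ctx_dim, 64),
            nn.Tanh(),
            nn.Linear(64, n_actions),
        )

    def forward(self, obs: torch.Tensor, ctx: torch.Tensor) -> torch.Tensor:
        # Concatenation is the simplest way to make behaviour context-aware.
        return self.net(torch.cat([obs, ctx], dim=-1))

policy = ContextConditionedPolicy(obs_dim=4, ctx_dim=2, n_actions=2)
logits = policy(torch.randn(1, 4), torch.tensor([[9.8, 0.5]]))  # gravity, length
```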
L2Explorer: A Lifelong Reinforcement Learning Assessment Environment
TLDR
This work introduces a framework for continual reinforcement-learning development and assessment using Lifelong Learning Explorer (L2Explorer), a new, Unity-based, first-person 3D exploration environment that can be continuously reconfigured to generate a range of tasks and task variants structured into complex and evolving evaluation curricula.
Multi-objective evolution for Generalizable Policy Gradient Algorithms
TLDR
MetaPG is presented, an evolutionary method that discovers new RL algorithms represented as graphs, following a multi-objective search criterion in which different RL objectives are encoded in separate fitness scores, in order to improve performance and generalizability and reduce instability.
PAnDR: Fast Adaptation to New Environments from Offline Experiences via Decoupling Policy and Environment Representations
TLDR
This paper proposes Policy Adaptation with Decoupled Representations (PAnDR) for fast policy adaptation in an offline-training, online-adaptation setting, and shows that PAnDR outperforms existing algorithms in several representative policy adaptation problems.
Zipfian environments for Reinforcement Learning
TLDR
Three complementary RL environments where the agent’s experience varies according to a Zipfian (discrete power law) distribution are developed, showing that learning robustly from skewed experience is a critical challenge for applying Deep RL methods beyond simulations or laboratories.
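
As a rough illustration of the skew involved (a sketch, not the paper's actual environments), task indices can be drawn from a discrete power law:

```python
import numpy as np

rng = np.random.default_rng(0)
n_tasks, s = 100, 1.5  # s is the Zipf exponent; larger s means heavier skew

# p(k) proportional to 1 / k^s over task ranks 1..n_tasks.
ranks = np.arange(1, n_tasks + 1)
probs = ranks.astype(float) ** -s
probs /= probs.sum()

episodes = rng.choice(n_tasks, size=10_000, p=probs)
counts = np.bincount(episodes, minlength=n_tasks)
# The head dominates: a handful of tasks account for most of the experience,
# while most tasks are seen only rarely -- the regime the paper studies.
print(counts[:5], counts[-5:])
```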
Leveraging class abstraction for commonsense reinforcement learning via residual policy gradient methods
TLDR
A residual policy gradient method is developed that is able to integrate knowledge across different abstraction levels in the class hierarchy and results in improved sample efficiency and generalisation to unseen objects in commonsense games.
Recurrent Model-Free RL can be a Strong Baseline for Many POMDPs
TLDR
It is found that careful architecture and hyperparameter decisions can often yield a recurrent model-free implementation that performs on par with (and occasionally substantially better than) more sophisticated recent techniques.
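
The recipe can be caricatured in a short PyTorch sketch (dimensions hypothetical): an ordinary policy head on top of an LSTM trunk, so the recurrent state summarises the observation history in place of an explicit belief state.

```python
import torch
import torch.nn as nn

class RecurrentActor(nn.Module):
    """Minimal recurrent policy for POMDPs: the LSTM hidden state carries
    history that a single partial observation cannot."""

    def __init__(self, obs_dim: int, hidden: int, n_actions: int):
        super().__init__()
        self.lstm = nn.LSTM(obs_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_actions)

    def forward(self, obs_seq, state=None):
        # obs_seq: (batch, time, obs_dim); state: optional (h, c) carried over.
        out, state = self.lstm(obs_seq, state)
        return self.head(out), state

actor = RecurrentActor(obs_dim=8, hidden=128, n_actions=4)
logits, carry = actor(torch.randn(2, 16, 8))  # 2 trajectories, 16 steps each
```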
Deep Reinforcement Learning: Opportunities and Challenges
TLDR
In this article, a brief introduction to reinforcement learning (RL) and its relationship with deep learning, machine learning, and AI is given, and a discussion attempts to answer: "Why has RL not been widely adopted in practice yet?" and "When is RL helpful?".
Hierarchical clustering optimizes the tradeoff between compositionality and expressivity of task structures for flexible reinforcement learning
TLDR
A hierarchical RL agent is proposed that learns and transfers individual task components as well as entire structures (particular compositions of components) by inferring both through a non-parametric Bayesian model of the task.
Learning Transferable Concepts in Deep Reinforcement Learning
TLDR
It is shown that learning discrete representations of sensory inputs can provide a high-level abstraction that is common across multiple tasks, thus facilitating the transfer of information.

References

SHOWING 1-10 OF 219 REFERENCES
Investigating Generalisation in Continuous Deep Reinforcement Learning
TLDR
It is shown that, if generalisation is the goal, the common practice of evaluating algorithms based on their training performance leads to the wrong conclusions about algorithm choice, and a new benchmark and a thorough empirical evaluation of generalisation challenges for state-of-the-art deep RL methods are provided.
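
The pitfall is easy to state in code. A hedged sketch, with a stubbed-out rollout (`run_episode` is a placeholder, not the paper's benchmark): report held-out-context return and the train-test gap, not training return alone.

```python
import random

def run_episode(policy, seed: int) -> float:
    """Placeholder rollout: episodic return in the environment instance
    identified by `seed`. Swap in a real environment rollout."""
    return random.Random(seed).gauss(1.0, 0.1)

def evaluate(policy, env_seeds, episodes_per_seed: int = 5) -> float:
    returns = [run_episode(policy, s)
               for s in env_seeds for _ in range(episodes_per_seed)]
    return sum(returns) / len(returns)

train_seeds, test_seeds = range(0, 100), range(100, 200)  # disjoint contexts
train_return = evaluate(None, train_seeds)  # `policy` unused by the stub
test_return = evaluate(None, test_seeds)
gen_gap = train_return - test_return  # ranking by train_return alone misleads
```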
CARL: A Benchmark for Contextual and Adaptive Reinforcement Learning
TLDR
CARL is proposed, a collection of well-known RL environments extended to contextual RL problems to study generalization; it provides first evidence that disentangling representation learning of the state from policy learning with the context facilitates better generalization.
Assessing Generalization in Deep Reinforcement Learning
TLDR
The key finding is that 'vanilla' deep RL algorithms generalize better than specialized schemes that were proposed specifically to tackle generalization.
Dynamics Generalization via Information Bottleneck in Deep Reinforcement Learning
TLDR
This work proposes an information-theoretic regularization objective and an annealing-based optimization method to achieve better generalization ability in RL agents, and shows that agents can generalize to test parameters more than 10 standard deviations away from the training parameter distribution.
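
Stripped to a hedged PyTorch sketch (architecture and coefficient are illustrative, not the paper's exact objective): treat the encoder output as a Gaussian and penalise its KL divergence from a standard normal prior, annealing the coefficient over training.

```python
import torch
import torch.nn as nn

class StochasticEncoder(nn.Module):
    """Maps an observation to a Gaussian latent; the KL penalty below limits
    how much information about the input the latent can carry."""

    def __init__(self, obs_dim: int, z_dim: int):
        super().__init__()
        self.mu = nn.Linear(obs_dim, z_dim)
        self.log_var = nn.Linear(obs_dim, z_dim)

    def forward(self, obs):
        mu, log_var = self.mu(obs), self.log_var(obs)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * log_var)  # reparameterise
        # KL( N(mu, sigma^2) || N(0, I) ), averaged over the batch.
        kl = 0.5 * (mu.pow(2) + log_var.exp() - 1.0 - log_var).sum(-1).mean()
        return z, kl

enc = StochasticEncoder(obs_dim=16, z_dim=8)
z, kl = enc(torch.randn(32, 16))
beta = 1e-3  # annealed on a schedule, in the spirit of the paper
# total_loss = rl_objective(z) + beta * kl   # rl_objective is assumed, not shown
```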
Generalization and Regularization in DQN
TLDR
Despite regularization being largely underutilized in deep RL, it is shown that regularization can, in fact, help DQN learn more general features, which can then be reused and fine-tuned on similar tasks, considerably improving the sample efficiency of DQN.
MiniHack the Planet: A Sandbox for Open-Ended Reinforcement Learning Research
TLDR
MiniHack is a powerful sandbox framework for easily designing novel RL environments, ranging from small rooms to complex, procedurally generated worlds; it can also wrap existing RL benchmarks and provides ways to seamlessly add additional complexity.
Deep Reinforcement Learning that Matters
TLDR
Challenges posed by reproducibility, proper experimental techniques, and reporting procedures are investigated and guidelines to make future results in deep RL more reproducible are suggested.
MetaDrive: Composing Diverse Driving Scenarios for Generalizable Reinforcement Learning
TLDR
Generalization experiments conducted on both procedurally generated and real-world scenarios show that increasing the diversity and size of the training set improves the generalizability of RL agents.
Replay-Guided Adversarial Environment Design
TLDR
It is argued that by curating completely random levels, PLR, too, can generate novel and complex levels for effective training, and theory suggests a highly counterintuitive improvement to PLR: by stopping the agent from updating its policy on uncurated levels (training on less data), it can improve the convergence to Nash equilibria.
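
A hedged sketch of the prioritised-sampling loop at the heart of PLR (scores, hyperparameters, and the rank transform here are illustrative; the paper's robust variant additionally evaluates, but does not train, on fresh uncurated levels):

```python
import numpy as np

rng = np.random.default_rng(0)
scores = {}  # level id -> learning-potential score (e.g. mean |GAE| of last rollout)

def sample_level(p_new: float = 0.5, temperature: float = 0.3) -> int:
    """Replay a high-scoring seen level, or probe a fresh random one."""
    if not scores or rng.random() < p_new:
        return int(rng.integers(1_000_000))  # unseen random level
    levels = list(scores)
    # Rank levels by score (rank 1 = highest) and sample proportionally
    # to an inverse-rank distribution sharpened by the temperature.
    ranks = np.argsort(np.argsort([-scores[l] for l in levels])) + 1
    probs = (1.0 / ranks) ** (1.0 / temperature)
    probs /= probs.sum()
    return int(rng.choice(levels, p=probs))

def update_score(level: int, learning_potential: float) -> None:
    scores[level] = learning_potential  # refresh after each rollout on `level`
```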
Natural Environment Benchmarks for Reinforcement Learning
TLDR
This work proposes three new families of benchmark RL domains that contain some of the complexity of the natural world, while still supporting fast and extensive data acquisition.