Introducing Symmetries to Black Box Meta Reinforcement Learning

@inproceedings{kirsch2022symmetries,
  title={Introducing Symmetries to Black Box Meta Reinforcement Learning},
  author={Louis Kirsch and Sebastian Flennerhag and Hado Philip van Hasselt and Abram L. Friesen and Junhyuk Oh and Yutian Chen},
  booktitle={AAAI Conference on Artificial Intelligence},
  year={2022}
}
Meta reinforcement learning (RL) attempts to discover new RL algorithms automatically from environment interaction. In so-called black-box approaches, the policy and the learning algorithm are jointly represented by a single neural network. These methods are very flexible, but they tend to underperform compared to human-engineered RL algorithms in terms of generalisation to new, unseen environments. In this paper, we explore the role of symmetries in meta-generalisation. We show that a recent… 
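The symmetry at the heart of this line of work can be illustrated with a toy sketch (ours, not the paper's code): a learning rule whose parameters are shared across units is automatically equivariant to permutations of those units.

```python
import numpy as np

# Illustrative sketch (not the paper's code): a learning rule applied with
# the SAME shared parameters to every unit is equivariant to permutations
# of the units -- one of the symmetries built into black-box meta-learners.
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 4))        # shared per-unit rule parameters

def shared_update(unit_states):
    # Apply the identical map to each unit's local state (rows of [n, 4]).
    return unit_states @ W.T

states = rng.normal(size=(6, 4))   # 6 interchangeable units
perm = rng.permutation(6)

# Permuting units before the update equals permuting after: equivariance.
assert np.allclose(shared_update(states)[perm], shared_update(states[perm]))
```

Because nothing in the rule depends on a unit's index, the same meta-learned rule applies unchanged to networks of any width.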

Figures and Tables from this paper

On the Effectiveness of Fine-tuning Versus Meta-reinforcement Learning

This work investigates meta-RL approaches in a variety of vision-based benchmarks, including Procgen, RLBench, and Atari, where evaluations are made on completely novel tasks, and shows that when meta-learning approaches are evaluated on different tasks, multi-task pretraining with fine-tuning on new tasks performs equally as well as, or better than, meta-pretraining with meta test-time adaptation.

Automated Reinforcement Learning (AutoRL): A Survey and Open Problems

This survey seeks to unify the field of AutoRL, provide a common taxonomy, discuss each area in detail and pose open problems of interest to researchers going forward.

Learning from Symmetry: Meta-Reinforcement Learning with Symmetric Data and Language Instructions

This work proposes a dual-MDP meta-reinforcement learning method that enables learning new tasks efficiently with symmetric data and language instructions, and results show the method can greatly improve the generalization and efficiency of meta-reinforcement learning.

Meta-Gradients in Non-Stationary Environments

It is suggested that contextualising meta-gradients can play a pivotal role in extracting high performance from meta-gradients in non-stationary settings, and it is investigated whether meta-gradient methods provide a bigger advantage in highly non-stationary environments.

Minimal neural network models for permutation invariant agents

This work constructs a conceptually simple model that exhibits flexibility most ANNs lack, demonstrates the model's properties on multiple control problems, and shows that it can cope with even very rapid permutations of input indices, as well as changes in input size.
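A minimal version of such a permutation-invariant model (our sketch, with made-up dimensions) encodes every input component with shared weights and pools over the set:

```python
import numpy as np

# Hypothetical sketch of a permutation-invariant input layer: shared
# per-input encoding followed by mean pooling, so the output ignores
# input ordering and tolerates changes in input size.
rng = np.random.default_rng(1)
W_in = rng.normal(size=(8, 3))     # shared encoder weights (invented sizes)

def set_encode(observations):
    h = np.tanh(observations @ W_in.T)   # encode each input identically
    return h.mean(axis=0)                # pool: order- and size-agnostic

obs = rng.normal(size=(5, 3))
assert np.allclose(set_encode(obs), set_encode(obs[::-1]))  # order-invariant
assert set_encode(rng.normal(size=(9, 3))).shape == (8,)    # any input size
```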

Goal-Conditioned Generators of Deep Policies

This work studies goal-conditioned neural nets that learn to generate deep NN policies in the form of context-specific weight matrices, similar to Fast Weight Programmers and other methods from the 1990s.

An Investigation of the Bias-Variance Tradeoff in Meta-Gradients

It is shown that Hessian estimation, implemented for example by DiCE and its variants, always adds bias and can also add variance to meta-gradient estimation; the sources of bias and variance are disentangled, and an empirical study is presented that relates existing estimators to each other.

Symmetry-Based Representations for Artificial and Biological General Intelligence

It is argued that symmetry transformations are a fundamental principle that can guide the search for what makes a good representation, and may be an important general framework that determines the structure of the universe, constrains the nature of natural tasks and consequently shapes both biological and artificial intelligence.


Meta-Gradients in Non-Stationary Environments

It is found that adding more contextual information is generally beneficial, leading to faster adaptation of meta-parameter values and increased performance; without context, meta-gradients do not provide a consistent advantage over the baseline in highly non-stationary environments.

Discovering Evolution Strategies via Meta-Black-Box Optimization

This work proposes to discover effective update rules for evolution strategies via meta-learning, and employs a search strategy parametrized by a self-attention-based architecture, which guarantees the update rule is invariant to the ordering of the candidate solutions.

Meta-Gradient Reinforcement Learning with an Objective Discovered Online

This work proposes an algorithm based on meta-gradient descent that discovers its own objective, flexibly parameterised by a deep neural network, solely from interactive experience with its environment, and adapts over time to learn with greater efficiency.
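The meta-gradient mechanism can be boiled down to a toy example (ours; the discovered objective in the paper is a full neural network): differentiate the post-update loss with respect to a tunable part of the learner, here just a learning rate adapted online.

```python
import numpy as np

# Toy meta-gradient sketch (ours, not the paper's method): the "objective"
# parameter adapted online is a single learning rate eta, updated by the
# analytic gradient of the post-update loss with respect to eta.
rng = np.random.default_rng(0)
w = rng.normal(size=3)
eta, meta_lr = 0.01, 0.001
target = np.array([1.0, -2.0, 0.5])

def loss_grad(w):
    return 2 * (w - target)           # gradient of ||w - target||^2

for _ in range(200):
    g = loss_grad(w)
    w_new = w - eta * g               # inner update with current eta
    # d loss(w_new) / d eta = -g . loss_grad(w_new); descend it.
    eta += meta_lr * g @ loss_grad(w_new)
    w = w_new
```

The learning rate grows while larger steps keep reducing the post-update loss and settles once the inner problem is solved.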

Learning to reinforcement learn

This work introduces a novel approach to deep meta-reinforcement learning, which is a system that is trained using one RL algorithm, but whose recurrent dynamics implement a second, quite separate RL procedure.

Improving Generalization in Meta Reinforcement Learning using Learned Objectives

MetaGenRL distills the experiences of many complex agents to meta-learn a low-complexity neural objective function that decides how future individuals will learn, and can generalize to new environments that are entirely different from those used for meta-training.

Meta Learning via Learned Loss

This paper presents a meta-learning method for learning parametric loss functions that can generalize across different tasks and model architectures, and develops a pipeline for “meta-training” such loss functions, targeted at maximizing the performance of the model trained under them.

Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks

We propose an algorithm for meta-learning that is model-agnostic, in the sense that it is compatible with any model trained with gradient descent and applicable to a variety of different learning problems.
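The inner/outer-loop structure can be sketched on one-dimensional linear regression (a first-order toy of ours, not the paper's experiments): adapt to each sampled task with one gradient step, then move the initialization using the post-adaptation gradient.

```python
import numpy as np

# First-order sketch of the MAML idea (ours) on 1-D linear regression
# tasks y = a * x, where each task draws its own slope a.
rng = np.random.default_rng(0)
w = 0.0                       # meta-learned initialization
alpha, beta = 0.1, 0.01       # inner / outer step sizes

def grad(w, x, y):
    # d/dw mean((w*x - y)^2)
    return np.mean(2 * (w * x - y) * x)

for _ in range(2000):
    a = rng.uniform(-2, 2)                  # sample a task
    x = rng.normal(size=20); y = a * x
    w_task = w - alpha * grad(w, x, y)      # inner adaptation step
    xq = rng.normal(size=20); yq = a * xq   # query set for the outer loss
    w -= beta * grad(w_task, xq, yq)        # first-order outer update
```

After meta-training, a single inner step from the learned initialization reduces the loss on a freshly sampled task.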

Meta-Learning through Hebbian Plasticity in Random Networks

This work proposes a search method that, instead of optimizing the weight parameters of neural networks directly, only searches for synapse-specific Hebbian learning rules that allow the network to continuously self-organize its weights during the lifetime of the agent.
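The flavour of the searched rules can be sketched with the classic ABCD Hebbian parameterisation (our toy, with invented sizes; the coefficients below would be the quantities found by evolution):

```python
import numpy as np

# Sketch (ours) of synapse-specific Hebbian plasticity: each synapse has
# its own coefficients (A, B, C, D); the weights start blank and
# self-organize during the agent's lifetime.
rng = np.random.default_rng(0)
n_in, n_out = 3, 2
A, B, C, D = (rng.normal(size=(n_out, n_in)) for _ in range(4))
eta = 0.01
w = np.zeros((n_out, n_in))   # no learned weights at "birth"

for _ in range(100):
    x = rng.normal(size=n_in)
    y = np.tanh(w @ x)
    # Generalised Hebbian update: correlation, pre-, post-, and bias terms.
    w += eta * (A * np.outer(y, x) + B * x[None, :] + C * y[:, None] + D)
```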

Meta-learning curiosity algorithms

This work proposes a strategy for encoding curiosity algorithms as programs in a domain-specific language and searching, during a meta-learning phase, for algorithms that enable RL agents to perform well in new domains.

Discovering Reinforcement Learning Algorithms

This paper introduces a new meta-learning approach that discovers an entire update rule which includes both 'what to predict' and 'how to learn from it' by interacting with a set of environments, and discovers its own alternative to the concept of value functions.

Evolution Strategies as a Scalable Alternative to Reinforcement Learning

This work explores the use of Evolution Strategies (ES), a class of black-box optimization algorithms, as an alternative to popular MDP-based RL techniques such as Q-learning and Policy Gradients, and highlights several advantages of ES as a black-box optimization technique.
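The basic estimator can be written in a few lines (our sketch on a toy quadratic, not the paper's distributed implementation): perturb the parameters with Gaussian noise and weight each perturbation by the baselined return it obtained.

```python
import numpy as np

# ES gradient-estimator sketch (ours) on a toy objective: the update
# needs only returns, no backpropagation through the policy.
rng = np.random.default_rng(0)
theta = np.array([5.0, -3.0])
sigma, lr, pop = 0.1, 0.05, 100

def reward(p):
    return -np.sum(p ** 2)          # toy return, maximised at the origin

for _ in range(300):
    eps = rng.normal(size=(pop, theta.size))
    R = np.array([reward(theta + sigma * e) for e in eps])
    # Centre returns with a mean baseline, then form the ES gradient.
    theta += lr / (pop * sigma) * eps.T @ (R - R.mean())
```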

Evolved Policy Gradients

Empirical results show that the evolved policy gradient algorithm (EPG) achieves faster learning on several randomized environments than an off-the-shelf policy gradient method, that its learned loss can generalize to out-of-distribution test-time tasks, and that it exhibits qualitatively different behavior from other popular meta-learning algorithms.