Corpus ID: 208548554

BADGER: Learning to (Learn [Learning Algorithms] through Multi-Agent Communication)

Authors: Marek Rosa, Olga Afanasjeva, Simon Andersson, Joseph Davidson, Nicholas Guttenberg, Petr Hlubucek, Martin Poliak, Jaroslav Vítků, Jan Feyereisl
In this work, we propose a novel memory-based multi-agent meta-learning architecture and learning procedure that allow for learning a shared communication policy, enabling rapid adaptation to new and unseen environments by learning to learn learning algorithms through communication. Behavior, adaptation and learning to adapt emerge from the interactions of homogeneous experts inside a single agent. The proposed architecture should allow for generalization beyond the…


Discovering Reinforcement Learning Algorithms
This paper introduces a new meta-learning approach that discovers an entire update rule which includes both 'what to predict' and 'how to learn from it' by interacting with a set of environments, and discovers its own alternative to the concept of value functions.
This paper derives and presents deterministic memory mean-field temporal-difference reinforcement learning dynamics where the agents only partially observe the actual state of the environment, and showcases the broad applicability of the dynamics across different classes of agent-environment systems.
The Role of Bio-Inspired Modularity in General Learning
It is argued that the topology of biological brains likely evolved certain features that are designed to achieve this kind of informational conservation, and the highly conserved property of modularity may offer a solution to weight-update learning methods that adheres to the learning without catastrophic forgetting and bootstrapping constraints.
Bootstrapping of memetic from genetic evolution via inter-agent selection pressures
The main idea is that while finding a message that replicates regardless of the underlying agent’s network weights may be impossible in general, if the memetic degrees of freedom are high, it may be possible to discover learning algorithms for multi-agent cognitive systems based on social and cultural adaptation.
Meta Learning Backpropagation And Improving It
Variable Shared Meta Learning (VSML) demonstrates that simple weight-sharing and sparsity in a neural network are sufficient to express powerful learning algorithms (LAs) in a reusable fashion; introspection reveals that its meta-learned LAs learn through fast association in a way that is qualitatively different from gradient descent.
A Simple Guard for Learned Optimizers
LGL2O is proposed, a new class of Safeguarded L2O, which takes a learned optimizer and safeguards it with a generic learning algorithm so that, by conditionally switching between the two, the resulting algorithm is provably convergent and in practice converges much better than GL2O.
This work outlines a form of neural network collectives (NNC), motivated by recent work in the field of collective intelligence, and gives details about the specific sub-components that an NNC may have.


Learning to Communicate with Deep Multi-Agent Reinforcement Learning
By embracing deep neural networks, this work is able to demonstrate end-to-end learning of protocols in complex environments inspired by communication riddles and multi-agent computer vision problems with partial observability.
Learning to Communicate in Multi-Agent Reinforcement Learning: A Review
This work considers the issue of multiple agents learning to communicate through reinforcement learning within partially observable environments, with a focus on information asymmetry, and introduces an experimental setup to expose this cost in cooperative-competitive games.
Learning with Opponent-Learning Awareness
Results show that the encounter of two LOLA agents leads to the emergence of tit-for-tat, and therefore cooperation, in the iterated prisoners' dilemma, while independent learning does not; LOLA also receives higher payouts than a naive learner and is robust against exploitation by higher-order gradient-based methods.
Learning to Adapt in Dynamic, Real-World Environments through Meta-Reinforcement Learning
This work uses meta-learning to train a dynamics model prior such that, when combined with recent data, this prior can be rapidly adapted to the local context and demonstrates the importance of incorporating online adaptation into autonomous agents that operate in the real world.
Concurrent Meta Reinforcement Learning
This work proposes an alternative parallel framework, which it names "Concurrent Meta-Reinforcement Learning" (CMRL), that transforms the temporal credit assignment problem into a multi-agent reinforcement learning one and demonstrates the effectiveness of the proposed CMRL at improving over sequential methods in a variety of challenging tasks.
TarMAC: Targeted Multi-Agent Communication
This work proposes a targeted communication architecture for multi-agent reinforcement learning, where agents learn both what messages to send and whom to address them to while performing cooperative tasks in partially-observable environments, and augment this with a multi-round communication approach.
Meta reinforcement learning as task inference
This work proposes a method that separately learns the policy and the task belief by taking advantage of various kinds of privileged information, which can be very effective at solving standard meta-RL environments, as well as a complex continuous control environment with sparse rewards and requiring long-term memory.
Improving Generalization in Meta Reinforcement Learning using Learned Objectives
MetaGenRL distills the experiences of many complex agents to meta-learn a low-complexity neural objective function that decides how future individuals will learn, and can generalize to new environments that are entirely different from those used for meta-training.
Continuous Adaptation via Meta-Learning in Nonstationary and Competitive Environments
A simple gradient-based meta-learning algorithm suitable for adaptation in dynamically changing and adversarial scenarios is developed, and it is demonstrated that meta-learning enables significantly more efficient adaptation than reactive baselines in the few-shot regime.
Learning to Learn: Meta-Critic Networks for Sample Efficient Learning
A meta-critic approach to meta-learning is proposed: an action-value function neural network that learns to criticise any actor trying to solve any specified task, via a trainable task-parametrised loss generator.