Heterogeneous Multi-agent Zero-Shot Coordination by Coevolution

By Ke Xue, Yutong Wang, Lei Yuan, Cong Guan, Chao Qian, and Yang Yu
Generating agents that can achieve Zero-Shot Coordination (ZSC) with unseen partners is a new challenge in cooperative Multi-Agent Reinforcement Learning (MARL). Recently, some studies have made progress in ZSC by exposing the agents to diverse partners during the training process. They usually involve self-play when training the partners, implicitly assuming that the tasks are homogeneous. However, many real-world tasks are heterogeneous, and hence previous methods may fail. In this paper, we… 

"Other-Play" for Zero-Shot Coordination

This work introduces a novel learning algorithm called Other-Play (OP), which improves on self-play by seeking more robust strategies that exploit known symmetries in the underlying problem.

Trajectory Diversity for Zero-Shot Coordination

This work introduces Trajectory Diversity (TrajeDi), a differentiable objective for generating diverse reinforcement learning policies, derives TrajeDi as a generalization of the Jensen-Shannon divergence between policies, and motivates it experimentally in two simple settings.
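The Jensen-Shannon divergence that TrajeDi generalizes can be viewed as the entropy of the mixture of policies minus the mean per-policy entropy. The snippet below is a minimal illustration on fixed action distributions, not the paper's implementation; the function names are our own:

```python
import numpy as np

def entropy(p):
    # Shannon entropy of a discrete distribution (clipped for numerical safety)
    p = np.clip(p, 1e-12, 1.0)
    return -np.sum(p * np.log(p))

def js_divergence(policies):
    # Jensen-Shannon divergence with uniform weights:
    # entropy of the mixture minus the mean of the individual entropies.
    mean = np.mean(policies, axis=0)
    return entropy(mean) - np.mean([entropy(p) for p in policies])

pi1 = np.array([0.7, 0.2, 0.1])
pi2 = np.array([0.1, 0.2, 0.7])
diverse = js_divergence([pi1, pi2])    # positive: the policies differ
identical = js_divergence([pi1, pi1])  # zero: no diversity to reward
```

Because the quantity is differentiable in the action probabilities, it can serve directly as a diversity bonus during policy-gradient training.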

Any-Play: An Intrinsic Augmentation for Zero-Shot Coordination

This work formalizes an alternative criterion for evaluating cooperative AI, referred to as inter-algorithm cross-play, in which agents are scored on their teaming performance with every other agent in an experiment pool, with no assumption of algorithmic similarity between agents.
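Operationally, inter-algorithm cross-play amounts to filling a pairing matrix over the agent pool and averaging the entries where the two teammates come from different sources. A minimal sketch, with `evaluate` standing in for whatever rollout procedure an experiment uses (all names here are illustrative):

```python
def cross_play_matrix(agents, evaluate):
    # evaluate(a, b): average team return when agent a is paired with agent b
    n = len(agents)
    return [[evaluate(agents[i], agents[j]) for j in range(n)] for i in range(n)]

def inter_algorithm_score(scores):
    # Mean of the off-diagonal entries: performance with partners
    # produced by other training runs or algorithms.
    n = len(scores)
    off = [scores[i][j] for i in range(n) for j in range(n) if i != j]
    return sum(off) / len(off)

# toy pool: each agent scores 1.0 with itself, 0.5 with anyone else
pool = ["alg_A", "alg_B", "alg_C"]
toy_eval = lambda a, b: 1.0 if a == b else 0.5
matrix = cross_play_matrix(pool, toy_eval)
```

The gap between the diagonal (self-play) and off-diagonal (cross-play) averages is exactly the generalization gap that zero-shot coordination methods aim to close.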

Heterogeneous Multi-Agent Reinforcement Learning for Unknown Environment Mapping

This work presents an actor-critic algorithm that allows a team of heterogeneous agents to learn decentralized control policies for covering an unknown environment, and develops a simulation environment that includes real-world environmental factors such as turbulence, delayed communication, and agent loss to train teams of agents.

Planning, Learning and Coordination in Multiagent Decision Processes

This work investigates the extent to which methods from single-agent planning and learning can be applied in multiagent settings, and proposes decomposing sequential decision processes so that coordination can be learned locally, at the level of individual states.

Maximum Entropy Population Based Training for Zero-Shot Human-AI Coordination

In MEP, agents in the population are trained with a derived Population Entropy bonus that promotes both pairwise diversity between agents and individual diversity within each agent's own policy; a common best agent is then trained by pairing with agents from this diversified population via prioritized sampling.
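A population-entropy-style bonus can be sketched under the assumption that it is the entropy of the population's mean action distribution; the snippet is illustrative, not the paper's exact objective:

```python
import numpy as np

def population_entropy(policies):
    # Entropy of the mean action distribution across the population.
    # It is high both when agents act differently from one another and
    # when each agent's own policy is stochastic, so adding it as a
    # reward bonus encourages pairwise and individual diversity at once.
    mean = np.clip(np.mean(policies, axis=0), 1e-12, 1.0)
    return -np.sum(mean * np.log(mean))

# two deterministic but distinct policies still form a diverse population
bonus = population_entropy([[1.0, 0.0], [0.0, 1.0]])  # log 2
```

A population of identical deterministic policies would receive a bonus of zero, which is what drives the population apart during training.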

A survey on multi-agent deep reinforcement learning: from the perspective of challenges and applications

This survey gives a detailed and systematic overview of multi-agent deep reinforcement learning methods from the perspective of challenges and applications: a taxonomy of challenges is proposed, and the corresponding structures and representative methods are introduced.

Collaborating with Humans without Human Data

This work studies how to train agents that collaborate well with human partners without using human data, and argues that the crux of the problem is producing a diverse set of training partners.

StarCraft II: A New Challenge for Reinforcement Learning

This paper introduces SC2LE (StarCraft II Learning Environment), a reinforcement learning environment based on the game StarCraft II that poses a new and challenging testbed for deep reinforcement learning algorithms and architectures, and reports initial baseline results for neural networks trained on game replays to predict game outcomes and player actions.

On the Utility of Learning about Humans for Human-AI Coordination

This work introduces a simple environment requiring challenging coordination, based on the popular game Overcooked, learns a simple model that mimics human play, and finds that the performance gains come from having the agent adapt to the human's gameplay.