Corpus ID: 237490372

Learning Selective Communication for Multi-Agent Path Finding

  • Ziyuan Ma, Yudong Luo, Jia Pan
  • Published 12 September 2021
  • Computer Science
  • ArXiv
Learning communication via deep reinforcement learning (RL) or imitation learning (IL) has recently been shown to be an effective way to solve Multi-Agent Path Finding (MAPF). However, existing communication-based MAPF solvers focus on broadcast communication, where an agent broadcasts its message to all other agents or to a predefined set of agents. This is not only impractical but also produces redundant information that can even impair multi-agent cooperation. A succinct communication scheme should learn… 
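The contrast the abstract draws can be illustrated with a minimal sketch of selective communication: each agent exchanges messages only with agents inside its field of view instead of broadcasting to everyone. This is an illustrative assumption, not the paper's actual architecture; the `fov_radius` threshold and mean-pooling aggregator are placeholders.

```python
import numpy as np

# Illustrative sketch (not the paper's model): selective communication in
# which each agent talks only to nearby agents inside its field of view,
# instead of broadcasting to all agents.

def select_partners(positions, agent_id, fov_radius=3.0):
    """Indices of other agents within the field of view of `agent_id`."""
    dists = np.linalg.norm(positions - positions[agent_id], axis=1)
    mask = (dists <= fov_radius) & (np.arange(len(positions)) != agent_id)
    return np.nonzero(mask)[0]

def aggregate_messages(messages, partners, dim):
    """Mean-pool messages from the selected partners (zeros if none)."""
    if len(partners) == 0:
        return np.zeros(dim)
    return messages[partners].mean(axis=0)

positions = np.array([[0.0, 0.0], [1.0, 1.0], [10.0, 10.0]])
messages = np.eye(3)                      # one toy message per agent
partners = select_partners(positions, agent_id=0)
pooled = aggregate_messages(messages, partners, dim=3)
print(partners.tolist())   # [1] -- the far-away agent 2 is ignored
print(pooled.tolist())     # [0.0, 1.0, 0.0]
```

A broadcast scheme would instead pool all N-1 incoming messages; restricting the partner set is what keeps the exchanged information succinct.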


Distributed Heuristic Multi-Agent Path Finding with Communication
This paper combines communication with deep Q-learning to provide a novel learning-based method for MAPF, in which agents achieve cooperation via graph convolution and a heuristic guides the RL algorithm on long-horizon goal-oriented tasks.
Learning Multi-Agent Communication through Structured Attentive Reasoning
This work introduces a novel communication architecture that exploits a memory-based attention network to selectively reason about the value of information received from other agents in light of past experiences, and develops an explicit architecture targeted towards communication.
Learning to Schedule Communication in Multi-agent Reinforcement Learning
A multi-agent deep reinforcement learning framework, called SchedNet, in which agents learn how to schedule themselves, how to encode messages, and how to select actions based on received messages; the framework is capable of deciding which agents should be entitled to broadcast their messages.
TarMAC: Targeted Multi-Agent Communication
This work proposes a targeted communication architecture for multi-agent reinforcement learning, where agents learn both what messages to send and whom to address them to while performing cooperative tasks in partially-observable environments, and augment this with a multi-round communication approach.
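The targeted addressing described above can be sketched with scaled dot-product attention, in the spirit of TarMAC: each sender emits a (key, value) pair and each receiver computes a query, then weights incoming values by query-key similarity, so messages are soft-addressed rather than uniformly pooled. All dimensions and names here are illustrative assumptions.

```python
import numpy as np

# Sketch of targeted communication via scaled dot-product attention:
# the receiver attends over sender signatures (keys) to decide how much
# of each sender's payload (value) to absorb.

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def targeted_receive(query, keys, values):
    """Attention-weighted aggregation of incoming messages."""
    scores = keys @ query / np.sqrt(query.size)   # scaled dot-product
    weights = softmax(scores)
    return weights @ values, weights

keys = np.array([[1.0, 0.0], [0.0, 1.0]])     # senders' signatures
values = np.array([[5.0, 0.0], [0.0, 5.0]])   # senders' payloads
query = np.array([4.0, 0.0])                  # receiver seeks sender 0
aggregated, weights = targeted_receive(query, keys, values)
print(weights[0] > weights[1])   # True: sender 0 dominates the mix
```

Multi-round communication, as in the paper, would simply repeat this receive step with updated queries after each round.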
Multi-Agent Graph-Attention Communication and Teaming
A novel multi-agent reinforcement learning algorithm with a graph-attention communication protocol is proposed, in which a Scheduler decides when to communicate and whom to address messages to, and a Message Processor uses Graph Attention Networks with dynamic graphs to handle communication signals.
Efficient Communication in Multi-Agent Reinforcement Learning via Variance Based Control
This work proposes Variance Based Control (VBC), a simple yet efficient technique to improve communication efficiency in MARL by limiting the variance of the exchanged messages between agents during the training phase.
Learning when to Communicate at Scale in Multiagent Cooperative and Competitive Tasks
This paper presents the Individualized Controlled Continuous Communication Model (IC3Net), which has better training efficiency than a simple continuous communication model and can be applied to semi-cooperative and competitive settings as well as cooperative settings.
PRIMAL: Pathfinding via Reinforcement and Imitation Multi-Agent Learning
PRIMAL is presented, a novel framework for MAPF that combines reinforcement and imitation learning to teach fully decentralized policies, where agents reactively plan paths online in a partially observable world while exhibiting implicit coordination.
MAPPER: Multi-Agent Path Planning with Evolutionary Reinforcement Learning in Mixed Dynamic Environments
This paper proposes MAPPER, a decentralized partially observable multi-agent path-planning method based on evolutionary reinforcement learning, to learn an effective local planning policy in mixed dynamic environments; experiments show that MAPPER achieves higher success rates and more stable performance when exposed to a large number of non-cooperative dynamic obstacles.
QMIX: Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning
QMIX employs a network that estimates joint action-values as a complex non-linear combination of per-agent values conditioned only on local observations, and structurally enforces that the joint action-value is monotonic in the per-agent values, which allows tractable maximisation of the joint action-value in off-policy learning.
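The monotonicity constraint above can be sketched in a few lines, under simplifying assumptions: a tiny linear "hypernetwork" maps the global state to per-agent mixing weights, and taking absolute values makes the weights non-negative, so dQ_tot/dQ_i >= 0 by construction. The real QMIX uses multi-layer hypernetworks; this one-layer version is only illustrative.

```python
import numpy as np

# Sketch of QMIX-style monotonic mixing: per-agent Q-values are combined
# with state-conditioned, non-negative weights, guaranteeing that raising
# any agent's Q-value can never lower the joint action-value.

rng = np.random.default_rng(0)
W_hyper = rng.normal(size=(4, 3))   # toy hypernetwork: state dim 4 -> 3 weights

def mix(agent_qs, state):
    weights = np.abs(state @ W_hyper)   # abs() enforces monotonicity
    return float(weights @ agent_qs)

state = rng.normal(size=4)
q_low = np.array([1.0, 2.0, 3.0])
q_high = q_low + np.array([0.5, 0.0, 0.0])   # raise one agent's value
print(mix(q_high, state) >= mix(q_low, state))   # True: Q_tot never drops
```

Because the weights are non-negative and the per-agent increase is non-negative, the mixed value cannot decrease, which is exactly what makes decentralised greedy action selection consistent with the joint maximisation.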