From Multi-agent to Multi-robot: A Scalable Training and Evaluation Platform for Multi-robot Reinforcement Learning

Zhixuan Liang, Jiannong Cao, Shan Jiang, Divya Saxena, Jinlin Chen and Huafeng Xu
Multi-agent reinforcement learning (MARL) has gained extensive attention from academia and industry over the past few decades. One of the fundamental problems in MARL is how to evaluate different approaches comprehensively. Most existing MARL methods are evaluated in either video games or simplistic simulated scenarios; it remains unknown how these methods perform in real-world scenarios, especially multi-robot systems. This paper introduces a scalable emulation platform for multi-robot…

Learning multi-robot coordination from demonstrations

A Distributed Differentiable Dynamic Game framework that learns the objective function and the dynamics constraint of each robot from multi-robot coordination demonstrations. It features a fully distributed learning process that maintains a communication topology in both the forward and backward passes.

Multi-Agent Reinforcement Learning: A Selective Overview of Theories and Algorithms

This chapter reviews the theoretical results of MARL algorithms mainly within two representative frameworks, Markov/stochastic games and extensive-form games, in accordance with the types of tasks they address, i.e., fully cooperative, fully competitive, and a mix of the two.

Deep Multi Agent Reinforcement Learning for Autonomous Driving

This work presents two novel, scalable, centralized MARL training techniques (MA-MeSN, MA-BoN) that achieve faster convergence and higher cumulative reward in complex domains such as autonomous-driving simulators, and compares their performance with existing state-of-the-art algorithms, DIAL and IMS.

The Arcade Learning Environment: An Evaluation Platform for General Agents (Extended Abstract)

The promise of ALE is illustrated by developing and benchmarking domain-independent agents designed using well-established AI techniques for both reinforcement learning and planning, and an evaluation methodology made possible by ALE is proposed.

Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments

An adaptation of actor-critic methods that considers action policies of other agents and is able to successfully learn policies that require complex multi-agent coordination is presented.
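The core idea can be illustrated with a minimal sketch: each agent keeps a decentralized actor that sees only its own observation, while a single centralized critic scores the joint observations and actions of all agents. The names, dimensions, and linear models below are illustrative assumptions, not the paper's implementation.

```python
# Hypothetical minimal sketch of the centralized-critic, decentralized-actor
# idea. Each actor is conditioned only on its own observation; the critic
# takes the concatenated observations and actions of ALL agents, which makes
# the environment stationary from the critic's point of view during training.
import numpy as np

N_AGENTS, OBS_DIM, ACT_DIM = 3, 4, 2
rng = np.random.default_rng(0)

# Decentralized actors: one (illustrative) linear policy per agent.
actor_weights = [rng.normal(size=(OBS_DIM, ACT_DIM)) for _ in range(N_AGENTS)]

def act(agent_id, obs):
    """Map an agent's private observation to a bounded action."""
    return np.tanh(obs @ actor_weights[agent_id])

# Centralized critic: scores the joint observation-action vector.
critic_w = rng.normal(size=(N_AGENTS * (OBS_DIM + ACT_DIM),))

def centralized_q(all_obs, all_actions):
    """Q(x, a_1, ..., a_N) over the concatenated joint state and actions."""
    joint = np.concatenate(list(all_obs) + list(all_actions))
    return float(joint @ critic_w)

obs = [rng.normal(size=OBS_DIM) for _ in range(N_AGENTS)]
actions = [act(i, obs[i]) for i in range(N_AGENTS)]
q_value = centralized_q(obs, actions)
```

At execution time only the per-agent actors are needed; the centralized critic is used during training alone, which is what allows decentralized deployment.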

Distributed multi-robot collision avoidance via deep reinforcement learning for navigation in complex scenarios

A decentralized sensor-level collision-avoidance policy for multi-robot systems, which enables a robot to make effective progress in a crowd without getting stuck and has been successfully deployed on different types of physical robot platforms without tedious parameter tuning.

Mapless Collaborative Navigation for a Multi-Robot System Based on the Deep Reinforcement Learning

This paper studies the collaborative formation and navigation of multiple robots using deep reinforcement learning, improving the classical Deep Deterministic Policy Gradient (DDPG) algorithm to address the single-robot mapless navigation task.

Learning to Fly—a Gym Environment with PyBullet Physics for Reinforcement Learning of Multi-agent Quadcopter Control

This paper proposes an open-source OpenAI Gym-like environment for multiple quadcopters based on the Bullet physics engine that combines multi-agent and vision-based reinforcement learning interfaces, as well as the support of realistic collisions and aerodynamic effects.

ROMA: Multi-Agent Reinforcement Learning with Emergent Roles

Experiments show that the proposed role-oriented MARL framework (ROMA) can learn specialized, dynamic, and identifiable roles, which help the method push forward the state of the art on the StarCraft II micromanagement benchmark.

QPLEX: Duplex Dueling Multi-Agent Q-Learning

A novel MARL approach, duPLEX dueling multi-agent Q-learning (QPLEX), which uses a duplex dueling network architecture to factorize the joint value function and encodes the Individual-Global-Max (IGM) principle into the neural network architecture, thus enabling efficient value-function learning.
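The IGM principle itself is easy to illustrate. The sketch below uses a simple additive factorization (as in VDN) rather than QPLEX's duplex dueling network, and the utility values are illustrative; the point is the property being enforced: the greedy joint action can be recovered from each agent's individual greedy action.

```python
# Illustrative sketch of the Individual-Global-Max (IGM) principle.
# Under a factorization where Q_tot is monotone in each Q_i (here the
# simplest case, an additive sum), maximizing Q_tot over the joint
# action space agrees with each agent greedily maximizing its own Q_i.
from itertools import product

import numpy as np

rng = np.random.default_rng(1)
N_AGENTS, N_ACTIONS = 2, 3

# Per-agent utilities Q_i(a_i) for a fixed state (illustrative values).
q_individual = [rng.normal(size=N_ACTIONS) for _ in range(N_AGENTS)]

def q_joint(joint_action):
    """Additive factorization: Q_tot(a) = sum_i Q_i(a_i)."""
    return sum(q[a] for q, a in zip(q_individual, joint_action))

# Greedy joint action found by exhaustive search over the joint space...
best_joint = max(product(range(N_ACTIONS), repeat=N_AGENTS), key=q_joint)

# ...matches the tuple of per-agent greedy actions, so decentralized
# greedy execution is consistent with centralized greedy selection.
greedy_per_agent = tuple(int(np.argmax(q)) for q in q_individual)
```

QPLEX's contribution is an architecture whose representable function class is exactly the set of joint value functions satisfying IGM, rather than a strict subset of it as with additive or monotonic mixing.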

Emergent Tool Use From Multi-Agent Autocurricula

This work finds clear evidence of six emergent phases of agent strategy in the proposed hide-and-seek environment, each of which creates a new pressure for the opposing team to adapt, and compares hide-and-seek agents to both intrinsic-motivation and random-initialization baselines on a suite of domain-specific intelligence tests.