Existing multi-agent reinforcement learning methods are typically limited to a small number of agents. As the number of agents grows, learning becomes intractable due to the curse of dimensionality.
This paper introduces a Multiagent Bidirectionally-Coordinated Network (BiCNet) with a vectorised extension of the actor-critic formulation, and demonstrates that, without any supervision such as human demonstrations or labelled data, BiCNet can learn various types of advanced coordination strategies similar to those used by experienced game players, and is easily adaptable to tasks with heterogeneous agents.
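To make the coordination mechanism concrete, below is a minimal sketch of the bidirectional-coordination idea in PyTorch, assuming a recurrent channel run over the agent dimension; the class name BiCNetActor and all dimensions are illustrative assumptions, not the authors' reference implementation.

```python
import torch
import torch.nn as nn

class BiCNetActor(nn.Module):
    """Sketch of BiCNet-style coordination: a bidirectional GRU runs over
    the *agent* dimension (not time), so each agent's action is conditioned
    on a latent communication channel shared with all other agents."""

    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 64):
        super().__init__()
        self.encode = nn.Linear(obs_dim, hidden)
        # Bidirectional GRU treating the agent axis as the sequence axis.
        self.comm = nn.GRU(hidden, hidden, bidirectional=True, batch_first=True)
        self.decode = nn.Linear(2 * hidden, act_dim)

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        # obs: (batch, n_agents, obs_dim) -> actions: (batch, n_agents, act_dim)
        h = torch.relu(self.encode(obs))
        h, _ = self.comm(h)  # agents exchange information in both directions
        return torch.tanh(self.decode(h))

actor = BiCNetActor(obs_dim=10, act_dim=2)
actions = actor(torch.randn(4, 5, 10))  # batch of 4, 5 agents each
```

Because the recurrent pass is bidirectional over agents, gradients from the shared critic flow through every agent's slot, which is what lets coordinated behaviour emerge without per-agent supervision.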
This paper theoretically derives a general formula for $Q_{tot}$ in terms of the per-agent values $Q^{i}$, based on which a multi-head attention formation to approximate $Q_{tot}$ can be naturally implemented, resulting in not only a refined representation of $Q_{tot}$ with an agent-level attention mechanism, but also a tractable maximization algorithm for decentralized policies.
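For reference, one common instantiation of such a decomposition (a sketch under the multi-head attention form described above; the symbols $c(s)$, $\lambda_{i,h}$, and $H$ are notation introduced here, not necessarily the paper's) is:

$$
Q_{tot}(\boldsymbol{\tau}, \boldsymbol{a}) \;\approx\; c(s) \;+\; \sum_{h=1}^{H} \sum_{i=1}^{n} \lambda_{i,h}(s)\, Q^{i}(\tau^{i}, a^{i}),
$$

where $\lambda_{i,h}(s) \geq 0$ are agent-level attention weights produced by head $h$. Since $Q_{tot}$ is monotone in each $Q^{i}$ under non-negative weights, maximizing each $Q^{i}$ independently also maximizes $Q_{tot}$, which is what makes decentralized maximization tractable.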
Under the PR2 framework, decentralized-training-decentralized-execution algorithms are developed that provably converge in the self-play scenario when there is a unique Nash equilibrium, and experiments show that it is critical to reason about what the opponents believe about the agent's own beliefs.
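The recursive reasoning can be summarized by a level-1 factorization of the joint policy; the notation below is a sketch of that idea rather than the paper's full derivation:

$$
\pi(a^{i}, a^{-i} \mid s) \;=\; \pi^{i}(a^{i} \mid s)\, \rho^{-i}(a^{-i} \mid s, a^{i}),
$$

where $\rho^{-i}$ models how the opponents would respond conditioned on agent $i$'s own action, so that agent $i$ optimizes its policy against its belief about the opponents' beliefs.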
This paper addresses the order dispatching problem using multi-agent reinforcement learning (MARL), which follows the distributed nature of the peer-to-peer ride-sharing problem and can capture the stochastic demand-supply dynamics in large-scale ride-sharing scenarios.
This paper focuses on a microgrid in which a large number of modern homes interact to optimize their electricity cost, and presents an Entropy-Based Collective Multiagent Deep Reinforcement Learning (EB-C-MADRL) framework to address it.
The design goals of SMARTS (Scalable Multi-Agent RL Training School) are described, its basic architecture and its key features are explained, and its use is illustrated through concrete multi-agent experiments on interactive scenarios.
This paper introduces a new generation of MARL learners that can handle nonzero-sum payoff structures and continuous settings, and proves theoretically that the proposed learning method, SPot-AC, enables independent agents to learn Nash equilibrium strategies in polynomial time.
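As background, SPot-AC targets potential games; the standard defining property (a general definition, not the paper's specific derivation) is that there exists a potential function $\Phi$ such that every unilateral deviation changes each agent's payoff exactly as it changes $\Phi$:

$$
R^{i}(s, a^{i}, a^{-i}) - R^{i}(s, \tilde{a}^{i}, a^{-i}) \;=\; \Phi(s, a^{i}, a^{-i}) - \Phi(s, \tilde{a}^{i}, a^{-i}),
$$

so independent agents ascending their own payoffs are implicitly ascending the shared potential, which is what makes Nash equilibria reachable by decentralized learning.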
Results show that HATRPO and HAPPO significantly outperform strong baselines such as IPPO, MAPPO, and MADDPG on all tested tasks, thereby establishing a new state of the art in multi-agent reinforcement learning.
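These heterogeneous-agent trust-region methods build on the multi-agent advantage decomposition, sketched here from the general form of that result: for any ordering of agents $i_1, \dots, i_n$,

$$
A_{\pi}\!\left(s, \boldsymbol{a}^{i_{1:n}}\right) \;=\; \sum_{j=1}^{n} A_{\pi}^{i_j}\!\left(s, \boldsymbol{a}^{i_{1:j-1}}, a^{i_j}\right),
$$

which justifies updating agents sequentially, each against the already-updated actions of its predecessors, and yields monotonic improvement guarantees without requiring agents to share parameters.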