Local Advantage Actor-Critic for Robust Multi-Agent Deep Reinforcement Learning

  • Yuchen Xiao, Xueguang Lyu, Chris Amato
  • Published 16 October 2021
  • Computer Science
  • 2021 International Symposium on Multi-Robot and Multi-Agent Systems (MRS)
Policy gradient methods have become popular in multi-agent reinforcement learning, but they suffer from high variance due to the presence of environmental stochasticity and exploring agents (i.e., non-stationarity), which is potentially worsened by the difficulty in credit assignment. As a result, there is a need for a method that is not only capable of efficiently solving the above two problems but also robust enough to solve a variety of tasks. To this end, we propose a new multi-agent policy… 


Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments
An adaptation of actor-critic methods that considers action policies of other agents and is able to successfully learn policies that require complex multi-agent coordination is presented.
Robust Multi-Agent Reinforcement Learning via Minimax Deep Deterministic Policy Gradient
This paper proposes a new algorithm, MiniMax Multi-agent Deep Deterministic Policy Gradient (M3DDPG) with the following contributions: a minimax extension of the popular multi-agent deep deterministic policy gradient algorithm (MADDPG), for robust policy learning; and a Multi-Agent Adversarial Learning (MAAL) to efficiently solve the proposed formulation.
Actor-Attention-Critic for Multi-Agent Reinforcement Learning
This work presents an actor-critic algorithm that trains decentralized policies in multi-agent settings, using centrally computed critics that share an attention mechanism which selects relevant information for each agent at every timestep, which enables more effective and scalable learning in complex multi-agent environments, when compared to recent approaches.
DOP: Off-Policy Multi-Agent Decomposed Policy Gradients
This paper investigates causes that hinder the performance of MAPG algorithms and presents a multi-agent decomposed policy gradient method (DOP), which introduces the idea of value function decomposition into the multi-agent actor-critic framework and formally shows that DOP critics have sufficient representational capability to guarantee convergence.
MAVEN: Multi-Agent Variational Exploration
A novel approach called MAVEN is proposed that hybridises value and policy-based methods by introducing a latent space for hierarchical control, which allows MAVEN to achieve committed, temporally extended exploration, which is key to solving complex multi-agent tasks.
Counterfactual Multi-Agent Policy Gradients
A new multi-agent actor-critic method called counterfactual multi-agent (COMA) policy gradients, which uses a centralised critic to estimate the Q-function and decentralised actors to optimise the agents' policies.
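COMA's key idea is a counterfactual baseline: each agent's advantage compares the Q-value of its chosen action against an expectation over its own alternative actions, holding the other agents' actions fixed. A minimal numpy sketch of that computation (function name and the toy numbers are illustrative, not from the paper):

```python
import numpy as np

def coma_advantage(q_values, policy, chosen_action):
    """Counterfactual advantage for one agent, in the spirit of COMA.

    q_values:      shape (n_actions,) -- centralised critic's Q(s, (u_-a, u'_a)),
                   varying only this agent's action u'_a.
    policy:        shape (n_actions,) -- this agent's policy pi_a over its actions.
    chosen_action: index of the action the agent actually took.
    """
    # Counterfactual baseline: marginalise out this agent's own action
    # under its current policy, keeping the other agents' actions fixed.
    baseline = np.dot(policy, q_values)
    return q_values[chosen_action] - baseline

# Toy example with made-up numbers:
q = np.array([1.0, 2.0, 0.5])
pi = np.array([0.2, 0.5, 0.3])
adv = coma_advantage(q, pi, chosen_action=1)
# baseline = 0.2*1.0 + 0.5*2.0 + 0.3*0.5 = 1.35, so adv = 2.0 - 1.35 = 0.65
```

Because the baseline depends only on the agent's own action distribution, subtracting it does not bias the policy gradient but reduces variance per agent.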
QMIX: Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning
QMIX employs a network that estimates joint action-values as a complex non-linear combination of per-agent values that condition only on local observations, and structurally enforces that the joint action-value is monotonic in the per-agent values, which allows tractable maximisation of the joint action-value in off-policy learning.
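The monotonicity constraint can be enforced simply by constraining the mixing weights to be non-negative, e.g. via an absolute value. A minimal numpy sketch of one monotonic mixing step (in QMIX the weights come from hypernetworks conditioned on the global state, and the hidden activation is an ELU; fixed weight arrays and a ReLU are used here purely for illustration):

```python
import numpy as np

def qmix_mix(agent_qs, w1, b1, w2, b2):
    """Monotonic mixing of per-agent Q-values, in the spirit of QMIX.

    agent_qs: (n_agents,)         per-agent Q-values.
    w1:       (n_agents, hidden)  first mixing layer weights.
    w2:       (hidden,)           second mixing layer weights.

    Taking np.abs of the weights guarantees dQ_tot/dQ_a >= 0 for every
    agent a, so argmax of Q_tot decomposes into per-agent argmaxes.
    """
    hidden = np.maximum(agent_qs @ np.abs(w1) + b1, 0.0)  # ReLU stand-in for ELU
    return hidden @ np.abs(w2) + b2

# Illustrative weights (not learned):
w1 = np.array([[1.0, -2.0], [0.5, 1.5]])
b1 = np.array([0.1, -0.3])
w2 = np.array([-1.0, 2.0])
q_tot = qmix_mix(np.array([1.0, 0.5]), w1, b1, w2, 0.5)
```

Raising any single agent's Q-value can never lower the mixed Q_tot, which is exactly the property that makes decentralised greedy action selection consistent with the centralised value.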
LIIR: Learning Individual Intrinsic Reward in Multi-Agent Reinforcement Learning
This paper proposes to merge the two directions of MARL by learning an individual intrinsic reward function for each agent that diversely stimulates the agents at each time step, and compares LIIR with a number of state-of-the-art MARL methods on battle games in StarCraft II.
The StarCraft Multi-Agent Challenge
The StarCraft Multi-Agent Challenge (SMAC), based on the popular real-time strategy game StarCraft II, is proposed as a benchmark problem, and an open-source deep multi-agent RL framework including state-of-the-art algorithms is released.
Multi-Agent Reinforcement Learning for Unmanned Aerial Vehicle Coordination by Multi-Critic Policy Gradient Optimization
The proposed Multi-Critic Policy Optimization architecture, with multiple value-estimating networks and a novel advantage function, optimizes a stochastic actor policy network to achieve optimal coordination of agents while complying with constraints such as collision avoidance.