Corpus ID: 233423273

# Semi-On-Policy Training for Sample Efficient Multi-Agent Policy Gradients

@article{Vasilev2021SemiOnPolicyTF,
title={Semi-On-Policy Training for Sample Efficient Multi-Agent Policy Gradients},
author={Bozhidar Vasilev and Tarun Gupta and Bei Peng and Shimon Whiteson},
journal={ArXiv},
year={2021},
volume={abs/2104.13446}
}
Policy gradient methods are an attractive approach to multi-agent reinforcement learning problems due to their convergence properties and robustness in partially observable scenarios. However, there is a significant performance gap between state-of-the-art policy gradient and value-based methods on the popular StarCraft Multi-Agent Challenge (SMAC) benchmark. In this paper, we introduce semi-on-policy (SOP) training as an effective and computationally efficient way to address the sample…
#### 1 Citation

Centralized Model and Exploration Policy for Multi-Agent RL
This work empirically evaluates the proposed model-based algorithm, MARCO, in three cooperative communication tasks, where it improves sample efficiency by up to 20x and learns a centralized exploration policy within the model that collects additional data in state-action regions with high model uncertainty.

#### References

Showing 1-10 of 38 references
MAVEN: Multi-Agent Variational Exploration
• NeurIPS 2019
A novel approach called MAVEN is proposed that hybridises value- and policy-based methods by introducing a latent space for hierarchical control, enabling committed, temporally extended exploration, which is key to solving complex multi-agent tasks.
Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments
• NIPS 2017
An adaptation of actor-critic methods that considers the action policies of other agents and is able to successfully learn policies requiring complex multi-agent coordination is presented.
Proximal Policy Optimization Algorithms
• ArXiv 2017
We propose a new family of policy gradient methods for reinforcement learning, which alternate between sampling data through interaction with the environment and optimizing a "surrogate" objective function using stochastic gradient ascent.
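The clipped surrogate objective at the core of PPO can be sketched as follows; the function name, array shapes, and the default clip range of 0.2 are illustrative choices, not taken verbatim from the paper:

```python
import numpy as np

def ppo_clip_objective(logp_new, logp_old, advantages, eps=0.2):
    """Clipped surrogate objective in the style of Schulman et al. (2017).

    ratio = pi_new(a|s) / pi_old(a|s); clipping the ratio keeps the
    update from moving the policy too far from the data-collecting policy.
    """
    ratio = np.exp(logp_new - logp_old)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantages
    # Pessimistic (lower) bound: take the element-wise minimum, then average.
    return np.mean(np.minimum(unclipped, clipped))
```

When the new and old policies agree (ratio 1 everywhere), the objective reduces to the mean advantage; when the ratio drifts outside [1-eps, 1+eps], the clipped term caps the incentive to move further.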
Comparative Evaluation of Multi-Agent Deep Reinforcement Learning Algorithms
• ArXiv 2020
This work evaluates and compares three different classes of MARL algorithms across a diverse range of multi-agent learning tasks and shows that algorithm performance depends strongly on environment properties and that no algorithm learns efficiently across all tasks.
Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning
• J. Mach. Learn. Res. 2020
QMIX, a novel value-based method that can train decentralised policies in a centralised end-to-end fashion, is evaluated on a challenging set of SMAC scenarios, where it significantly outperforms existing multi-agent reinforcement learning methods.
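The factorisation behind QMIX rests on a monotonicity constraint between the joint value and the per-agent utilities, which makes decentralised greedy action selection consistent with the centralised value. A minimal statement of the condition, using standard notation ($Q_{tot}$ for the joint value, $Q_a$ for agent $a$'s utility):

```latex
% Monotonicity of the joint value in each per-agent utility:
\frac{\partial Q_{tot}}{\partial Q_a} \ge 0, \qquad \forall a \in \{1, \dots, n\}.
% This suffices for the joint argmax to decompose into per-agent argmaxes,
% so each agent can act greedily on its own utility:
\arg\max_{\mathbf{u}} Q_{tot}(\boldsymbol{\tau}, \mathbf{u})
  = \big( \arg\max_{u^1} Q_1(\tau^1, u^1), \; \dots, \; \arg\max_{u^n} Q_n(\tau^n, u^n) \big)
```

In practice QMIX enforces this by constraining the mixing network's weights to be non-negative.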
Safe and Efficient Off-Policy Reinforcement Learning
• NIPS 2016
A novel algorithm, Retrace($\lambda$), is derived, believed to be the first return-based off-policy control algorithm converging a.s. to $Q^*$ without the GLIE assumption (Greedy in the Limit with Infinite Exploration).
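The Retrace($\lambda$) operator corrects returns generated by a behaviour policy $\mu$ toward a target policy $\pi$ using truncated importance weights; a sketch of the operator in the paper's notation:

```latex
% Retrace(lambda) operator: off-policy correction with truncated weights
\mathcal{R} Q(x, a) = Q(x, a) + \mathbb{E}_\mu \Big[ \sum_{t \ge 0} \gamma^t
  \Big( \prod_{s=1}^{t} c_s \Big)
  \big( r_t + \gamma \, \mathbb{E}_\pi Q(x_{t+1}, \cdot) - Q(x_t, a_t) \big) \Big],
\qquad
c_s = \lambda \min\!\Big( 1, \frac{\pi(a_s \mid x_s)}{\mu(a_s \mid x_s)} \Big)
```

Truncating the importance ratios at 1 bounds the variance of the correction while preserving enough of the off-policy signal for safe control updates.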
Actor-Attention-Critic for Multi-Agent Reinforcement Learning
• ICML 2019
This work presents an actor-critic algorithm that trains decentralized policies in multi-agent settings, using centrally computed critics that share an attention mechanism selecting relevant information for each agent at every timestep. This enables more effective and scalable learning in complex multi-agent environments compared to recent approaches.