
Semi-On-Policy Training for Sample Efficient Multi-Agent Policy Gradients

@article{Vasilev2021SemiOnPolicyTF,
  title={Semi-On-Policy Training for Sample Efficient Multi-Agent Policy Gradients},
  author={Bozhidar Vasilev and Tarun Gupta and Bei Peng and Shimon Whiteson},
  journal={ArXiv},
  year={2021},
  volume={abs/2104.13446}
}
Policy gradient methods are an attractive approach to multi-agent reinforcement learning problems due to their convergence properties and robustness in partially observable scenarios. However, there is a significant performance gap between state-of-the-art policy gradient and value-based methods on the popular StarCraft Multi-Agent Challenge (SMAC) benchmark. In this paper, we introduce semi-on-policy (SOP) training as an effective and computationally efficient way to address the sample…
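The abstract is truncated above, so the following is only a hedged reading, not the paper's exact procedure: semi-on-policy training can be understood as reusing a small window of recent, nearly on-policy rollouts alongside fresh data for policy-gradient updates. A minimal sketch under that assumption:

```python
import random
from collections import deque

# Hypothetical sketch: the paper's exact SOP variants are not
# reproduced here. The idea illustrated is reusing the last few
# (nearly on-policy) rollout batches alongside fresh data.
class SemiOnPolicyBuffer:
    def __init__(self, max_age=4):
        # Keep only the `max_age` most recent rollout batches, so
        # stored data never drifts far from the current policy.
        self.batches = deque(maxlen=max_age)

    def add(self, batch):
        self.batches.append(batch)

    def sample(self, n_old):
        # Always train on the newest batch, plus a few older ones.
        fresh = self.batches[-1]
        old = random.sample(list(self.batches)[:-1],
                            k=min(n_old, len(self.batches) - 1))
        return [fresh] + old
```

Capping the window's age bounds how far the stored data can drift from the current policy, which is what would keep such reuse cheap compared to fully off-policy corrections.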
1 Citation


Centralized Model and Exploration Policy for Multi-Agent RL
This work empirically evaluates the proposed model-based algorithm, MARCO, in three cooperative communication tasks, where it improves sample efficiency by up to 20x, and learns a centralized exploration policy within the model that collects additional data in state-action regions with high model uncertainty.
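The summary above does not specify how MARCO measures model uncertainty; a common, hedged stand-in for "high model uncertainty" is disagreement across an ensemble of learned dynamics models (`models` below is a hypothetical list of such networks):

```python
import torch

def ensemble_disagreement(models, state, action):
    # Illustrative uncertainty proxy: variance of next-state
    # predictions across an ensemble of learned dynamics models.
    # Regions with high disagreement are where an exploration
    # policy would collect additional data.
    preds = torch.stack([m(state, action) for m in models])  # (K, state_dim)
    return preds.var(dim=0).mean()
```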

References

Showing 1–10 of 38 references
MAVEN: Multi-Agent Variational Exploration
A novel approach called MAVEN is proposed that hybridises value- and policy-based methods by introducing a latent space for hierarchical control; this allows MAVEN to achieve committed, temporally extended exploration, which is key to solving complex multi-agent tasks.
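A hedged sketch of the latent-space idea (dimensions and the prior are illustrative, and MAVEN's mutual-information objective is omitted): a latent z drawn once per episode conditions each agent's network, committing the team to one joint exploration mode:

```python
import torch
import torch.nn as nn

# Illustrative-only sketch of the MAVEN idea: a latent variable z,
# drawn once per episode, conditions each agent's network so the
# team commits to one joint behaviour mode for the whole episode.
class LatentConditionedAgent(nn.Module):
    def __init__(self, obs_dim, n_actions, z_dim=16, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + z_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, obs, z):
        # z is fixed across the episode; only obs changes per timestep.
        return self.net(torch.cat([obs, z], dim=-1))

z = torch.randn(1, 16)  # sampled once at episode start (simple prior)
```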
Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments
An adaptation of actor-critic methods that considers the action policies of other agents and successfully learns policies requiring complex multi-agent coordination is presented.
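A minimal sketch of the centralized-critic pattern described here (class and dimension names are illustrative, not the paper's code): the critic for each agent conditions on every agent's observation and action during training, while execution remains decentralized:

```python
import torch
import torch.nn as nn

# Centralized critic: during training it sees all agents'
# observations and actions; each actor stays decentralized
# at execution time.
class CentralizedCritic(nn.Module):
    def __init__(self, n_agents, obs_dim, act_dim, hidden=128):
        super().__init__()
        in_dim = n_agents * (obs_dim + act_dim)
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, all_obs, all_actions):
        # all_obs: (batch, n_agents, obs_dim); all_actions likewise.
        x = torch.cat([all_obs.flatten(1), all_actions.flatten(1)], dim=-1)
        return self.net(x)
```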
Proximal Policy Optimization Algorithms
We propose a new family of policy gradient methods for reinforcement learning, which alternate between sampling data through interaction with the environment, and optimizing a "surrogate" objective function using stochastic gradient ascent.
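The surrogate in question is PPO's clipped objective; a minimal PyTorch rendering:

```python
import torch

def ppo_clip_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    # Clipped surrogate objective: take the pessimistic minimum of
    # the unclipped and clipped ratio terms, which removes any
    # incentive to push the probability ratio outside
    # [1 - eps, 1 + eps].
    ratio = torch.exp(logp_new - logp_old)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    return -torch.min(unclipped, clipped).mean()
```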
Comparative Evaluation of Multi-Agent Deep Reinforcement Learning Algorithms
This work evaluates and compares three different classes of MARL algorithms in a diverse range of multi-agent learning tasks, showing that algorithm performance depends strongly on environment properties and that no algorithm learns efficiently across all tasks.
Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning
QMIX, a novel value-based method that can train decentralised policies in a centralised end-to-end fashion, is evaluated on a challenging set of SMAC scenarios, where it significantly outperforms existing multi-agent reinforcement learning methods.
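A condensed sketch of QMIX's monotonic mixer (the published mixer uses a deeper hypernetwork for the final bias; sizes here are illustrative): state-conditioned hypernetworks emit the mixing weights, and taking their absolute value enforces that the joint value is monotonic in every agent's value:

```python
import torch
import torch.nn as nn

# Monotonic mixing: abs() on the hypernetwork outputs guarantees
# dQ_tot/dQ_i >= 0 for every agent i.
class MonotonicMixer(nn.Module):
    def __init__(self, n_agents, state_dim, embed=32):
        super().__init__()
        self.n_agents, self.embed = n_agents, embed
        self.w1 = nn.Linear(state_dim, n_agents * embed)
        self.b1 = nn.Linear(state_dim, embed)
        self.w2 = nn.Linear(state_dim, embed)
        self.b2 = nn.Linear(state_dim, 1)

    def forward(self, agent_qs, state):
        # agent_qs: (batch, n_agents); state: (batch, state_dim)
        w1 = torch.abs(self.w1(state)).view(-1, self.n_agents, self.embed)
        h = torch.relu(agent_qs.unsqueeze(1) @ w1 + self.b1(state).unsqueeze(1))
        w2 = torch.abs(self.w2(state)).view(-1, self.embed, 1)
        return (h @ w2).squeeze(-1) + self.b2(state)
```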
Safe and Efficient Off-Policy Reinforcement Learning
A novel algorithm, Retrace(λ), is derived and is believed to be the first return-based off-policy control algorithm converging almost surely to Q* without the GLIE assumption (Greedy in the Limit with Infinite Exploration).
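In the notation of the Retrace paper, the operator corrects off-policy returns from behaviour policy μ toward target policy π using truncated importance weights:

```latex
\Delta Q(x_t, a_t) = \sum_{s \ge t} \gamma^{s-t}
  \Big( \prod_{i=t+1}^{s} c_i \Big)
  \big( r_s + \gamma \, \mathbb{E}_{\pi} Q(x_{s+1}, \cdot) - Q(x_s, a_s) \big),
\qquad
c_i = \lambda \min\!\left(1, \frac{\pi(a_i \mid x_i)}{\mu(a_i \mid x_i)}\right)
```

Truncating the importance ratios at 1 keeps the variance of the product bounded while still yielding a contraction toward Q*.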
Actor-Attention-Critic for Multi-Agent Reinforcement Learning
This work presents an actor-critic algorithm that trains decentralized policies in multi-agent settings, using centrally computed critics that share an attention mechanism to select relevant information for each agent at every timestep; this enables more effective and scalable learning in complex multi-agent environments than recent approaches.
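A rough sketch of the attention-critic pattern (simplified: the original uses separate per-agent heads and excludes an agent's own encoding from the attended set):

```python
import torch
import torch.nn as nn

# Each agent's encoding queries the other agents' encodings, so the
# critic learns which teammates are relevant at each timestep.
class AttentionCritic(nn.Module):
    def __init__(self, enc_dim=64, n_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(enc_dim, n_heads, batch_first=True)
        self.q_head = nn.Linear(2 * enc_dim, 1)

    def forward(self, agent_encodings):
        # agent_encodings: (batch, n_agents, enc_dim), one per agent's
        # observation-action pair (encoders omitted for brevity).
        attended, _ = self.attn(agent_encodings, agent_encodings,
                                agent_encodings)
        # Each agent's value conditions on its own encoding plus the
        # attention-weighted summary of the others.
        return self.q_head(torch.cat([agent_encodings, attended], dim=-1))
```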
Counterfactual Multi-Agent Policy Gradients
A new multi-agent actor-critic method called counterfactual multi-agent (COMA) policy gradients is proposed, which uses a centralised critic to estimate the Q-function and decentralised actors to optimise the agents' policies.
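The counterfactual baseline marginalizes a single agent's action out of the centralized Q-function while holding the other agents' actions fixed; for one agent with a discrete action space:

```python
import torch

def counterfactual_advantage(q_values, policy_probs, action):
    # COMA's counterfactual baseline: marginalize agent a's action
    # out of the centralized Q while the other agents' actions stay
    # fixed. q_values: (n_actions,) holds Q(s, (u^{-a}, u'^a)) for
    # each candidate u'^a; policy_probs: (n_actions,) is agent a's
    # current policy.
    baseline = (policy_probs * q_values).sum()
    return q_values[action] - baseline
```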
Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor
This paper proposes soft actor-critic, an off-policy actor-critic deep RL algorithm based on the maximum entropy reinforcement learning framework, which achieves state-of-the-art performance on a range of continuous control benchmark tasks, outperforming prior on-policy and off-policy methods.
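The maximum-entropy objective augments the return with the policy's entropy; the resulting actor loss, in minimal form (for actions sampled from the current policy):

```python
import torch

def sac_actor_loss(log_prob, q_value, alpha=0.2):
    # Maximum-entropy actor objective: maximize expected Q plus
    # policy entropy, i.e. minimize E[alpha * log pi(a|s) - Q(s, a)]
    # over actions drawn from the current policy. alpha trades off
    # reward against entropy.
    return (alpha * log_prob - q_value).mean()
```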
The StarCraft Multi-Agent Challenge
The StarCraft Multi-Agent Challenge (SMAC), based on the popular real-time strategy game StarCraft II, is proposed as a benchmark problem, and an open-source deep multi-agent RL framework including state-of-the-art algorithms is released.