Corpus ID: 233423273

Semi-On-Policy Training for Sample Efficient Multi-Agent Policy Gradients

@article{Vasilev2021SemiOnPolicyTF,
  title={Semi-On-Policy Training for Sample Efficient Multi-Agent Policy Gradients},
  author={Bozhidar Vasilev and Tarun Gupta and Bei Peng and Shimon Whiteson},
  journal={ArXiv},
  year={2021},
  volume={abs/2104.13446}
}
Policy gradient methods are an attractive approach to multi-agent reinforcement learning problems due to their convergence properties and robustness in partially observable scenarios. However, there is a significant performance gap between state-of-the-art policy gradient and value-based methods on the popular StarCraft Multi-Agent Challenge (SMAC) benchmark. In this paper, we introduce semi-on-policy (SOP) training as an effective and computationally efficient way to address the sample…
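The abstract is truncated above, so the details of SOP training are not given here. As a rough, non-authoritative illustration of the general idea the title suggests, the sketch below mixes freshly collected on-policy rollouts with trajectories from a few recent policies kept in a small FIFO buffer before each policy-gradient update. All names (SmallRolloutBuffer, collect_rollout, policy_gradient_update) are hypothetical placeholders, not the paper's algorithm or API.

```python
# Hypothetical sketch of "semi-on-policy" style data reuse: strictly on-policy
# rollouts are combined with trajectories from the K most recent policies.
from collections import deque


class SmallRolloutBuffer:
    """FIFO buffer holding rollout batches from the K most recent policies."""

    def __init__(self, max_policies=4):
        self.buffer = deque(maxlen=max_policies)

    def add(self, trajectories):
        self.buffer.append(trajectories)

    def sample(self):
        # Flatten trajectories from all stored (recent) policies.
        return [traj for batch in self.buffer for traj in batch]


def collect_rollout(policy, n_episodes=8):
    # Placeholder for environment interaction; returns a list of trajectories.
    return [{"obs": [], "actions": [], "returns": [], "policy_id": id(policy)}
            for _ in range(n_episodes)]


def policy_gradient_update(policy, trajectories):
    # Placeholder for a policy-gradient step; off-policy data from the buffer
    # would typically need importance-weight or other corrections.
    pass


def train(policy, iterations=100):
    buffer = SmallRolloutBuffer(max_policies=4)
    for _ in range(iterations):
        fresh = collect_rollout(policy)       # strictly on-policy data
        buffer.add(fresh)
        batch = buffer.sample()               # fresh + recent, "semi-on-policy" data
        policy_gradient_update(policy, batch)
```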
1 Citation

References

Showing 1–10 of 38 references
MAVEN: Multi-Agent Variational Exploration
Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments
Proximal Policy Optimization Algorithms
Comparative Evaluation of Multi-Agent Deep Reinforcement Learning Algorithms
Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning
Safe and Efficient Off-Policy Reinforcement Learning
Actor-Attention-Critic for Multi-Agent Reinforcement Learning
Counterfactual Multi-Agent Policy Gradients
Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor
The StarCraft Multi-Agent Challenge