Corpus ID: 232092445

The Surprising Effectiveness of MAPPO in Cooperative, Multi-Agent Games

@article{Yu2021TheSE,
  title={The Surprising Effectiveness of MAPPO in Cooperative, Multi-Agent Games},
  author={Chao Yu and Akash Velu and Eugene Vinitsky and Yu Wang and Alexandre M. Bayen and Yi Wu},
  journal={ArXiv},
  year={2021},
  volume={abs/2103.01955}
}
Proximal Policy Optimization (PPO) is a popular on-policy reinforcement learning algorithm but is significantly less utilized than off-policy learning algorithms in multi-agent problems. In this work, we investigate Multi-Agent PPO (MAPPO), a multi-agent PPO variant which adopts a centralized value function. Using a 1-GPU desktop, we show that MAPPO achieves performance comparable to the state-of-the-art in three popular multi-agent testbeds: the Particle World environments, StarCraft II…
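
To make the centralized-value-function idea concrete, here is a minimal sketch (PyTorch assumed; the class names, layer sizes, and the ppo_clip_loss helper are illustrative, not the authors' implementation). Each agent acts from its local observation only, while a shared critic, used only during training, values the global state:

    import torch
    import torch.nn as nn

    class Actor(nn.Module):
        """Decentralized policy: per-agent action logits from a local observation."""
        def __init__(self, obs_dim, n_actions, hidden=64):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(obs_dim, hidden), nn.Tanh(),
                nn.Linear(hidden, n_actions))

        def forward(self, obs):
            return self.net(obs)

    class CentralizedCritic(nn.Module):
        """One value estimate for the joint (global) state; training-time only."""
        def __init__(self, state_dim, hidden=64):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(state_dim, hidden), nn.Tanh(),
                nn.Linear(hidden, 1))

        def forward(self, state):
            return self.net(state)

    def ppo_clip_loss(logp_new, logp_old, adv, eps=0.2):
        # Standard PPO clipped surrogate, applied per agent with advantages
        # computed from the centralized value function.
        ratio = torch.exp(logp_new - logp_old)
        return -torch.min(ratio * adv,
                          torch.clamp(ratio, 1.0 - eps, 1.0 + eps) * adv).mean()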
Citations

Coordinated Proximal Policy Optimization
TLDR
The monotonicity of policy improvement when optimizing a theoretically grounded joint objective is proved, and a simplified optimization objective based on a set of approximations is derived that achieves dynamic credit assignment among agents and alleviates the high-variance issue during the concurrent update of agent policies.
Benchmarking Multi-Agent Deep Reinforcement Learning Algorithms in Cooperative Tasks
TLDR
This work consistently evaluates and compares three different classes of MARL algorithms across a diverse range of cooperative multi-agent learning tasks, and provides insights regarding the effectiveness of the different learning approaches.
Learning Efficient Multi-Agent Cooperative Visual Exploration
  • Chao Yu, Xinyi Yang, Jiaxuan Gao, Huazhong Yang, Yu Wang, Yi Wu
  • Computer Science
  • ArXiv
  • 2021
TLDR
This work extends the state-of-the-art single-agent RL solution, Active Neural SLAM (ANS), to the multi-agent setting by introducing a novel RL-based global-goal planner, the Spatial Coordination Planner (SCP), which leverages spatial information from each individual agent in an end-to-end manner and effectively guides the agents to navigate towards different spatial goals with high exploration efficiency.
Noisy-MAPPO: Noisy Advantage Values for Cooperative Multi-agent Actor-Critic methods
  • Siyue Hu, Jian Hu
  • Computer Science
  • ArXiv
  • 2021
TLDR
Noisy-MAPPO is proposed, which achieves more than 90% win rates in all StarCraft Multi-Agent Challenge (SMAC) scenarios; the random-noise method improves the performance of vanilla MAPPO by 80% in some Super-Hard SMAC scenarios.
Policy Perturbation via Noisy Advantage Values for Cooperative Multi-agent Actor-Critic methods
  • Jian Hu
  • 2021
TLDR
A novel policy perturbation method is proposed, which disturbs the advantage values via random Gaussian noise; it outperforms Fine-tuned QMIX and MAPPO-FP, and achieves SOTA on SMAC without agent-specific features.
Policy Regularization via Noisy Advantage Values for Cooperative Multi-agent Actor-Critic methods
TLDR
A novel policy regularization method, which disturbs the advantage values via random Gaussian noise, outperforms Fine-tuned QMIX and MAPPO-FP, and achieves SOTA on SMAC without agent-specific features.
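
A rough sketch of the trick the last two entries describe (the noise scale sigma is an assumed hyperparameter, not a value from the papers): the advantage estimates are perturbed with zero-mean Gaussian noise before each PPO update.

    import torch

    def noisy_advantages(adv, sigma=0.5):
        # adv: advantage estimates, shape (batch, n_agents); the zero-mean
        # Gaussian perturbation acts as a regularizer on the policy update.
        return adv + torch.randn_like(adv) * sigma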
Settling the Variance of Multi-Agent Policy Gradients
TLDR
A rigorous analysis of multi-agent policy gradient (MAPG) methods is offered by quantifying the contributions of the number of agents and the agents' exploration to the variance of MAPG estimators, and the optimal baseline (OB) that achieves the minimal variance is derived.
TiKick: Towards Playing Multi-agent Football Full Games from Single-agent Demonstrations
  • Shiyu Huang, Wenze Chen, +5 authors Jun Zhu
  • Computer Science
  • ArXiv
  • 2021
TLDR
TiKick is the first learning-based AI system that can take over the multi-agent Google Research Football full game, whereas previous work could either control a single agent or only experiment on toy academic scenarios; the method also achieves state-of-the-art performance on various academic scenarios.
ToM2C: Target-oriented Multi-agent Communication and Cooperation with Theory of Mind
  • Yuanfei Wang, Fangwei Zhong, Jing Xu, Yizhou Wang
  • Computer Science
  • ArXiv
  • 2021
TLDR
The proposed Theory of Mind model not only outperforms the state-of-the-art methods on reward and communication efficiency, but also shows good generalization across different scales of the environment.
Trust Region Policy Optimisation in Multi-Agent Reinforcement Learning
TLDR
Results show that HATRPO and HAPPO significantly outperform strong baselines such as IPPO, MAPPO and MADDPG on all tested tasks, therefore establishing a new state of the art in multi-agent reinforcement learning.

References

Showing 1-10 of 45 references
Simplified Action Decoder for Deep Multi-Agent Reinforcement Learning
TLDR
A new deep multi-agent RL method, the Simplified Action Decoder (SAD), resolves this contradiction by exploiting the centralized training phase and establishes a new SOTA among learning methods for 2-5 players on the self-play part of the Hanabi challenge.
Emergence of Grounded Compositional Language in Multi-Agent Populations
TLDR
This paper proposes a multi-agent learning environment and learning methods that bring about the emergence of a basic compositional language that is represented as streams of abstract discrete symbols uttered by agents over time, but nonetheless has a coherent structure that possesses a defined vocabulary and syntax.
The Surprising Effectiveness of MAPPO in Cooperative, Multi-Agent Games
  • In International Conference on Learning Representations
  • 2021
QMIX: Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning
TLDR
QMIX employs a network that estimates joint action-values as a complex non-linear combination of per-agent values that condition only on local observations, and structurally enforces that the joint action-value is monotonic in the per-agent values, which allows tractable maximisation of the joint action-value in off-policy learning.
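
A compact sketch of the monotonic mixing described above (PyTorch assumed; dimensions and hypernetwork shapes are illustrative): forcing the mixing weights to be non-negative via abs guarantees dQ_tot/dQ_i >= 0 for every agent, so maximizing the joint value decomposes into per-agent argmaxes.

    import torch
    import torch.nn as nn

    class Mixer(nn.Module):
        """State-conditioned monotonic mixer over per-agent Q-values."""
        def __init__(self, n_agents, state_dim, embed=32):
            super().__init__()
            self.n_agents, self.embed = n_agents, embed
            self.w1 = nn.Linear(state_dim, n_agents * embed)  # hypernet: layer-1 weights
            self.b1 = nn.Linear(state_dim, embed)
            self.w2 = nn.Linear(state_dim, embed)              # hypernet: layer-2 weights
            self.b2 = nn.Linear(state_dim, 1)

        def forward(self, agent_qs, state):
            # agent_qs: (batch, n_agents); state: (batch, state_dim)
            w1 = torch.abs(self.w1(state)).view(-1, self.n_agents, self.embed)
            b1 = self.b1(state).unsqueeze(1)
            h = torch.relu(torch.bmm(agent_qs.unsqueeze(1), w1) + b1)
            w2 = torch.abs(self.w2(state)).view(-1, self.embed, 1)
            # abs() on the hypernetwork outputs is what enforces monotonicity.
            return torch.bmm(h, w2).squeeze(-1) + self.b2(state)  # (batch, 1)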
Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments
TLDR
An adaptation of actor-critic methods that considers action policies of other agents and is able to successfully learn policies that require complex multi-agent coordination is presented.
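
A sketch of the centralized-critic idea this entry summarizes (shapes and names assumed, not the paper's implementation): during training, each agent's critic conditions on all agents' observations and actions, which restores stationarity from the critic's point of view; execution uses only the decentralized actors.

    import torch
    import torch.nn as nn

    class JointCritic(nn.Module):
        """Training-time critic over all agents' observations and actions."""
        def __init__(self, total_obs_dim, total_act_dim, hidden=64):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(total_obs_dim + total_act_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, 1))

        def forward(self, all_obs, all_actions):
            # all_obs / all_actions: concatenations over agents, (batch, dim).
            return self.net(torch.cat([all_obs, all_actions], dim=-1))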
High-Dimensional Continuous Control Using Generalized Advantage Estimation
TLDR
This work addresses the large number of samples typically required, and the difficulty of obtaining stable and steady improvement despite the nonstationarity of the incoming data, by using value functions to substantially reduce the variance of policy gradient estimates at the cost of some bias.
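
The estimator itself is short enough to state directly; a plain-Python sketch (episode-termination masking omitted for brevity) of the exponentially weighted sum of TD residuals, A_t = δ_t + γλ·A_{t+1} with δ_t = r_t + γV(s_{t+1}) − V(s_t):

    def gae(rewards, values, gamma=0.99, lam=0.95):
        # rewards: length-T list; values: length-(T+1) list of state values.
        advantages, running = [0.0] * len(rewards), 0.0
        for t in reversed(range(len(rewards))):
            delta = rewards[t] + gamma * values[t + 1] - values[t]
            running = delta + gamma * lam * running  # A_t = delta_t + gamma*lam*A_{t+1}
            advantages[t] = running
        return advantages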
A Closer Look at Deep Policy Gradients
TLDR
A fine-grained analysis of state-of-the-art methods based on key elements of this framework: gradient estimation, value prediction, and optimization landscapes shows that the behavior of deep policy gradient algorithms often deviates from what their motivating framework would predict.
Implementation Matters in Deep RL: A Case Study on PPO and TRPO
TLDR
The results show that algorithm augmentations found only in implementations or described as auxiliary details to the core algorithm are responsible for most of PPO's gain in cumulative reward over TRPO, and fundamentally change how RL methods function.
Value-Decomposition Networks For Cooperative Multi-Agent Learning
TLDR
This work addresses the problem of cooperative multi-agent reinforcement learning with a single joint reward signal by training individual agents with a novel value decomposition network architecture, which learns to decompose the team value function into agent-wise value functions.
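
The decomposition itself is a one-liner; a sketch (shapes assumed): the team Q-value is the sum of per-agent utilities, so gradients from the single shared team reward flow into every agent's network.

    import torch

    def team_q(per_agent_qs):
        # per_agent_qs: (batch, n_agents) chosen-action utilities; the team
        # value is their sum, trained against the shared team reward.
        return per_agent_qs.sum(dim=1, keepdim=True)  # (batch, 1)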
Value-decomposition networks for cooperative multi-agent learning based on team reward
  • Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent Systems
  • 2018