Corpus ID: 232092445

The Surprising Effectiveness of MAPPO in Cooperative, Multi-Agent Games

@article{Yu2021TheSE,
  title={The Surprising Effectiveness of MAPPO in Cooperative, Multi-Agent Games},
  author={Chao Yu and Akash Velu and Eugene Vinitsky and Yu Wang and Alexandre M. Bayen and Yi Wu},
  journal={ArXiv},
  year={2021},
  volume={abs/2103.01955}
}
Proximal Policy Optimization (PPO) is a popular on-policy reinforcement learning algorithm but is significantly less utilized than off-policy learning algorithms in multi-agent problems. In this work, we investigate Multi-Agent PPO (MAPPO), a multi-agent PPO variant which adopts a centralized value function. Using a 1-GPU desktop, we show that MAPPO achieves performance comparable to the state-of-the-art in three popular multi-agent testbeds: the Particle World environments, StarCraft II…
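For orientation, the core architectural idea behind MAPPO (decentralized actors paired with a critic that sees the global state) can be sketched in a few lines. This is an illustrative PyTorch sketch with hypothetical module names and sizes, not the authors' released implementation:

```python
import torch
import torch.nn as nn

class DecentralizedActor(nn.Module):
    """Per-agent policy: acts from the agent's local observation only."""
    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, act_dim),
        )

    def forward(self, local_obs: torch.Tensor):
        return torch.distributions.Categorical(logits=self.net(local_obs))

class CentralizedCritic(nn.Module):
    """Value function trained on the global state (centralized training);
    it is not needed at execution time (decentralized execution)."""
    def __init__(self, state_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, 1),
        )

    def forward(self, global_state: torch.Tensor) -> torch.Tensor:
        return self.net(global_state).squeeze(-1)
```

Training then proceeds as ordinary PPO per agent, except the advantage estimates come from the centralized critic.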
Learning Efficient Multi-Agent Cooperative Visual Exploration
  • Chao Yu, Xinyi Yang, Jiaxuan Gao, Huazhong Yang, Yu Wang, Yi Wu
  • Computer Science
  • ArXiv
  • 2021
TLDR
This work extends the state-of-the-art single-agent RL solution, Active Neural SLAM (ANS), to the multi-agent setting by introducing a novel RL-based global-goal planner, the Spatial Coordination Planner (SCP), which leverages spatial information from each individual agent in an end-to-end manner and effectively guides the agents to navigate towards different spatial goals with high exploration efficiency.
Noisy-MAPPO: Noisy Advantage Values for Cooperative Multi-agent Actor-Critic methods
  • Siyue Hu, Jian Hu
  • Computer Science
  • ArXiv
  • 2021
TLDR
Noisy-MAPPO is proposed, which achieves win rates above 90% in all StarCraft Multi-Agent Challenge (SMAC) scenarios; the random-noise method improves the performance of vanilla MAPPO by 80% in some Super-Hard SMAC scenarios.
Policy Perturbation via Noisy Advantage Values for Cooperative Multi-agent Actor-Critic methods
TLDR
A novel policy perturbation method that disturbs the advantage values via random Gaussian noise; it outperforms Fine-tuned QMIX and MAPPO-FP, achieving state-of-the-art results on SMAC without agent-specific features.
Policy Regularization with Noisy Advantage Values for Cooperative Multi-agent Actor-Critic methods
  • Siyue Hu
  • 2021
TLDR
A novel policy regularization method is proposed, i.e., Noisy-MAPPO and Advantage-Noisy-MAPPO, which smooths the advantage values with noise and substantially outperforms vanilla MAPPO.
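The three noisy-advantage entries above share one mechanism: perturbing the PPO advantage estimates with random noise before the clipped surrogate loss. A minimal sketch, assuming zero-mean Gaussian noise (the scale and schedule here are assumptions, not the papers' exact settings):

```python
import torch

def perturb_advantages(advantages: torch.Tensor, sigma: float = 0.5) -> torch.Tensor:
    """Add zero-mean Gaussian noise to advantage estimates, as in the
    Noisy-MAPPO line of work (exact noise scale is an assumption here)."""
    return advantages + torch.randn_like(advantages) * sigma

def ppo_clip_loss(ratio: torch.Tensor, advantages: torch.Tensor,
                  eps: float = 0.2) -> torch.Tensor:
    """Standard PPO clipped surrogate, evaluated against the noisy advantages."""
    noisy_adv = perturb_advantages(advantages)
    surrogate = torch.min(ratio * noisy_adv,
                          torch.clamp(ratio, 1.0 - eps, 1.0 + eps) * noisy_adv)
    return -surrogate.mean()
```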
Settling the Variance of Multi-Agent Policy Gradients
TLDR
A rigorous analysis of multi-agent policy gradient (MAPG) methods is offered, quantifying how the number of agents and the agents' exploration contribute to the variance of MAPG estimators, and deriving the optimal baseline (OB) that achieves the minimal variance.
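For context, the object under analysis is the per-agent policy gradient estimator with a baseline; in generic notation (a sketch, not the paper's exact statement):

```latex
% MAPG estimator for agent i with a baseline b:
\nabla_{\theta_i} J(\theta)
  = \mathbb{E}\!\left[ \nabla_{\theta_i} \log \pi_{\theta_i}\!\big(a^i \mid o^i\big)
      \left( Q(s, \mathbf{a}) - b\big(s, \mathbf{a}^{-i}\big) \right) \right]
```

Any baseline that does not depend on agent i's own action leaves the gradient unbiased, so it can be chosen to minimize variance; that minimizing choice is the paper's optimal baseline.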
TiKick: Towards Playing Multi-agent Football Full Games from Single-agent Demonstrations
  • Shiyu Huang, Wenze Chen, +5 authors Jun Zhu
  • Computer Science
  • ArXiv
  • 2021
TLDR
TiKick is the first learning-based AI system that can take over the full multi-agent Google Research Football game, whereas previous work could either control only a single agent or experiment on toy academic scenarios; the method also achieves state-of-the-art performance on various academic scenarios.
Trust Region Policy Optimisation in Multi-Agent Reinforcement Learning
TLDR
Results show that HATRPO and HAPPO significantly outperform strong baselines such as IPPO, MAPPO, and MADDPG on all tested tasks, establishing a new state of the art in multi-agent reinforcement learning.
Multi-Agent Constrained Policy Optimisation
  • Shangding Gu, J. Kuba, +6 authors Yaodong Yang
  • Computer Science
  • ArXiv
  • 2021
TLDR
The safe MARL problem is formulated as a constrained Markov game and solved with policy optimisation methods that enjoy theoretical guarantees of both monotonic improvement in reward and satisfaction of safety constraints at every iteration.
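A constrained Markov game extends the usual return-maximization objective with expected-cost constraints; a generic sketch of the formulation (standard notation, not necessarily the paper's):

```latex
% Maximize discounted return subject to m expected-cost constraints:
\max_{\pi}\; \mathbb{E}_{\pi}\!\Big[\textstyle\sum_{t} \gamma^{t}\, r(s_t, \mathbf{a}_t)\Big]
\quad \text{s.t.} \quad
\mathbb{E}_{\pi}\!\Big[\textstyle\sum_{t} \gamma^{t}\, c_j(s_t, \mathbf{a}_t)\Big] \le d_j,
\qquad j = 1, \dots, m
```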
A Review of Deep Reinforcement Learning for Smart Building Energy Management
  • Liang Yu, Shuqi Qin, Meng Zhang, Chao Shen, Tao Jiang, Xiaohong Guan
  • Computer Science, Engineering
  • IEEE Internet of Things Journal
  • 2021
TLDR
A comprehensive review of DRL for smart building energy management (SBEM) is provided from the perspective of system scale; existing unresolved issues are identified and possible future research directions are pointed out.
A review of mobile robot motion planning methods: from classical motion planning workflows to reinforcement learning-based architectures
  • Zichen He, Jiawei Wang, Chunwei Song
  • Computer Science
  • ArXiv
  • 2021
TLDR
RL-based motion planning approaches are reviewed, including RL-optimized motion planners, map-free end-to-end methods that integrate sensing and decision-making, and multi-robot cooperative planning methods, with a detailed analysis of the pressing challenges these methods face.

References

Showing 1-10 of 45 references
Simplified Action Decoder for Deep Multi-Agent Reinforcement Learning
TLDR
A new deep multi-agent RL method, the Simplified Action Decoder (SAD), which resolves the tension between exploration and informative actions by exploiting the centralized training phase, and establishes a new SOTA among learning methods for 2-5 players on the self-play part of the Hanabi challenge.
Emergence of Grounded Compositional Language in Multi-Agent Populations
TLDR
This paper proposes a multi-agent learning environment and learning methods that bring about emergence of a basic compositional language that is represented as streams of abstract discrete symbols uttered by agents over time, but nonetheless has a coherent structure that possesses a defined vocabulary and syntax.
The Surprising Effectiveness of MAPPO in Cooperative, Multi-Agent Games
  • In International Conference on Learning Representations
  • 2021
QMIX: Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning
TLDR
QMIX employs a network that estimates joint action-values as a complex non-linear combination of per-agent values that condition only on local observations, and structurally enforces that the joint action-value is monotonic in the per-agent values, which allows tractable maximisation of the joint action-value in off-policy learning.
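The monotonicity constraint is typically enforced by generating non-negative mixing weights from the global state via hypernetworks; a minimal PyTorch sketch (layer sizes and names are illustrative, not the authors' exact architecture):

```python
import torch
import torch.nn as nn

class MonotonicMixer(nn.Module):
    """Mixes per-agent Q-values into a joint Q-value that is monotonic in each
    per-agent Q, by making the state-conditioned mixing weights non-negative."""
    def __init__(self, n_agents: int, state_dim: int, embed: int = 32):
        super().__init__()
        self.hyper_w1 = nn.Linear(state_dim, n_agents * embed)  # first-layer weights
        self.hyper_b1 = nn.Linear(state_dim, embed)              # first-layer bias
        self.hyper_w2 = nn.Linear(state_dim, embed)              # second-layer weights
        self.hyper_b2 = nn.Linear(state_dim, 1)                  # final bias

    def forward(self, agent_qs: torch.Tensor, state: torch.Tensor) -> torch.Tensor:
        # agent_qs: (batch, n_agents); state: (batch, state_dim)
        batch, n = agent_qs.shape
        w1 = torch.abs(self.hyper_w1(state)).view(batch, n, -1)  # abs() => monotonic
        b1 = self.hyper_b1(state).unsqueeze(1)
        hidden = torch.relu(torch.bmm(agent_qs.unsqueeze(1), w1) + b1)
        w2 = torch.abs(self.hyper_w2(state)).unsqueeze(-1)
        q_tot = torch.bmm(hidden, w2).squeeze(-1) + self.hyper_b2(state)
        return q_tot.squeeze(-1)  # (batch,)
```

Because every weight applied to the per-agent Q-values is non-negative, increasing any agent's Q-value can only increase the joint Q-value, so per-agent argmax recovers the joint argmax.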
Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments
TLDR
An adaptation of actor-critic methods that considers action policies of other agents and is able to successfully learn policies that require complex multi-agent coordination is presented.
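Concretely, a centralized critic of this kind conditions on every agent's observation and action during training, while each actor still acts from its own observation; an illustrative sketch (names are hypothetical):

```python
import torch
import torch.nn as nn

class JointCritic(nn.Module):
    """Q(o_1..o_N, a_1..a_N): sees all agents' observations and actions during
    training, so each actor's update can account for the other agents' policies."""
    def __init__(self, total_obs_dim: int, total_act_dim: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(total_obs_dim + total_act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, all_obs: torch.Tensor, all_actions: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([all_obs, all_actions], dim=-1)).squeeze(-1)
```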
High-Dimensional Continuous Control Using Generalized Advantage Estimation
TLDR
This work addresses the large number of samples typically required and the difficulty of obtaining stable and steady improvement despite the nonstationarity of the incoming data by using value functions to substantially reduce the variance of policy gradient estimates at the cost of some bias.
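The estimator itself fits in a short backward recursion over a trajectory; a minimal sketch, assuming `values` carries one extra bootstrap entry for the state after the final step:

```python
import numpy as np

def gae(rewards: np.ndarray, values: np.ndarray,
        gamma: float = 0.99, lam: float = 0.95) -> np.ndarray:
    """Generalized Advantage Estimation:
    A_t = sum_l (gamma * lam)^l * delta_{t+l},
    where delta_t = r_t + gamma * V(s_{t+1}) - V(s_t).
    `values` has length len(rewards) + 1 (bootstrap value for the final state)."""
    T = len(rewards)
    advantages = np.zeros(T)
    running = 0.0
    for t in reversed(range(T)):
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages
```

The lambda parameter trades bias for variance: lam = 0 recovers the one-step TD advantage, lam = 1 the full Monte Carlo return minus the value baseline.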
A Closer Look at Deep Policy Gradients
TLDR
A fine-grained analysis of state-of-the-art methods based on key elements of this framework: gradient estimation, value prediction, and optimization landscapes shows that the behavior of deep policy gradient algorithms often deviates from what their motivating framework would predict.
Implementation Matters in Deep RL: A Case Study on PPO and TRPO
TLDR
The results show that algorithm augmentations found only in implementations or described as auxiliary details to the core algorithm are responsible for most of PPO's gain in cumulative reward over TRPO, and fundamentally change how RL methods function.
Value-Decomposition Networks For Cooperative Multi-Agent Learning
TLDR
This work addresses the problem of cooperative multi-agent reinforcement learning with a single joint reward signal by training individual agents with a novel value decomposition network architecture, which learns to decompose the team value function into agent-wise value functions.
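In contrast to QMIX's learned monotonic mixing (see the sketch above), VDN's decomposition is a plain sum, which makes the joint argmax decompose across agents; a one-line sketch:

```python
import torch

def vdn_joint_q(agent_qs: torch.Tensor) -> torch.Tensor:
    """VDN: the team Q-value is the sum of per-agent Q-values,
    so each agent can maximize its own Q independently."""
    return agent_qs.sum(dim=-1)  # agent_qs: (batch, n_agents)
```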
Value-decomposition networks for cooperative multi-agent learning based on team reward
  • Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent Systems
  • 2018