SMARTS: Scalable Multi-Agent Reinforcement Learning Training School for Autonomous Driving
The design goals of SMARTS (Scalable Multi-Agent RL Training School) are described, its basic architecture and its key features are explained, and its use is illustrated through concrete multi-agent experiments on interactive scenarios.
Multi-Agent Reinforcement Learning for Order-dispatching via Order-Vehicle Distribution Matching
- Ming Zhou, Jiarui Jin, Jieping Ye
- Computer Science, International Conference on Information and…
- 7 October 2019
Experiments show that the proposed order-dispatching method, which uses decentralized execution, outperforms the baselines in terms of accumulated driver income (ADI) and order response rate (ORR) in various traffic environments.
SMARTS: An Open-Source Scalable Multi-Agent RL Training School for Autonomous Driving
Model-based Multi-agent Policy Optimization with Adaptive Opponent-wise Rollouts
- Weinan Zhang, Xihuai Wang, Jian Shen, Ming Zhou
- Computer Science, International Joint Conference on Artificial…
- 7 May 2021
A novel decentralized model-based MARL method, named Adaptive Opponent-wise Rollout Policy Optimization (AORPO), is proposed, which can achieve improved sample efficiency with comparable asymptotic performance over the compared MARL methods.
Generative adversarial exploration for reinforcement learning
- Weijun Hong, Menghui Zhu, Peng Sun
- Computer Science, International Conference on Distributed…
- 13 October 2019
This work proposes a novel method called GAEX to encourage exploration in RL by introducing an intrinsic reward output from a generative adversarial network, where the generator provides fake samples of states that help the discriminator identify less frequently visited states.
- Jiarui Jin, Ming Zhou, Jieping Ye
- Proceedings of the 28th ACM International…
- 3 November 2019
Multi-Agent Interactions Modeling with Correlated Policies
- Minghuan Liu, Ming Zhou, Yong Yu
- Computer Science, International Conference on Learning…
- 4 January 2020
A Decentralized Adversarial Imitation Learning algorithm with Correlated policies (CoDAIL) is proposed, which allows for decentralized training and execution and outperforms state-of-the-art multi-agent imitation learning methods.
MALib: A Parallel Framework for Population-based Multi-agent Reinforcement Learning
MALib is a scalable and efficient computing framework for population-based multi-agent reinforcement learning that enables efficient code reuse and flexible deployments on different distributed computing paradigms and achieves throughput higher than 40K FPS on a single machine with 32 CPU cores.
Regioned Episodic Reinforcement Learning
- Jiarui Jin, Cong Chen, Alex Smola
- Computer Science
- 4 May 2021
Regioned Episodic Reinforcement Learning (RERL) is introduced, which combines the strengths of episodic and goal-oriented learning and leads to a more sample-efficient and effective algorithm.
Efficient Policy Space Response Oracles
- Ming Zhou, Jingxiao Chen, Ying Wen, Weinan Zhang, Yaodong Yang, Yong Yu
- Computer Science, ArXiv
- 28 January 2022
Theoretically, the solution procedures of EPSRO offer a monotonic improvement on the exploitability, which no existing PSRO method possesses, and it is proved that the no-regret optimization has a regret bound of $O(\sqrt{T \log[(k^2 + k)/2]})$, where $k$ is the size of the restricted policy set.
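
To get a feel for the regret bound quoted in the EPSRO summary, the following minimal Python sketch evaluates the expression sqrt(T log[(k^2 + k)/2]) for a few values of T and k. It is an illustration only: the function name `regret_bound` is hypothetical, the constant hidden by the big-O notation is assumed to be 1, and the natural logarithm is assumed; none of this comes from the paper's code.

```python
import math


def regret_bound(T: int, k: int) -> float:
    """Evaluate sqrt(T * log((k^2 + k) / 2)).

    Assumptions (not from the source): the hidden big-O constant is 1
    and the logarithm is natural. T is the number of rounds and k is
    the size of the restricted policy set.
    """
    if T < 0 or k < 1:
        raise ValueError("T must be non-negative and k must be at least 1")
    return math.sqrt(T * math.log((k * k + k) / 2))


if __name__ == "__main__":
    # Print the bound and its per-round average for a small grid.
    for k in (2, 4, 8):
        for T in (100, 10_000):
            bound = regret_bound(T, k)
            print(f"k={k} T={T} bound={bound:.2f} per-round={bound / T:.4f}")
```

Under these assumptions the bound grows with the square root of T, so the per-round average shrinks as T increases, which is what sublinear regret means. The dependence on k is only logarithmic.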