RODE: Learning Roles to Decompose Multi-Agent Tasks
TLDR
We propose to first decompose joint action spaces into restricted role action spaces by clustering actions according to their effects on the environment and other agents.
Is Independent Learning All You Need in the StarCraft Multi-Agent Challenge?
TLDR
We demonstrate that, despite its various theoretical shortcomings, Independent PPO, a form of independent learning in which each agent simply estimates its local value function, can perform just as well as or better than state-of-the-art joint learning approaches on the popular multi-agent benchmark suite SMAC with little hyperparameter tuning.
UneVEn: Universal Value Exploration for Multi-Agent Reinforcement Learning
TLDR
We present a novel MARL approach called Universal Value Exploration (UneVEn), which uses universal successor features to learn, in a sample-efficient manner, policies for tasks that are related to the target task but have simpler reward functions.
Planning and Learning for Decentralized MDPs With Event Driven Rewards
TLDR
We solve a large real-world multiagent coverage problem modeling schedule coordination of agents in a real urban subway network where other approaches fail to scale.
Successor Features Based Multi-Agent RL for Event-Based Decentralized MDPs
TLDR
We propose a new actor-critic based Reinforcement Learning (RL) approach for event-based Dec-MDPs using successor features (SF), a value function representation that decouples the dynamics of the environment from the rewards.
Reinforcement Learning for Zone Based Multiagent Pathfinding under Uncertainty
TLDR
We address the problem of multiple agents finding their paths from their respective source nodes to destination nodes in a graph (also called MAPF).
Semi-On-Policy Training for Sample Efficient Multi-Agent Policy Gradients
TLDR
In this paper, we introduce semi-on-policy (SOP) training as an effective and computationally efficient way to address the sample inefficiency of on-policy policy gradient methods.