Corpus ID: 199452986

Promoting Coordination through Policy Regularization in Multi-Agent Reinforcement Learning

by Paul Barde, Julien Roy, Félix G. Harvey, Derek Nowrouzezahrai and C. Pal
In multi-agent reinforcement learning, discovering successful collective behaviors is challenging as it requires exploring a joint action space that grows exponentially with the number of agents. While the tractability of independent agent-wise exploration is appealing, this approach fails on tasks that require elaborate group strategies. We argue that coordinating the agents' policies can guide their exploration and we investigate techniques to promote such an inductive bias. We propose two…
Stabilizing Multi-Agent Deep Reinforcement Learning by Implicitly Estimating Other Agents’ Behaviors
This work demonstrates that, given an implicit estimate of the other agents' actions, each agent can learn its policy in a relatively stationary environment; the method significantly alleviates non-stationarity and outperforms the state of the art in both convergence speed and policy performance.
Celebrating Diversity in Shared Multi-Agent Reinforcement Learning
This paper proposes an information-theoretical regularization to maximize the mutual information between agents' identities and their trajectories, encouraging extensive exploration and diverse individualized behaviors in shared multi-agent reinforcement learning.
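The mutual-information objective described above can be illustrated with a toy computation: take a discrete joint distribution over agent identities and (clustered) trajectories and measure I(identity; trajectory). The two example tables below are invented for illustration; a minimal NumPy sketch, not the paper's estimator:

```python
import numpy as np

def mutual_information(joint):
    """I(Z; T) for a discrete joint distribution table p(z, t):
    rows index agent identities z, columns index trajectory clusters t."""
    pz = joint.sum(axis=1, keepdims=True)   # marginal over identities
    pt = joint.sum(axis=0, keepdims=True)   # marginal over trajectories
    mask = joint > 0                        # skip zero cells (0 * log 0 = 0)
    return float(np.sum(joint[mask] * np.log(joint[mask] / (pz @ pt)[mask])))

# Two agents whose trajectories perfectly reveal who they are: maximal I(Z; T).
diverse = np.array([[0.5, 0.0],
                    [0.0, 0.5]])

# Two agents behaving identically: trajectories carry no identity information.
shared = np.array([[0.25, 0.25],
                   [0.25, 0.25]])
```

Maximizing this quantity pushes the shared policy toward the `diverse` regime, i.e. individualized behaviors.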
Cooperative and Competitive Biases for Multi-Agent Reinforcement Learning
This work proposes an algorithm that boosts MARL training using the biased action information of other agents based on a friend-or-foe concept and outperforms existing algorithms in various mixed cooperative-competitive environments.


Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments
An adaptation of actor-critic methods that considers action policies of other agents and is able to successfully learn policies that require complex multi-agent coordination is presented.
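The core structural idea, centralized critics over all agents' observations and actions paired with decentralized actors, can be sketched minimally. The linear critic and tanh actor below are stand-ins for the neural networks; all names and dimensions are invented for illustration:

```python
import numpy as np

def centralized_q(critic_w, obs_all, act_all):
    """Centralized critic: scores the concatenated observations and
    actions of ALL agents, so each agent's value estimate can account
    for the other agents' policies during training."""
    return float(critic_w @ np.concatenate(obs_all + act_all))

def decentralized_act(actor_w, obs_i):
    """Decentralized actor: at execution time each agent conditions
    only on its own local observation."""
    return np.tanh(actor_w @ obs_i)
```

At training time the critic sees everything; at execution time only the actors run, so the learned policies remain decentralized.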
Social Influence as Intrinsic Motivation for Multi-Agent Deep Reinforcement Learning
Empirical results demonstrate that influence leads to enhanced coordination and communication in challenging social dilemma environments, dramatically increasing the learning curves of the deep RL agents, and leading to more meaningful learned communication protocols.
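The influence bonus rewards an agent when its action changes what another agent does, measured as a KL divergence against a counterfactual marginal. A minimal sketch of that computation, assuming discrete actions and access to the other agent's conditional policy (the probability tables are invented for illustration):

```python
import numpy as np

def kl(p, q):
    """KL divergence between two discrete distributions."""
    return float(np.sum(p * np.log(p / q)))

def influence_reward(p_other_given_mine, taken, prior_mine):
    """Counterfactual influence bonus: KL between the other agent's policy
    given the action I actually took (row `taken`) and its marginal policy
    averaged over my counterfactual actions weighted by `prior_mine`."""
    marginal = prior_mine @ p_other_given_mine  # p(a_j) = sum_i p(a_i) p(a_j | a_i)
    return kl(p_other_given_mine[taken], marginal)
```

If the other agent's policy ignores my action, every row equals the marginal and the bonus is zero; the bonus grows with how much my chosen action sways the other agent.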
Cooperative Multi-agent Control Using Deep Reinforcement Learning
It is shown that policy gradient methods tend to outperform both temporal-difference and actor-critic methods and that curriculum learning is vital to scaling reinforcement learning algorithms in complex multi-agent domains.
Actor-Attention-Critic for Multi-Agent Reinforcement Learning
This work presents an actor-critic algorithm that trains decentralized policies in multi-agent settings, using centrally computed critics that share an attention mechanism selecting relevant information for each agent at every timestep, which enables more effective and scalable learning in complex multi-agent environments compared to recent approaches.
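The attention mechanism inside the critic can be sketched as standard scaled dot-product attention: one agent's query is matched against encodings of the other agents, and the critic receives a weighted summary. A minimal NumPy sketch of that selection step, with invented dimensions:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attend(query, keys, values):
    """Scaled dot-product attention: the querying agent's critic weighs
    the other agents' encoded (key, value) pairs and receives a summary
    emphasizing whichever agents are most relevant at this timestep."""
    weights = softmax(keys @ query / np.sqrt(len(query)))
    return weights @ values, weights
```

Because the weights are recomputed every timestep, which agents get attended to can change as the situation changes.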
NADPEx: An on-policy temporally consistent exploration method for deep reinforcement learning
This work introduces Neural Adaptive Dropout Policy Exploration (NADPEx), a novel on-policy, temporally consistent exploration strategy for deep reinforcement learning agents, in which exploration is modeled as a global random variable for the conditional policy distribution.
Agent Modeling as Auxiliary Task for Deep Reinforcement Learning
The results show that the proposed architectures stabilize learning and outperform the standard A3C architecture when learning a best response in terms of expected rewards.
Bayesian Action Decoder for Deep Multi-Agent Reinforcement Learning
The Bayesian action decoder (BAD), a new multi-agent learning method that uses an approximate Bayesian update to obtain a public belief conditioned on the actions taken by all agents in the environment, is presented.
Learning to Communicate with Deep Multi-Agent Reinforcement Learning
By embracing deep neural networks, this work is able to demonstrate end-to-end learning of protocols in complex environments inspired by communication riddles and multi-agent computer vision problems with partial observability.
QMIX: Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning
QMIX employs a network that estimates joint action-values as a complex non-linear combination of per-agent values conditioned only on local observations, and structurally enforces that the joint action-value is monotonic in the per-agent values, which allows tractable maximisation of the joint action-value in off-policy learning.
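The monotonicity constraint can be sketched concretely: hypernetworks map the global state to mixing weights, and taking their absolute value keeps every weight non-negative, which makes the joint value monotone in each per-agent value. A minimal one-hidden-layer NumPy sketch in the spirit of QMIX (dimensions and the ReLU choice are simplifications for illustration; the paper uses ELU):

```python
import numpy as np

def qmix_joint_q(agent_qs, state, w1_gen, b1, w2_gen, b2):
    """Monotonic mixing of per-agent Q-values into a joint Q-value.
    w1_gen and w2_gen play the role of linear hypernetworks: they map the
    global state to mixing weights, and np.abs keeps those weights
    non-negative, enforcing monotonicity in each per-agent value."""
    n = len(agent_qs)
    w1 = np.abs(w1_gen @ state).reshape(n, -1)  # (n_agents, hidden), >= 0
    h = np.maximum(agent_qs @ w1 + b1, 0.0)     # ReLU hidden layer
    w2 = np.abs(w2_gen @ state)                 # (hidden,), >= 0
    return float(h @ w2 + b2)
```

Because every path from an agent's Q-value to the output passes only through non-negative weights, raising any single agent's value can never lower the joint value, which is what makes per-agent greedy maximisation consistent with maximising the joint action-value.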
Opponent Modeling in Deep Reinforcement Learning
Inspired by the recent success of deep reinforcement learning, this work presents neural-based models that jointly learn a policy and the behavior of opponents, and uses a Mixture-of-Experts architecture to encode observations of the opponents into a deep Q-Network.
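The Mixture-of-Experts combination can be sketched minimally: a gating signal derived from the opponent representation blends the Q-values of several experts. The names and shapes below are invented for illustration, not the paper's architecture:

```python
import numpy as np

def moe_q_values(gate_logits, expert_q_values):
    """Blend expert Q-values with softmax gating weights.
    gate_logits would come from an opponent-representation network;
    expert_q_values has shape (n_experts, n_actions)."""
    w = np.exp(gate_logits - gate_logits.max())
    w /= w.sum()
    return w @ expert_q_values  # (n_actions,) blended Q-values
```

Different opponents shift the gate, so the agent's effective Q-function adapts to who it is playing against.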