Solving the Diffusion of Responsibility Problem in Multiagent Reinforcement Learning with a Policy Resonance Approach

Qing Fu, Tenghai Qiu, Jianqiang Yi, Z. Pu, Xiaolin Ai, Wanmai Yuan
We report a previously unreported problem in multiagent reinforcement learning (MARL), named Diffusion of Responsibility (DR). DR causes failures in negotiating a reliable division of responsibilities to complete sophisticated cooperative tasks. It reflects a flaw in how existing algorithms, both value-based and policy-based, handle the multiagent exploration-exploitation dilemma. This DR problem shares similarities with a same-name phenomenon in the social psychology domain…


Learning Heterogeneous Agent Cooperation via Multiagent League Training

This work proposes a general-purpose reinforcement learning algorithm named Heterogeneous League Training (HLT) to address heterogeneous multiagent problems, and provides a practical way to assess the difficulty of learning each role in a heterogeneous team.

Reinforcement learning: exploration–exploitation dilemma in multi-agent foraging task

Experimental results are presented for a number of learning policies reported in the open literature, namely greedy, ε-greedy, Boltzmann distribution, Simulated Annealing, Probability Matching, and Optimistic Initial Values.
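Two of the exploration policies named above, ε-greedy and Boltzmann action selection, can be sketched in a few lines (a toy illustration, not the paper's code; the Q-values are made up):

```python
import numpy as np

rng = np.random.default_rng(0)

def epsilon_greedy(q_values, epsilon):
    """With probability epsilon pick a uniformly random action,
    otherwise pick the greedy (highest-value) action."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))

def boltzmann(q_values, temperature):
    """Sample an action with probability proportional to
    exp(Q / temperature); high temperature -> near-uniform exploration."""
    logits = np.asarray(q_values, dtype=float) / temperature
    logits -= logits.max()            # shift for numerical stability
    probs = np.exp(logits)
    probs /= probs.sum()
    return int(rng.choice(len(q_values), p=probs))

q = [0.1, 0.5, 0.2]
a1 = epsilon_greedy(q, epsilon=0.1)   # usually the greedy action 1
a2 = boltzmann(q, temperature=0.5)    # stochastic, biased toward action 1
```

The temperature in Boltzmann selection plays the same role as ε in ε-greedy: it trades off exploration against exploitation, and annealing it over time recovers greedy behavior.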

QTRAN: Learning to Factorize with Transformation for Cooperative Multi-Agent Reinforcement Learning

A new factorization method for MARL, QTRAN, is proposed; it is free from the structural constraints of prior factorization methods and takes a new approach: transforming the original joint action-value function into an easily factorizable one with the same optimal actions.

Value-Decomposition Networks For Cooperative Multi-Agent Learning

This work addresses the problem of cooperative multi-agent reinforcement learning with a single joint reward signal by training individual agents with a novel value decomposition network architecture, which learns to decompose the team value function into agent-wise value functions.
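The decomposition idea can be sketched minimally: the team value is the sum of per-agent values, so each agent maximizing its own Q-value also maximizes the team value (a toy illustration with made-up Q tables, not the network architecture from the paper):

```python
import numpy as np

def team_q(agent_qs, joint_action):
    """VDN-style additive decomposition: the team value is the sum of
    each agent's individual Q-value for its own chosen action."""
    return sum(q[a] for q, a in zip(agent_qs, joint_action))

# Two agents, three actions each (toy per-agent Q tables).
agent_qs = [np.array([0.0, 1.0, 0.5]),
            np.array([0.25, 0.1, 0.75])]

# Decentralized greedy execution: each agent argmaxes its own Q,
# which also maximizes the summed team value under this decomposition.
greedy = [int(np.argmax(q)) for q in agent_qs]
print(team_q(agent_qs, greedy))  # prints 1.75
```

This additivity is exactly what makes decentralized execution consistent with centralized training here; later methods such as QTRAN (above) relax this structural constraint.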

Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments

An adaptation of actor-critic methods is presented that considers the action policies of other agents and successfully learns policies requiring complex multi-agent coordination.

Concentration Network for Reinforcement Learning of Large-Scale Multi-Agent Systems

A concentration policy gradient architecture is presented that can learn effective policies in large-scale multi-agent systems (LMAS) from scratch; it has excellent scalability and flexibility and significantly outperforms existing methods on LMAS benchmarks.

The Surprising Effectiveness of MAPPO in Cooperative, Multi-Agent Games

This work shows that PPO-based multi-agent algorithms achieve surprisingly strong performance in four popular multi-agent testbeds: the particle-world environments, the StarCraft Multi-Agent Challenge, the Hanabi challenge, and Google Research Football, with minimal hyperparameter tuning and without any domain-specific algorithmic modifications or architectures.

Exploration Strategies for Model-based Learning in Multi-agent Systems

The goal of this work is to develop exploration strategies for a model-based learning agent to handle its encounters with other agents in a common environment, and shows the superiority of lookahead-based exploration over other exploration methods.

Celebrating Diversity in Shared Multi-Agent Reinforcement Learning

This paper proposes an information-theoretical regularization to maximize the mutual information between agents’ identities and their trajectories, encouraging extensive exploration and diverse individualized behaviors in shared multi-agent reinforcement learning.

Learning Transferable Cooperative Behavior in Multi-Agent Teams

This work proposes to create a shared agent-entity graph, where agents and environmental entities form vertices and edges connect vertices that can communicate with each other, and shows that the learned policies quickly transfer to scenarios with different team sizes along with strong zero-shot generalization performance.