Emergent Prosociality in Multi-Agent Games Through Gifting

Woodrow Z. Wang, Mark Beliaev, Erdem Biyik, Daniel A. Lazar, Ramtin Pedarsani, Dorsa Sadigh
Coordination is often critical to forming prosocial behaviors, behaviors that increase the overall sum of rewards received by all agents in a multi-agent game. However, state-of-the-art reinforcement learning algorithms often converge to socially less desirable equilibria when multiple equilibria exist. Previous works address this challenge with explicit reward shaping, which requires the strong assumption that agents can be forced to be prosocial. We propose using a less…
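The equilibrium-selection problem the abstract describes can be illustrated with a Stag Hunt-style matrix game, the canonical example of a game with multiple equilibria of different social value. The payoff numbers below are illustrative assumptions, not the paper's actual environment:

```python
import numpy as np

# Row player's payoff in a symmetric Stag Hunt (illustrative values).
# Both (stag, stag) and (hare, hare) are Nash equilibria, but
# (stag, stag) maximizes total reward (the prosocial equilibrium),
# while (hare, hare) is risk-dominant and is often what independent
# learners converge to.
STAG, HARE = 0, 1
payoff = np.array([[4.0, 0.0],   # play stag vs. opponent (stag, hare)
                   [3.0, 3.0]])  # play hare vs. opponent (stag, hare)

def best_response(opponent_action: int) -> int:
    """Best response to a fixed opponent action."""
    return int(np.argmax(payoff[:, opponent_action]))

# Mutual best responses confirm two pure-strategy Nash equilibria:
assert best_response(STAG) == STAG and best_response(HARE) == HARE

def welfare(a: int, b: int) -> float:
    """Social welfare: sum of both players' rewards (symmetric game)."""
    return float(payoff[a, b] + payoff[b, a])

print(welfare(STAG, STAG))  # 8.0 -- payoff-dominant, prosocial
print(welfare(HARE, HARE))  # 6.0 -- risk-dominant, socially worse
```

Against an opponent who mixes uniformly, hare earns 3 in expectation while stag earns only 2, which is why risk-averse independent learners tend to settle on the lower-welfare equilibrium; mechanisms such as gifting aim to tilt learning dynamics back toward the prosocial one.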


Social Coordination and Altruism in Autonomous Driving
A quantitative representation of the AVs' social preferences is introduced, together with a distributed reward structure that induces altruism into their decision-making process; altruistic AVs are able to form alliances, guide traffic, and influence the behavior of HVs in competitive driving scenarios.
A learning agent that acquires social norms from public sanctions in decentralized multi-agent settings
It is shown that social norms emerge in multi-agent systems containing this agent, and the conditions under which this helps them achieve socially beneficial outcomes are investigated.
Influencing Towards Stable Multi-Agent Interactions
The effectiveness of stabilizing other agents in improving the efficiency of maximizing task reward is demonstrated in a variety of simulated environments, including autonomous driving, emergent communication, and robotic manipulation.
Learning Generalizable Risk-Sensitive Policies to Coordinate in Decentralized Multi-Agent General-Sum Games
A generalizable and sample-efficient algorithm is proposed for multi-agent coordination in decentralized general-sum games without any access to other agents' rewards or observations, together with an auxiliary opponent-modeling task so that agents can infer their opponents' type and dynamically adapt their strategies during execution.
Cooperation and Learning Dynamics under Risk Diversity and Financial Incentives
In this paper, we investigate the role of risk diversity in groups of agents learning to play collective risk dilemmas (CRDs). We show that risk diversity poses new challenges to cooperation…
Robustness and Adaptability of Reinforcement Learning-Based Cooperative Autonomous Driving in Mixed-Autonomy Traffic
The mixed-autonomy problem is formulated as a multi-agent reinforcement learning (MARL) problem, and a decentralized framework and reward function for training cooperative AVs are proposed, enabling AVs to implicitly learn the decision-making of HVs from experience while optimizing for a social utility, prioritizing safety, and allowing adaptability.
Distributed Reinforcement Learning for Robot Teams: A Review
This survey reports the challenges surrounding decentralized model-free MARL for multi-robot cooperation and existing classes of approaches, and presents benchmarks and robotic applications along with a discussion on current open avenues for research.


Prosocial learning agents solve generalized Stag Hunts better than selfish ones
It is shown that making a single agent prosocial, that is, making it care about the rewards of its partners, can increase the probability that groups converge to good outcomes; experiments show that this result carries over to a variety of more complex environments with Stag Hunt-like dynamics, including ones where agents must learn from raw pixel input.
Multi-agent Reinforcement Learning in Sequential Social Dilemmas
This work analyzes the dynamics of policies learned by multiple self-interested independent learning agents, each using its own deep Q-network, on two Markov games, and characterizes how learned behavior in each domain changes as a function of environmental factors, including resource abundance.
Inequity aversion improves cooperation in intertemporal social dilemmas
It is found that inequity aversion improves temporal credit assignment for the important class of intertemporal social dilemmas and helps explain how large-scale cooperation may emerge and persist.
Stable Opponent Shaping in Differentiable Games
Stable Opponent Shaping (SOS) is presented, a new method that interpolates between LOLA and a stable variant named LookAhead that converges locally to equilibria and avoids strict saddles in all differentiable games.
Gifting in Multi-Agent Reinforcement Learning
This work introduces peer rewarding, in which agents can deliberately influence each other's reward functions, and empirically studies gifting, a peer rewarding mechanism that allows agents to reward other agents as part of their action space.
Theoretical Advantages of Lenient Learners: An Evolutionary Game Theoretic Perspective
The paper demonstrates that lenience provides learners with more accurate information about the benefits of their actions, resulting in a higher likelihood of convergence to the globally optimal solution, and supports the strength and generality of evolutionary game theory as a backbone for multi-agent learning.
Independent reinforcement learners in cooperative Markov games: a survey regarding coordination problems
This paper identifies several challenges responsible for the non-coordination of independent agents: Pareto-selection, non-stationarity, stochasticity, alter-exploration, and shadowed equilibria; this analysis can serve as a basis for choosing the appropriate algorithm for a new domain.
Coordinated Exploration via Intrinsic Rewards for Multi-Agent Reinforcement Learning
It is argued that exploration in cooperative multi-agent settings can be accelerated and improved if agents coordinate with respect to the regions of the state space they explore, while still maximizing extrinsic returns.
Theory of Minds: Understanding Behavior in Groups Through Inverse Planning
This work develops a generative model of multi-agent action understanding based on a novel representation for these latent relationships called Composable Team Hierarchies (CTH), grounded in the formalism of stochastic games and multi-agent reinforcement learning.
Locally noisy autonomous agents improve global human coordination in network experiments
It is shown that bots acting with small levels of random noise and placed in central locations meaningfully improve the collective performance of human groups, accelerating the median solution time by 55.6%.