Corpus ID: 235446535

A learning agent that acquires social norms from public sanctions in decentralized multi-agent settings

Eugene Vinitsky, R. Koster, John P. Agapiou, Edgar A. Duéñez-Guzmán, Alexander Sasha Vezhnevets, Joel Z. Leibo
Autonomously operating learning agents are becoming more common, and this trend is likely to continue accelerating for a variety of reasons. First, cheap sensors, actuators, and high-speed wireless internet have drastically lowered the barrier to deploying an autonomous system. Second, autonomy creates the possibility of learning “on device”, keeping experience local and off any central servers. This makes it easier to comply with privacy requirements (Kairouz et al., 2019) and increases…


Learning with Opponent-Learning Awareness
Results show that when two LOLA agents meet, tit-for-tat and therefore cooperation emerge in the iterated prisoner's dilemma, whereas independent learning does not produce this outcome. LOLA also receives higher payouts than a naive learner and is robust against exploitation by higher-order gradient-based methods.
Emergence of Norms through Social Learning
A model is proposed that supports the emergence of social norms via learning from interaction experiences. The key research question is whether the entire population learns to converge to a consistent norm.
Counterfactual Multi-Agent Policy Gradients
A new multi-agent actor-critic method, counterfactual multi-agent (COMA) policy gradients, is presented. It uses a centralised critic to estimate the Q-function and decentralised actors to optimise the agents' policies.
QMIX: Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning
QMIX employs a network that estimates joint action-values as a complex non-linear combination of per-agent values that condition only on local observations. It structurally enforces that the joint action-value is monotonic in the per-agent values, which allows tractable maximisation of the joint action-value in off-policy learning.
Prosocial learning agents solve generalized Stag Hunts better than selfish ones
It is shown that making a single agent prosocial, that is, making it care about the rewards of its partners, can increase the probability that groups converge to good outcomes. Experiments show that this result carries over to a variety of more complex environments with Stag Hunt-like dynamics, including ones where agents must learn from raw input pixels.
Social Influence as Intrinsic Motivation for Multi-Agent Deep Reinforcement Learning
Empirical results demonstrate that influence leads to enhanced coordination and communication in challenging social dilemma environments, dramatically improving the learning curves of the deep RL agents and leading to more meaningful learned communication protocols.
A Unified Game-Theoretic Approach to Multiagent Reinforcement Learning
An algorithm is described that is based on approximate best responses to mixtures of policies generated using deep reinforcement learning, together with empirical game-theoretic analysis to compute meta-strategies for policy selection. It generalizes previous algorithms such as InRL.
Coordinated Exploration via Intrinsic Rewards for Multi-Agent Reinforcement Learning
It is argued that exploration in cooperative multi-agent settings can be accelerated and improved if agents coordinate with respect to the regions of the state space they explore.
Evolving intrinsic motivations for altruistic behavior
It is demonstrated that individual inductive biases for cooperation can be learned in a model-free way by combining MARL with appropriately structured natural selection and an innovative modular architecture for deep reinforcement learning agents which supports multi-level selection.
Silly rules improve the capacity of agents to learn stable enforcement and compliance behaviors
It is shown that agents benefit when eating poisonous berries is taboo, meaning the behavior is punished by other agents, as this helps overcome a credit-assignment problem in discovering delayed health effects and improves the rate and stability with which agents learn to punish taboo violations and comply with taboos.