Corpus ID: 232147161

Deep reinforcement learning models the emergent dynamics of human cooperation

Authors: Kevin R. McKee, Edward Hughes, Tina Zhu, Martin J. Chadwick, R. Koster, Antonio García Castañeda, Charlie Beattie, Thore Graepel, Matthew M. Botvinick, Joel Z. Leibo
Collective action demands that individuals efficiently coordinate how much, where, and when to cooperate. Laboratory experiments have extensively explored the first part of this process, demonstrating that a variety of social-cognitive mechanisms influence how much individuals choose to invest in group efforts. However, experimental research has been unable to shed light on how social-cognitive mechanisms contribute to the where and when of collective action. We leverage multi-agent deep…


Social learning spontaneously emerges by searching optimal heuristics with deep reinforcement learning
A deep reinforcement learning model is employed to optimize the social learning strategies (SLSs) of agents in a cooperative game in a multi-dimensional landscape and demonstrates the superior performance of the reinforcement learning agent in various environments, including temporally changing environments and real social networks.
Learning Robust Real-Time Cultural Transmission without Human Data
Spurious normativity enhances learning of compliance and enforcement behavior in artificial agents
Here, multiagent reinforcement learning is used to investigate the learning dynamics of enforcement and compliance behaviors, and it is demonstrated that normative behavior relies on a sequence of learned skills.
Collaborating with Humans without Human Data
This work studies the problem of how to train agents that collaborate well with human partners without using human data, and argues that the crux of the problem is to produce a diverse set of training partners.
Cryptographic Hardness under Projections for Time-Bounded Kolmogorov Complexity
It is shown that MKTP is hard for the (apparently larger) class NISZK_L not only under ≤_m^{NC^0} reductions but even under projections, yielding a new lower bound on MKTP.


Nice Guys Finish First: The Competitive Altruism Hypothesis
The results of three experimental studies support the premise at the heart of competitive altruism: Individuals may behave altruistically for reputation reasons because selective benefits (associated with status) accrue to the generous.
Cooperation through image scoring in humans.
It is shown that image scoring promotes cooperative behavior in situations where direct reciprocity is unlikely, consistent with game theorists' recent proposal that indirect reciprocity can operate through image scoring.
The efficient interaction of indirect reciprocity and costly punishment
Advances in experimental economics and evolutionary biology are combined to show that costly punishment and reputation formation, respectively, induce cooperation in social dilemmas, and that punishment is maintained, though at a low level, when it can be combined with reputation building.
Reputation helps solve the ‘tragedy of the commons’
It is shown, through alternating rounds of public goods and indirect reciprocity games, that the need to maintain reputation for indirect reciprocity sustains contributions to the public good at an unexpectedly high level, but if rounds of indirect reciprocity are not expected, contributions to the public good quickly drop to zero.
Dissecting components of reward: 'liking', 'wanting', and learning.
Processing of Social and Monetary Rewards in the Human Striatum
Intrinsically Motivated Reinforcement Learning
Initial results from a computational study of intrinsically motivated reinforcement learning aimed at allowing artificial agents to construct and extend hierarchies of reusable skills that are needed for competent autonomy are presented.
Long Short-Term Memory
A novel, efficient, gradient based method called long short-term memory (LSTM) is introduced, which can learn to bridge minimal time lags in excess of 1000 discrete-time steps by enforcing constant error flow through constant error carousels within special units.
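The "constant error carousel" named above can be illustrated with a minimal forward step of an LSTM cell. This is a hedged sketch in NumPy, not the paper's implementation: the gate layout [i, f, o, g], the weight shapes, and the `lstm_step` name are illustrative assumptions. The key point is the additive cell update `c = f * c_prev + i * g`, which lets error flow through `c` without repeated squashing.

```python
import numpy as np

def lstm_step(x, h_prev, c_prev, W, b):
    """One illustrative LSTM forward step; gate layout [i, f, o, g] is an assumption."""
    z = W @ np.concatenate([x, h_prev]) + b  # joint gate pre-activations
    n = h_prev.shape[0]
    i, f, o = (1 / (1 + np.exp(-z[k * n:(k + 1) * n])) for k in range(3))  # sigmoid gates
    g = np.tanh(z[3 * n:4 * n])              # candidate cell input
    c = f * c_prev + i * g                   # additive update: the "constant error carousel"
    h = o * np.tanh(c)                       # gated hidden output
    return h, c

# tiny usage example with random weights (hypothetical sizes)
rng = np.random.default_rng(0)
n_in, n_hid = 3, 4
W = rng.standard_normal((4 * n_hid, n_in + n_hid)) * 0.1
b = np.zeros(4 * n_hid)
h, c = lstm_step(rng.standard_normal(n_in), np.zeros(n_hid), np.zeros(n_hid), W, b)
```

Because the cell state is updated by addition rather than by repeated matrix multiplication, gradients through `c` avoid the exponential decay that plagues plain recurrent networks over long lags.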
Social Diversity and Social Preferences in Mixed-Motive Reinforcement Learning
It is demonstrated that heterogeneity in SVO generates meaningful and complex behavioral variation among agents, similar to that suggested by interdependence theory, and that agents trained in heterogeneous populations develop particularly generalized, high-performing policies relative to those trained in homogeneous populations.
Learning Reciprocity in Complex Sequential Social Dilemmas
This work presents a general online reinforcement learning algorithm that displays reciprocal behavior towards its co-players and shows that it can induce pro-social outcomes for the wider group when learning alongside selfish agents, both in a 2-player Markov game and in intertemporal social dilemmas.