Selfish optimization and collective learning in populations

  title={Selfish optimization and collective learning in populations},
  author={Alex McAvoy and Yoichiro Mori and Joshua B. Plotkin},
  journal={Physica D: Nonlinear Phenomena},

Multi-agent Reinforcement Learning in Sequential Social Dilemmas

This work analyzes the dynamics of policies learned by multiple self-interested independent learning agents, each using its own deep Q-network, on two Markov games, and characterizes how learned behavior in each domain changes as a function of environmental factors such as resource abundance.

Emergence of cooperation and evolutionary stability in finite populations

It is shown that a single cooperator using a strategy like ‘tit-for-tat’ can invade a population of defectors with a probability that corresponds to a net selective advantage.
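The invasion result above can be illustrated numerically. The following is a hedged sketch (the payoff values, population size, and selection intensity are assumed for illustration, not taken from the paper): it computes the exact fixation probability of a single tit-for-tat mutant in a finite population of defectors under a frequency-dependent Moran process and compares it with the neutral baseline 1/N.

```python
import numpy as np

# Illustrative parameters (assumed, not from the paper):
R, S, T, P = 3, 0, 5, 1      # one-shot prisoner's dilemma payoffs
m = 10                       # rounds per repeated game
# Repeated-game payoff matrix between TFT (A) and ALLD (B):
a, b = R*m, S + P*(m - 1)    # TFT vs TFT, TFT vs ALLD
c, d = T + P*(m - 1), P*m    # ALLD vs TFT, ALLD vs ALLD
N, w = 50, 0.1               # population size, selection intensity

def fixation_probability():
    """Exact fixation probability of one A mutant (standard Moran formula)."""
    ratios = []
    for i in range(1, N):
        # Average payoffs with i mutants, excluding self-interaction.
        pi_A = (a*(i - 1) + b*(N - i)) / (N - 1)
        pi_B = (c*i + d*(N - i - 1)) / (N - 1)
        f = 1 - w + w*pi_A   # fitness of a TFT player
        g = 1 - w + w*pi_B   # fitness of a defector
        ratios.append(g / f)
    return 1.0 / (1.0 + np.sum(np.cumprod(ratios)))

rho = fixation_probability()
# rho > 1/N means selection favors the TFT invader over neutral drift.
```

With these assumed parameters the unstable mixed equilibrium sits well below one third of the population, so, in line with the finite-population analysis, the fixation probability exceeds 1/N.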

Zero-determinant strategies under observation errors in repeated games

This work analytically studies the strategies that enforce linear payoff relationships in the repeated prisoner's dilemma (RPD) game, considering both a discount factor and observation errors, and reveals that the payoffs of the two players can be represented in the form of determinants even with these two factors.

Deterministic limit of temporal difference reinforcement learning for stochastic games

This work presents a methodological extension, separating the interaction from the adaptation timescale, to derive the deterministic limit of a general class of reinforcement learning algorithms, called temporal difference learning, which is equipped to function in more realistic multistate environments.

Partners and rivals in direct reciprocity

Hilbe et al. synthesize recent theoretical work on zero-determinant and ‘rival’ versus ‘partner’ strategies in social dilemmas and describe the environments under which these contrasting selfish or cooperative strategies emerge in evolution.

Iterated Prisoner’s Dilemma contains strategies that dominate any evolutionary opponent

While one might assume that there exists no simple ultimatum strategy whereby one player can enforce a unilateral claim to an unfair share of rewards, it is shown that such strategies unexpectedly do exist.
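A zero-determinant "extortion" strategy of the kind described above can be checked numerically. This is a sketch under assumed standard parameters (payoffs (R, S, T, P) = (3, 0, 5, 1), extortion factor chi = 3, and the memory-one strategy (11/13, 1/2, 7/26, 0) derived from the Press–Dyson formula): against arbitrary memory-one opponents, the extortioner's surplus over P stays chi times the opponent's surplus.

```python
import numpy as np

R, S, T, P = 3.0, 0.0, 5.0, 1.0   # assumed standard PD payoffs
CHI = 3.0                          # extortion factor
# X's cooperation probabilities after outcomes (CC, CD, DC, DD):
p = np.array([11/13, 1/2, 7/26, 0.0])

def stationary_payoffs(p, q):
    """Long-run payoffs (s_X, s_Y) for memory-one strategies p and q."""
    qs = q[[0, 2, 1, 3]]           # Y sees each state with roles swapped
    M = np.empty((4, 4))
    for i in range(4):
        cx, cy = p[i], qs[i]
        M[i] = [cx*cy, cx*(1 - cy), (1 - cx)*cy, (1 - cx)*(1 - cy)]
    # Stationary distribution: v M = v with sum(v) = 1.
    A = np.vstack([M.T - np.eye(4), np.ones(4)])
    rhs = np.array([0.0, 0.0, 0.0, 0.0, 1.0])
    v, *_ = np.linalg.lstsq(A, rhs, rcond=None)
    return v @ np.array([R, S, T, P]), v @ np.array([R, T, S, P])

rng = np.random.default_rng(0)
for _ in range(5):
    q = rng.uniform(0.05, 0.95, size=4)    # arbitrary opponent strategy
    sx, sy = stationary_payoffs(p, q)
    # The linear relation s_X - P = CHI * (s_Y - P) holds regardless of q.
    assert abs((sx - P) - CHI*(sy - P)) < 1e-8
```

The assertion passing for random opponents reflects the determinant identity behind zero-determinant strategies: the relation is enforced unilaterally by X, independent of what Y plays.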

Learning with Opponent-Learning Awareness

Results show that the encounter of two LOLA agents leads to the emergence of tit-for-tat, and therefore cooperation, in the iterated prisoners' dilemma, whereas independent learning does not. LOLA also receives higher payoffs than a naive learner and is robust against exploitation by higher-order gradient-based methods.

Effects of Space in 2 × 2 Games

It is demonstrated that spatial extension is often capable of promoting cooperative behavior; this holds in particular for the prisoner's dilemma within a small but important parameter range.
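The spatial mechanism referred to above can be sketched as a Nowak–May style lattice game. This is an illustrative sketch with assumed parameters (temptation b = 1.7, a 50×50 torus, imitate-the-best update), not the paper's exact model: players play their four lattice neighbors and then copy the strategy of the highest-scoring site in their neighborhood, including themselves.

```python
import numpy as np

b = 1.7   # assumed temptation payoff for a defector meeting a cooperator

def step(grid):
    """One synchronous update of a 0/1 (defect/cooperate) torus lattice."""
    # Count cooperating neighbors (up, down, left, right) at each site.
    neigh = [np.roll(grid, s, axis=ax) for ax in (0, 1) for s in (1, -1)]
    n_coop = sum(neigh)
    # Cooperators earn 1 per cooperating neighbor; defectors earn b.
    payoff = np.where(grid == 1, n_coop, b * n_coop).astype(float)
    best, best_pay = grid.copy(), payoff.copy()
    for ax in (0, 1):
        for s in (1, -1):
            nb_pay = np.roll(payoff, s, axis=ax)
            nb_str = np.roll(grid, s, axis=ax)
            better = nb_pay > best_pay      # strict: ties keep own strategy
            best_pay = np.where(better, nb_pay, best_pay)
            best = np.where(better, nb_str, best)
    return best

rng = np.random.default_rng(1)
grid = (rng.random((50, 50)) < 0.6).astype(int)   # start with 60% cooperators
for _ in range(30):
    grid = step(grid)
frac = grid.mean()   # surviving fraction of cooperators
```

In the well-mixed game defection dominates, but on the lattice cooperator clusters can shield their interiors; whether they persist depends sensitively on b, which is the "small but important parameter range" the summary mentions.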

Repeated games with one-memory