Multi-agent Reinforcement Learning in Sequential Social Dilemmas
This work analyzes the dynamics of policies learned by multiple self-interested independent learning agents, each using its own deep Q-network, on two Markov games and characterizes how learned behavior in each domain changes as a function of environmental factors, including resource abundance.
Emergence of cooperation and evolutionary stability in finite populations
It is shown that a single cooperator using a strategy like ‘tit-for-tat’ can invade a population of defectors with a probability that corresponds to a net selective advantage.
Evolving learning rules and emergence of cooperation in spatial prisoner's dilemma.
- Biology, Journal of Theoretical Biology
Zero-determinant strategies under observation errors in repeated games
This work analytically studies the strategies that enforce linear payoff relationships in the RPD game, considering both a discount factor and observation errors, and reveals that the payoffs of the two players can be represented in the form of determinants even with these two factors.
Deterministic limit of temporal difference reinforcement learning for stochastic games
- Computer Science, Physical Review E
This work presents a methodological extension, separating the interaction from the adaptation timescale, to derive the deterministic limit of a general class of reinforcement learning algorithms, called temporal difference learning, which is equipped to function in more realistic multistate environments.
Partners and rivals in direct reciprocity
- Biology, Nature Human Behaviour
Hilbe et al. synthesize recent theoretical work on zero-determinant and ‘rival’ versus ‘partner’ strategies in social dilemmas and describe the environments under which these contrasting selfish or cooperative strategies emerge in evolution.
Iterated Prisoner’s Dilemma contains strategies that dominate any evolutionary opponent
- Psychology, Proceedings of the National Academy of Sciences
It is generally assumed that there exists no simple ultimatum strategy whereby one player can enforce a unilateral claim to an unfair share of rewards, but this work shows that such strategies unexpectedly do exist.
Learning with Opponent-Learning Awareness
- Computer Science, AAMAS
Results show that the encounter of two LOLA agents leads to the emergence of tit-for-tat, and therefore cooperation, in the iterated prisoner's dilemma, while independent learning does not. LOLA also receives higher payouts than a naive learner and is robust against exploitation by higher-order gradient-based methods.
Effects of Space in 2 × 2 Games
- Economics, Int. J. Bifurc. Chaos
It is demonstrated that spatial extension is often capable of promoting cooperative behavior, and that this holds in particular for the prisoner's dilemma within a small but important parameter range.