• Corpus ID: 219573558

Online Learning in Iterated Prisoner's Dilemma to Mimic Human Behavior

Baihan Lin, Djallel Bouneffouf, Guillermo A. Cecchi
…Prisoner's Dilemma mainly treat the choice to cooperate or defect as an atomic action. We propose to study the behavior of online learning algorithms in the Iterated Prisoner's Dilemma (IPD) game, exploring the full spectrum of reinforcement learning agents: multi-armed bandits, contextual bandits and reinforcement learning. We evaluate them in an iterated prisoner's dilemma tournament in which multiple agents compete sequentially. This allows us to analyze the dynamics…
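The tournament setup described in the abstract can be sketched as a round-robin over simple agents. This is a minimal illustration, not the authors' code: the payoffs T=5, R=3, P=1, S=0 are the conventional values and are assumed here, and the class and function names (`TitForTat`, `AlwaysDefect`, `EpsilonGreedyBandit`, `play_match`, `round_robin`) are hypothetical. The epsilon-greedy agent stands in for the multi-armed bandit family by treating "cooperate" and "defect" as two arms.

```python
import itertools
import random

# Assumed standard IPD payoffs (T=5, R=3, P=1, S=0); the paper may use others.
# PAYOFF[(my_move, their_move)] -> my reward, with "C" = cooperate, "D" = defect.
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}


class TitForTat:
    """Hypothetical baseline: cooperate first, then copy the opponent's last move."""
    name = "TitForTat"

    def reset(self):
        self.last_opp = "C"

    def act(self):
        return self.last_opp

    def update(self, my_move, opp_move, reward):
        self.last_opp = opp_move


class AlwaysDefect:
    """Hypothetical baseline: always defect."""
    name = "AlwaysDefect"

    def reset(self):
        pass

    def act(self):
        return "D"

    def update(self, my_move, opp_move, reward):
        pass


class EpsilonGreedyBandit:
    """Two-armed bandit (arms C and D) that ignores the game history."""
    name = "EpsilonGreedy"

    def __init__(self, epsilon=0.1, seed=0):
        self.epsilon = epsilon
        self.rng = random.Random(seed)

    def reset(self):
        self.counts = {"C": 0, "D": 0}
        self.values = {"C": 0.0, "D": 0.0}

    def act(self):
        # Explore with probability epsilon, otherwise pick the best arm so far.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(["C", "D"])
        return max(self.values, key=self.values.get)

    def update(self, my_move, opp_move, reward):
        # Incremental mean of the rewards observed for the chosen arm.
        self.counts[my_move] += 1
        self.values[my_move] += (reward - self.values[my_move]) / self.counts[my_move]


def play_match(a, b, rounds=100):
    """Play one iterated match and return the two total scores."""
    a.reset()
    b.reset()
    score_a = score_b = 0
    for _ in range(rounds):
        move_a, move_b = a.act(), b.act()
        r_a, r_b = PAYOFF[(move_a, move_b)], PAYOFF[(move_b, move_a)]
        a.update(move_a, move_b, r_a)
        b.update(move_b, move_a, r_b)
        score_a += r_a
        score_b += r_b
    return score_a, score_b


def round_robin(agents, rounds=100):
    """Every pair of agents plays one match; scores are summed per agent."""
    totals = {agent.name: 0 for agent in agents}
    for a, b in itertools.combinations(agents, 2):
        s_a, s_b = play_match(a, b, rounds)
        totals[a.name] += s_a
        totals[b.name] += s_b
    return totals


if __name__ == "__main__":
    agents = [TitForTat(), AlwaysDefect(), EpsilonGreedyBandit(epsilon=0.1, seed=42)]
    print(round_robin(agents, rounds=200))
```

Because this bandit ignores the opponent's previous move, it cannot condition cooperation on reciprocity; one natural extension, in the spirit of the contextual-bandit agents mentioned in the abstract, is to pass the previous round's moves as context.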
Contextual Bandit with Missing Rewards
By leveraging clustering to estimate missing rewards, this work, unlike standard contextual bandit methods, can learn from every incoming event, including those with missing rewards.
Online learning with Corrupted context: Corrupted Contextual Bandits
This work proposes to combine the standard contextual bandit approach with a classical multi-armed bandit mechanism to address the corrupted-context setting where the context used at each decision may be corrupted ("useless context").
Computing the Dirichlet-Multinomial Log-Likelihood Function
This work uses mathematical properties of the gamma function to derive a closed-form expression for the DMN log-likelihood function, which has lower computational complexity and is much faster without compromising computational accuracy.
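For reference, the textbook Dirichlet-multinomial (DMN) log-likelihood can be evaluated directly with log-gamma functions, as sketched below. This is the standard formula, not the paper's derived expression; the function name `dmn_log_likelihood` is hypothetical.

```python
import math
from typing import Sequence


def dmn_log_likelihood(counts: Sequence[int], alpha: Sequence[float]) -> float:
    """Log-probability of a count vector under a Dirichlet-multinomial.

    log P(x | alpha) = log n! + log G(A) - log G(n + A)
                       + sum_k [log G(x_k + a_k) - log x_k! - log G(a_k)],
    where n = sum(x), A = sum(alpha) and G is the gamma function.
    (Standard textbook form; the paper's closed-form expression differs.)
    """
    if len(counts) != len(alpha):
        raise ValueError("counts and alpha must have the same length")
    if any(a <= 0 for a in alpha) or any(x < 0 for x in counts):
        raise ValueError("alpha must be positive and counts non-negative")
    n = sum(counts)
    big_a = sum(alpha)
    # math.lgamma works in log space, avoiding overflow of gamma() for large n.
    result = math.lgamma(n + 1) + math.lgamma(big_a) - math.lgamma(n + big_a)
    for x, a in zip(counts, alpha):
        result += math.lgamma(x + a) - math.lgamma(x + 1) - math.lgamma(a)
    return result
```

As a sanity check, with all alpha_k = 1 the distribution is uniform over the ways of splitting n into K counts, so `dmn_log_likelihood([1, 1], [1.0, 1.0])` equals `math.log(1 / 3)` (three possible splits of n = 2).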
Spectral Clustering using Eigenspectrum Shape Based Nystrom Sampling
A scalable Nystrom-based clustering algorithm with a new sampling procedure, Centroid Minimum Sum of Squared Similarities (CMS3), and a heuristic for when to use it; it yields competitive low-rank approximations on test datasets compared with other state-of-the-art methods.
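The basic Nyström low-rank approximation that such methods build on can be sketched with NumPy. This sketch uses plain uniform column sampling; it does not implement CMS3 or the eigenspectrum-shape heuristic, and the RBF kernel, its `gamma` value and the function names are assumptions for illustration.

```python
import numpy as np


def rbf_kernel(X, gamma=1.0):
    """Gaussian (RBF) similarity matrix; the kernel choice is an assumption."""
    sq = np.sum(X**2, axis=1)
    # Pairwise squared distances, clipped at 0 to absorb round-off.
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    return np.exp(-gamma * d2)


def nystrom_approximation(K, m, seed=0):
    """Rank <= m approximation K ~= C W^+ C^T from m uniformly sampled columns."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(K.shape[0], size=m, replace=False)
    C = K[:, idx]            # n x m: similarities to the sampled points
    W = K[np.ix_(idx, idx)]  # m x m: similarities among the sampled points
    return C @ np.linalg.pinv(W) @ C.T


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(200, 2))
    K = rbf_kernel(X, gamma=0.5)
    for m in (10, 50, 100):
        err = np.linalg.norm(K - nystrom_approximation(K, m)) / np.linalg.norm(K)
        print(m, round(float(err), 4))
```

Sampling all n columns reproduces K up to round-off, and the error generally shrinks as m grows; according to the summary, CMS3's contribution lies in choosing which columns to sample more carefully than uniformly.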
An Empirical Study of Human Behavioral Agents in Bandits, Contextual Bandits and Reinforcement Learning.
Inspired by the known reward-processing abnormalities of many mental disorders, clinically inspired agents demonstrated interesting behavioral trajectories and comparable performance on simulated tasks with particular reward distributions, a real-world dataset capturing human decision-making in gambling tasks, and the PacMan game across different reward stationarities in a lifelong learning setting.
Towards Circular and Asymmetric Cooperation in a Multi-player Graph-based Iterated Prisoner's Dilemma
A Graph-based Iterated Prisoner's Dilemma is introduced: an N-player game in which the possible cooperation between players is modeled by a weighted directed graph. A graph-based Tit-for-Tat (TFT) algorithm is also proposed, which favors better collaboration synergies in most situations.
Etat de l'art sur l'application des bandits multi-bras (State of the art on the application of multi-armed bandits)
A comprehensive review of the main recent developments across multiple real-world applications of bandits, identifying important current trends and offering new perspectives on the future of this rapidly growing field.
Optimal Epidemic Control as a Contextual Combinatorial Bandit with Budget
This work formulates this technical challenge as a contextual combinatorial bandit problem that jointly optimizes a multi-criteria reward function, and demonstrates the concept with simulations of multiple realistic policy-making scenarios.
Predicting human decision making in psychological tasks with recurrent neural networks
A recurrent neural network architecture based on long short-term memory (LSTM) networks is proposed to predict the time series of actions taken by human subjects at each step of their decision making, the first application of such methods in this research domain.
Speaker Diarization as a Fully Online Bandit Learning Problem in MiniVox
A new benchmark is proposed for evaluating the rarely studied fully online speaker diarization problem, together with a working web-based recognition system that interactively handles the cold-start problem of adding new users by transferring representations of old arms to new ones with an extendable contextual bandit.


Towards Cooperation in Sequential Prisoner's Dilemmas: a Deep Multiagent Reinforcement Learning Approach
This work proposes a deep multiagent reinforcement learning approach to investigate the evolution of mutual cooperation in sequential prisoner's dilemma (SPD) games, and shows that the resulting strategy can avoid being exploited by exploitative opponents while achieving cooperation with cooperative opponents.
Multi-agent Reinforcement Learning in Sequential Social Dilemmas
This work analyzes the dynamics of policies learned by multiple self-interested independent learning agents, each using its own deep Q-network, on two Markov games, and characterizes how learned behavior in each domain changes as a function of environmental factors, including resource abundance.
Predicting Human Cooperation
This work presents the first computational model of human behavior in repeated Prisoner's Dilemma games that unifies the diversity of experimental observations in a systematic and quantitatively reliable manner and demonstrates the power of the approach through a simulation analysis revealing how to best promote human cooperation.
Active Player Modeling in the Iterated Prisoner's Dilemma
This paper proposes an active modeling technique to predict the behavior of IPD players and shows that the observer was able to build a more accurate model of an opponent's behavior than when the data were collected through random actions.
Finding Best Answers for the Iterated Prisoner’s Dilemma Using Improved Q-Learning
This article presents and discusses several improvements to the Q-Learning algorithm, allowing for an easy numerical measure of the exploitability of a given strategy.
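The summary does not describe the proposed improvements, so the following shows only the baseline they modify: plain tabular Q-learning against a fixed Tit-for-Tat opponent. It is a sketch under assumptions: the payoffs (T=5, R=3, P=1, S=0), the state encoding (both players' previous moves), the hyperparameters and the function names are illustrative choices, not taken from the article.

```python
import random

# Assumed payoffs (T=5, R=3, P=1, S=0); PAYOFF[(my_move, opp_move)] -> my reward.
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}
ACTIONS = ("C", "D")


def train_q_learning_vs_tft(steps=200_000, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular epsilon-greedy Q-learning against a fixed Tit-for-Tat opponent.

    State = (my previous move, opponent's previous move). Because the opponent
    plays Tit-for-Tat, its next move is always a copy of my previous move.
    """
    rng = random.Random(seed)
    q = {(m, o): {a: 0.0 for a in ACTIONS} for m in ACTIONS for o in ACTIONS}
    state = ("C", "C")  # Tit-for-Tat opens with C, as if I had cooperated before
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: q[state][a])
        opp_move = state[0]  # Tit-for-Tat copies my last move
        reward = PAYOFF[(action, opp_move)]
        next_state = (action, opp_move)
        # One-step Q-learning update toward r + gamma * max_a' Q(s', a').
        target = reward + gamma * max(q[next_state].values())
        q[state][action] += alpha * (target - q[state][action])
        state = next_state
    return q


def greedy_policy(q):
    """Map each state to its highest-valued action."""
    return {s: max(ACTIONS, key=lambda a: q[s][a]) for s in q}


if __name__ == "__main__":
    print(greedy_policy(train_q_learning_vs_tft()))
```

With γ = 0.9, cooperating forever from mutual cooperation is worth 3/(1 − 0.9) = 30, whereas defecting once and then returning to cooperation is worth 5 + 0.9 · 27 = 29.3 (27 = 0.9 · 30 being the value of cooperating right after one's own defection), so the greedy action learned in state (C, C) should be C. With smaller discount factors, such as γ = 0.5, alternating defection and cooperation becomes optimal instead.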
Iterated Prisoner’s Dilemma contains strategies that dominate any evolutionary opponent
  • W. Press, F. Dyson
  • Proceedings of the National Academy of Sciences
  • 2012
It had been assumed that the two-player Iterated Prisoner's Dilemma admits no simple ultimatum strategy whereby one player can enforce a unilateral claim to an unfair share of rewards; it is shown that such strategies unexpectedly do exist.
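One way to check such an "extortionate" strategy numerically is to compute long-run payoffs of two memory-one strategies from the stationary distribution of their joint Markov chain. The sketch below assumes the conventional payoffs (R, S, T, P) = (3, 0, 5, 1) and, for those payoffs, the extortionate form p = (1 − 2φ(χ − 1), 1 − φ(4χ + 1), φ(χ + 4), 0), where χ is the extortion factor and φ a scaling parameter; the function names are illustrative.

```python
import numpy as np

# Assumed conventional payoffs (R, S, T, P) = (3, 0, 5, 1).
R, S, T, P = 3.0, 0.0, 5.0, 1.0
# Outcome order from X's point of view: (X move, Y move) = CC, CD, DC, DD.
PAYOFF_X = np.array([R, S, T, P])
PAYOFF_Y = np.array([R, T, S, P])


def extortionate_strategy(chi, phi):
    """Extortionate memory-one strategy for X: cooperation probability after each outcome."""
    p = np.array([1 - 2 * phi * (chi - 1), 1 - phi * (4 * chi + 1), phi * (chi + 4), 0.0])
    if np.any(p < 0) or np.any(p > 1):
        raise ValueError("phi is too large for this chi")
    return p


def long_run_payoffs(p, q):
    """Stationary mean payoffs (s_X, s_Y) of two memory-one strategies.

    p[i], q[i]: probability of cooperating after outcome i, each written from
    its own player's perspective (own move first), in the order CC, CD, DC, DD.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    q_in_x_order = q[[0, 2, 1, 3]]  # Y's view of CD is X's view of DC, and vice versa
    M = np.empty((4, 4))
    for i in range(4):
        px, qy = p[i], q_in_x_order[i]
        M[i] = [px * qy, px * (1 - qy), (1 - px) * qy, (1 - px) * (1 - qy)]
    # Solve v M = v together with sum(v) = 1 as one least-squares system.
    A = np.vstack([M.T - np.eye(4), np.ones(4)])
    b = np.array([0.0, 0.0, 0.0, 0.0, 1.0])
    v = np.linalg.lstsq(A, b, rcond=None)[0]
    return float(v @ PAYOFF_X), float(v @ PAYOFF_Y)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    p = extortionate_strategy(chi=3.0, phi=1 / 26)
    for _ in range(3):
        q = rng.uniform(0.05, 0.95, size=4)  # an arbitrary memory-one opponent
        s_x, s_y = long_run_payoffs(p, q)
        # Extortion enforces s_X - P = chi * (s_Y - P) whatever Y plays.
        print(round(s_x - P, 6), round(3.0 * (s_y - P), 6))
```

For instance, χ = 3 and φ = 1/26 give p = (11/13, 1/2, 7/26, 0); for every opponent sampled in the loop, the two printed numbers agree up to round-off, so X's surplus over P is three times Y's.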
Bayesian analysis of deterministic and stochastic prisoner's dilemma games
This paper compares the behavior of individuals playing a classic two-person deterministic prisoner’s dilemma (PD) game with choice data obtained from repeated interdependent security prisoner’s…
Rational Cooperation in the Finitely Repeated Prisoner's Dilemma: Experimental Evidence
This paper presents experiments designed to examine the sequential equilibrium reputation hypothesis in the finitely repeated prisoner's dilemma. The authors test the hypothesis by controlling the…
Reinforcement learning produces dominant strategies for the Iterated Prisoner’s Dilemma
We present tournament results and several powerful strategies for the Iterated Prisoner’s Dilemma created using reinforcement learning techniques (evolutionary and particle swarm algorithms). These…
Effects of Tryptophan Depletion on the Performance of an Iterated Prisoner's Dilemma Game in Healthy Adults
Analysis of the performance of healthy adult participants in an iterated, sequential PD game for monetary reward, following ingestion of an amino-acid drink that either did (T+) or did not (T−) contain L-tryptophan, suggests that serotonin plays a significant role in the acquisition of socially cooperative behavior in humans.