Online Learning in Iterated Prisoner's Dilemma to Mimic Human Behavior
@article{Lin2020OnlineLI, title={Online Learning in Iterated Prisoner's Dilemma to Mimic Human Behavior}, author={Baihan Lin and Djallel Bouneffouf and Guillermo A. Cecchi}, journal={ArXiv}, year={2020}, volume={abs/2006.06580} }
Studies of the Prisoner's Dilemma mainly treat the choice to cooperate or defect as an atomic action. We propose to study the behavior of online learning algorithms in the Iterated Prisoner's Dilemma (IPD) game, exploring the full spectrum of reinforcement learning agents: multi-armed bandits, contextual bandits, and reinforcement learning. We evaluate them in an IPD tournament in which multiple agents compete in a sequential fashion. This allows us to analyze the dynamics…
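As a rough illustration of the setup the abstract describes, the sketch below plays one IPD match between a two-armed bandit agent and a tit-for-tat opponent. It is a minimal sketch, not the authors' implementation: the payoff values (the conventional T=5, R=3, P=1, S=0), the epsilon-greedy update rule, and the names `EpsilonGreedyBandit`, `TitForTat`, and `play_match` are all assumptions introduced here for illustration.

```python
import random

# Assumed payoffs (conventional T=5, R=3, P=1, S=0); not taken from the paper.
PAYOFF = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}


class EpsilonGreedyBandit:
    """Hypothetical two-armed bandit that treats C and D as its arms."""

    def __init__(self, epsilon=0.1, seed=0):
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = {"C": 0, "D": 0}
        self.values = {"C": 0.0, "D": 0.0}

    def act(self, opponent_last):
        # A plain bandit ignores context, so opponent_last is unused.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(["C", "D"])
        return max(("C", "D"), key=lambda a: self.values[a])

    def update(self, action, reward):
        # Incremental running mean of the reward observed for this arm.
        self.counts[action] += 1
        self.values[action] += (reward - self.values[action]) / self.counts[action]


class TitForTat:
    """Cooperates first, then copies the opponent's previous move."""

    def act(self, opponent_last):
        return "C" if opponent_last is None else opponent_last

    def update(self, action, reward):
        pass  # Fixed strategy: nothing to learn.


def play_match(agent_a, agent_b, rounds=200):
    """Play a sequential IPD match and return both scores and the move history."""
    last_a = last_b = None
    score_a = score_b = 0
    history = []
    for _ in range(rounds):
        move_a = agent_a.act(last_b)
        move_b = agent_b.act(last_a)
        reward_a, reward_b = PAYOFF[(move_a, move_b)]
        agent_a.update(move_a, reward_a)
        agent_b.update(move_b, reward_b)
        score_a += reward_a
        score_b += reward_b
        history.append((move_a, move_b))
        last_a, last_b = move_a, move_b
    return score_a, score_b, history


if __name__ == "__main__":
    a, b, _ = play_match(EpsilonGreedyBandit(), TitForTat())
    print("bandit:", a, "tit-for-tat:", b)
```

A tournament in this sense would repeat `play_match` over every pairing of agents and accumulate scores; a contextual bandit or reinforcement learning agent would differ mainly in using `opponent_last` (or a longer history) as its context or state.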
16 Citations
Distinct Signaling Pathways Mediate Touch and Osmosensory Responses in a Polymodal Sensory Neuron
- Biology · The Journal of Neuroscience
- 1999
It is shown that distinct signaling pathways mediate the responses to touch and hyperosmolarity in Caenorhabditis elegans ASH and that OSM-10 is required for osmosensory signaling.
An Empirical Study of Human Behavioral Agents in Bandits, Contextual Bandits and Reinforcement Learning.
- Psychology
- 2020
Inspired by the known reward processing abnormalities of many mental disorders, clinically-inspired agents demonstrated interesting behavioral trajectories and comparable performance on simulated tasks with particular reward distributions, a real-world dataset capturing human decision-making in gambling tasks, and the PacMan game across different reward stationarities in a lifelong learning setting.
Evolutionary Multi-Armed Bandits with Genetic Thompson Sampling
- Computer Science · ArXiv
- 2022
This work proposes Genetic Thompson Sampling, a bandit algorithm that keeps a population of agents and updates them with genetic principles such as elite selection, crossover, and mutation. It also introduces EvoBandit, a web-based interactive visualization that guides readers through the entire learning process and supports lightweight evaluations of the results.
Towards Circular and Asymmetric Cooperation in a Multi-player Graph-based Iterated Prisoner's Dilemma
- Computer Science · ICAART
- 2022
A Graph-based Iterated Prisoner's Dilemma is introduced: an N-player game in which the possible cooperation between players is modeled by a weighted directed graph. A graph-based TFT algorithm is also proposed that favors better collaboration synergies in most situations.
Predicting human decision making in psychological tasks with recurrent neural networks
- Psychology · PloS one
- 2022
Unlike traditional time series, the action sequences of human decision making usually involve many cognitive processes such as beliefs, desires, intentions, and theory of mind, i.e., what others are…
Speaker Diarization as a Fully Online Bandit Learning Problem in MiniVox
- Computer Science
- 2021
A new benchmark to evaluate the rarely studied fully online speaker diarization problem is proposed and a workable web-based recognition system which interactively handles the cold start problem of new user’s addition by transferring representations of old arms to new ones with an extendable contextual bandit is provided.
Optimal Epidemic Control as a Contextual Combinatorial Bandit with Budget
- Computer Science · ArXiv
- 2021
This work formulates this technical challenge as a contextual combinatorial bandit problem that jointly optimizes a multi-criteria reward function, and demonstrates the concept with simulations of multiple realistic policy-making scenarios.
Etat de l'art sur l'application des bandits multi-bras
- Computer Science · ArXiv
- 2021
A comprehensive review of the main recent developments in multiple real-world applications of bandits, which identifies important current trends and offers new perspectives on the future of this rapidly growing field.
Predicting human decision making in psychological tasks with recurrent neural networks
- Computer Science, Psychology · bioRxiv
- 2021
A recurrent neural network architecture based on long short-term memory (LSTM) networks is proposed to predict the time series of actions taken by human subjects at each step of their decision making, the first application of such methods in this research domain.
Unified Models of Human Behavioral Agents in Bandits, Contextual Bandits and RL
- Psychology · Human Brain and Artificial Intelligence
- 2021
Inspired by the known reward processing abnormalities of many mental disorders, clinically-inspired agents demonstrated interesting behavioral trajectories and comparable performance on simulated tasks with particular reward distributions, a real-world dataset capturing human decision-making in gambling tasks, and the PacMan game across different reward stationarities in a lifelong learning setting.
References
Using a utility computing framework to develop utility systems
- Computer Science, Economics · IBM Syst. J.
- 2004
A utility computing framework, consisting of a component model, a methodology, and a set of tools and common services for building utility computing systems, is described.
Level crossing rates and MIMO capacity fades: impacts of spatial/temporal channel correlation
- Business · IEEE International Conference on Communications, 2003. ICC '03.
- 2003
This paper investigates the behaviour of "capacity fades", examines how often the capacity experiences the fades, develops a method to determine level crossing rates and average data durations and relate these to antenna numbers, and compares the channel capacity under independent fading.
Event-related synchronization (ERS) in the alpha band--an electrophysiological correlate of cortical idling: a review.
- Biology · International journal of psychophysiology : official journal of the International Organization of Psychophysiology
- 1996
Flash: An adaptive mesh hydrodynamics code for modeling astrophysical thermonuclear flashes
- Physics
- 2000
The first version of a new-generation simulation code, FLASH, solves the fully compressible, reactive hydrodynamic equations and allows for the use of adaptive mesh refinement and contains state-of-the-art modules for the equations of state and thermonuclear reaction networks.
Randomized Ablation Strategies for the Treatment of Persistent Atrial Fibrillation: RASTA Study
- Medicine · Circulation. Arrhythmia and electrophysiology
- 2012
The data suggest that additional substrate modification beyond PVI does not improve single-procedure efficacy in patients with persistent atrial fibrillation.
Finite-time Analysis of the Multiarmed Bandit Problem
- Computer Science · Machine Learning
- 2004
This work shows that the optimal logarithmic regret is also achievable uniformly over time, with simple and efficient policies, and for all reward distributions with bounded support.
Introduction to reinforcement learning, volume 135
- 1998
Reward processing in neurodegenerative disease
- Psychology, Biology · Neurocase
- 2015
This review presents the existing evidence of reward processing changes in neurodegenerative diseases including mild cognitive impairment (MCI), Alzheimer's disease, frontotemporal dementia, amyotrophic lateral sclerosis (ALS), Parkinson’s disease, and Huntington’s disease, as well as in healthy aging.