• Corpus ID: 219573558

Online Learning in Iterated Prisoner's Dilemma to Mimic Human Behavior

@article{Lin2020OnlineLI,
  title={Online Learning in Iterated Prisoner's Dilemma to Mimic Human Behavior},
  author={Baihan Lin and Djallel Bouneffouf and Guillermo A. Cecchi},
  journal={ArXiv},
  year={2020},
  volume={abs/2006.06580}
}
Studies of the Prisoner's Dilemma mainly treat the choice to cooperate or defect as an atomic action. We propose to study online learning algorithm behavior in the Iterated Prisoner's Dilemma (IPD) game, where we explore the full spectrum of reinforcement learning agents: multi-armed bandits, contextual bandits and reinforcement learning. We evaluate them in a tournament of the iterated prisoner's dilemma in which multiple agents compete sequentially. This allows us to analyze the dynamics… 
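As a rough illustration of the setting the abstract describes, the following minimal sketch plays an iterated match between a context-free bandit agent and a Tit-for-Tat opponent. It is not the authors' implementation: the payoff values (5, 3, 1, 0), the epsilon-greedy rule, and the names `EpsilonGreedyBandit`, `TitForTat`, and `play_match` are all assumptions chosen for illustration.

```python
import random

# Conventional IPD payoffs (an assumption; the abstract does not state them).
# Keys are (my_move, opponent_move); values are my reward.
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}


class EpsilonGreedyBandit:
    """Hypothetical two-armed bandit that treats C and D as its arms."""

    def __init__(self, epsilon=0.1, seed=0):
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = {"C": 0, "D": 0}
        self.values = {"C": 0.0, "D": 0.0}  # running mean reward per arm

    def act(self, opponent_last):
        # A context-free bandit ignores the opponent's previous move.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(["C", "D"])
        return max(("C", "D"), key=lambda arm: self.values[arm])

    def update(self, action, reward):
        # Incremental update of the mean reward observed for this arm.
        self.counts[action] += 1
        self.values[action] += (reward - self.values[action]) / self.counts[action]


class TitForTat:
    """Cooperate first, then copy the opponent's previous move."""

    def act(self, opponent_last):
        return "C" if opponent_last is None else opponent_last

    def update(self, action, reward):
        pass  # Tit-for-Tat does not learn from rewards.


def play_match(agent_a, agent_b, rounds=200):
    """Play a fixed number of rounds and return both cumulative scores."""
    last_a = last_b = None
    total_a = total_b = 0
    for _ in range(rounds):
        move_a = agent_a.act(last_b)
        move_b = agent_b.act(last_a)
        reward_a = PAYOFF[(move_a, move_b)]
        reward_b = PAYOFF[(move_b, move_a)]
        agent_a.update(move_a, reward_a)
        agent_b.update(move_b, reward_b)
        total_a += reward_a
        total_b += reward_b
        last_a, last_b = move_a, move_b
    return total_a, total_b
```

Calling `play_match(EpsilonGreedyBandit(), TitForTat())` returns the two cumulative scores. A tournament in the sense of the abstract could be approximated by running `play_match` over every pair of agents, though the paper's actual tournament protocol is not given here.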
Distinct Signaling Pathways Mediate Touch and Osmosensory Responses in a Polymodal Sensory Neuron
TLDR
It is shown that distinct signaling pathways mediate the responses to touch and hyperosmolarity in Caenorhabditis elegans ASH and that OSM-10 is required for osmosensory signaling.
An Empirical Study of Human Behavioral Agents in Bandits, Contextual Bandits and Reinforcement Learning.
TLDR
Inspired by the known reward processing abnormalities of many mental disorders, clinically-inspired agents demonstrated interesting behavioral trajectories and comparable performance on simulated tasks with particular reward distributions, a real-world dataset capturing human decision-making in gambling tasks, and the PacMan game across different reward stationarities in a lifelong learning setting.
Evolutionary Multi-Armed Bandits with Genetic Thompson Sampling
TLDR
This work proposes Genetic Thompson Sampling, a bandit algorithm that keeps a population of agents and updates them with genetic principles such as elite selection, crossover and mutation, and introduces EvoBandit, a web-based interactive visualization that guides readers through the entire learning process and supports lightweight evaluations of the results.
Towards Circular and Asymmetric Cooperation in a Multi-player Graph-based Iterated Prisoner's Dilemma
TLDR
A Graph-based Iterated Prisoner's Dilemma is introduced: an N-player game in which the possible cooperation between players is modeled by a weighted directed graph, and a graph-based TFT algorithm is proposed that favors better collaboration synergies in most situations.
Predicting human decision making in psychological tasks with recurrent neural networks
Unlike traditional time series, the action sequences of human decision making usually involve many cognitive processes such as beliefs, desires, intentions, and theory of mind, i.e., what others are
Speaker Diarization as a Fully Online Bandit Learning Problem in MiniVox
TLDR
A new benchmark to evaluate the rarely studied fully online speaker diarization problem is proposed and a workable web-based recognition system which interactively handles the cold start problem of new user’s addition by transferring representations of old arms to new ones with an extendable contextual bandit is provided.
Optimal Epidemic Control as a Contextual Combinatorial Bandit with Budget
TLDR
This work formulates this technical challenge as a contextual combinatorial bandit problem that jointly optimizes a multi-criteria reward function and proves the concept with simulations of multiple realistic policy-making scenarios.
Etat de l'art sur l'application des bandits multi-bras
TLDR
A comprehensive review of the main recent developments in multiple real-world applications of bandits, identifying important current trends and offering new perspectives on the future of this rapidly growing field.
Predicting human decision making in psychological tasks with recurrent neural networks
TLDR
A recurrent neural network architecture based on long short-term memory (LSTM) networks is proposed to predict the time series of actions taken by human subjects at each step of their decision making, the first application of such methods in this research domain.
Unified Models of Human Behavioral Agents in Bandits, Contextual Bandits and RL
TLDR
Inspired by the known reward processing abnormalities of many mental disorders, clinically-inspired agents demonstrated interesting behavioral trajectories and comparable performance on simulated tasks with particular reward distributions, a real-world dataset capturing human decision-making in gambling tasks, and the PacMan game across different reward stationarities in a lifelong learning setting.
...
...
