A Story of Two Streams: Reinforcement Learning Models from Human Behavior and Neuropsychiatry

@inproceedings{Lin2020ASO,
  title={A Story of Two Streams: Reinforcement Learning Models from Human Behavior and Neuropsychiatry},
  author={Baihan Lin and Guillermo A. Cecchi and Djallel Bouneffouf and Jenna Reinen and Irina Rish},
  booktitle={AAMAS},
  year={2020}
}
Drawing inspiration from behavioral studies of human decision making, we propose here a more general and flexible parametric framework for reinforcement learning that extends standard Q-learning to a two-stream model for processing positive and negative rewards, and allows us to incorporate a wide range of reward-processing biases -- an important component of human decision making which can help us better understand a wide spectrum of multi-agent interactions in complex real-world socioeconomic…
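The two-stream extension of Q-learning described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's exact algorithm: it assumes the two streams learn from the positive and negative portions of the reward separately and are recombined with bias weights. All names (`SplitQAgent`, `w_pos`, `w_neg`) are illustrative, not the authors' notation.

```python
import random

class SplitQAgent:
    """Sketch of two-stream Q-learning with separate positive/negative streams."""

    def __init__(self, n_states, n_actions, alpha=0.1, gamma=0.95,
                 w_pos=1.0, w_neg=1.0):
        self.alpha, self.gamma = alpha, gamma
        # Bias weights on the two streams; varying these is one way to model
        # reward-processing biases (e.g. over- or under-weighting losses).
        self.w_pos, self.w_neg = w_pos, w_neg
        self.q_pos = [[0.0] * n_actions for _ in range(n_states)]
        self.q_neg = [[0.0] * n_actions for _ in range(n_states)]
        self.n_actions = n_actions

    def value(self, s, a):
        # Combined action value: weighted sum of the two streams.
        return self.w_pos * self.q_pos[s][a] + self.w_neg * self.q_neg[s][a]

    def act(self, s, eps=0.1):
        # Epsilon-greedy over the combined value.
        if random.random() < eps:
            return random.randrange(self.n_actions)
        return max(range(self.n_actions), key=lambda a: self.value(s, a))

    def update(self, s, a, r, s_next):
        # Each stream receives only its own sign of the reward.
        r_pos, r_neg = max(r, 0.0), min(r, 0.0)
        for q, r_part in ((self.q_pos, r_pos), (self.q_neg, r_neg)):
            best = max(q[s_next])
            q[s][a] += self.alpha * (r_part + self.gamma * best - q[s][a])
```

Standard Q-learning is recovered when both streams share one table and both weights are 1; skewing `w_pos` versus `w_neg` (or using stream-specific learning rates, as in the paper's parametric family) produces the reward-processing biases the framework is designed to capture.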
An Empirical Study of Human Behavioral Agents in Bandits, Contextual Bandits and Reinforcement Learning.
TLDR
Inspired by the known reward processing abnormalities of many mental disorders, clinically-inspired agents demonstrated interesting behavioral trajectories and comparable performance on simulated tasks with particular reward distributions, a real-world dataset capturing human decision-making in gambling tasks, and the PacMan game across different reward stationarities in a lifelong learning setting.
State of the Art on the Application of Multi-Armed Bandits
TLDR
A comprehensive review of the main recent developments across multiple real-world applications of bandits, identifying important current trends and providing new insights into the future of this flourishing field.
Regularity Normalization: Neuroscience-Inspired Unsupervised Attention across Neural Network Layers
  • Baihan Lin
  • Computer Science, Mathematics
    Entropy
  • 2021
TLDR
The regularity normalization is proposed as an unsupervised attention mechanism that computes the statistical regularity in the implicit space of neural networks under the Minimum Description Length (MDL) principle, and it outperforms existing normalization methods in tackling limited, imbalanced, and non-stationary input distributions.
Unified Models of Human Behavioral Agents in Bandits, Contextual Bandits and RL
TLDR
Inspired by the known reward processing abnormalities of many mental disorders, clinically-inspired agents demonstrated interesting behavioral trajectories and comparable performance on simulated tasks with particular reward distributions, a real-world dataset capturing human decision-making in gambling tasks, and the PacMan game across different reward stationarities in a lifelong learning setting.
AI 2020: Advances in Artificial Intelligence: 33rd Australasian Joint Conference, AI 2020, Canberra, ACT, Australia, November 29–30, 2020, Proceedings
TLDR
This work has proposed multi-disease classification from chest X-rays using Federated Deep Learning (FDL), and found that Momentum SGD yields better results than other optimizers.
Computing the Dirichlet-Multinomial Log-Likelihood Function
TLDR
This work uses mathematical properties of the gamma function to derive a closed-form expression for the DMN log-likelihood function, which has a lower computational complexity and is much faster without compromising computational accuracy.
Contextual Bandit with Missing Rewards
TLDR
Unlike standard contextual bandit methods, by leveraging clustering to estimate missing reward, this work is able to learn from each incoming event, even those with missing rewards.
Online Learning in Iterated Prisoner's Dilemma to Mimic Human Behavior
TLDR
This work proposes to study online learning algorithm behavior in the Iterated Prisoner's Dilemma (IPD) game, where the full spectrum of reinforcement learning agents was explored: multi-armed bandits, contextual bandits, and reinforcement learning.
Online Semi-Supervised Learning in Contextual Bandits with Episodic Reward
  • Baihan Lin
  • Computer Science, Mathematics
    Australasian Conference on Artificial Intelligence
  • 2020
We considered a novel practical problem of online learning with episodically revealed rewards, motivated by several real-world applications, where the contexts are nonstationary over different…
Online learning with Corrupted context: Corrupted Contextual Bandits
TLDR
This work proposes to combine the standard contextual bandit approach with a classical multi-armed bandit mechanism to address the corrupted-context setting where the context used at each decision may be corrupted ("useless context").
