A Story of Two Streams: Reinforcement Learning Models from Human Behavior and Neuropsychiatry

@inproceedings{Lin2020ASO,
  title={A Story of Two Streams: Reinforcement Learning Models from Human Behavior and Neuropsychiatry},
  author={Baihan Lin and Guillermo A. Cecchi and Djallel Bouneffouf and Jenna Reinen and Irina Rish},
  booktitle={AAMAS},
  year={2020}
}
Drawing inspiration from behavioral studies of human decision making, we propose a more general and flexible parametric framework for reinforcement learning that extends standard Q-learning to a two-stream model for processing positive and negative rewards and allows us to incorporate a wide range of reward-processing biases, an important component of human decision making which can help us better understand a wide spectrum of multi-agent interactions in complex real-world socioeconomic…
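The two-stream idea in the abstract can be illustrated with a minimal tabular sketch. This is not the authors' implementation: the class name `TwoStreamQ`, the parameter names (`w_pos`, `w_neg`, `lam_pos`, `lam_neg`), the epsilon-greedy policy, and the exact form of the update are assumptions made for illustration. The sketch keeps one Q-table per reward sign, scales each stream's reward and its memory of past values with its own weights, and acts on the sum of the two tables.

```python
import random
from collections import defaultdict


class TwoStreamQ:
    """Hypothetical two-stream Q-learner: separate tables for gains and losses."""

    def __init__(self, n_actions, alpha=0.1, gamma=0.95, epsilon=0.1,
                 w_pos=1.0, w_neg=1.0, lam_pos=1.0, lam_neg=1.0, seed=0):
        self.n_actions = n_actions
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        # Assumed bias parameters: w_* scale incoming rewards,
        # lam_* scale how much of the old estimate is retained.
        self.w_pos, self.w_neg = w_pos, w_neg
        self.lam_pos, self.lam_neg = lam_pos, lam_neg
        self.rng = random.Random(seed)
        self.q_pos = defaultdict(lambda: [0.0] * n_actions)
        self.q_neg = defaultdict(lambda: [0.0] * n_actions)

    def combined(self, state):
        """Action values used for control: the sum of the two streams."""
        return [p + n for p, n in zip(self.q_pos[state], self.q_neg[state])]

    def act(self, state):
        """Epsilon-greedy action selection on the combined values."""
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.n_actions)
        q = self.combined(state)
        return max(range(self.n_actions), key=q.__getitem__)

    def _update_stream(self, table, lam, w, s, a, r, s_next):
        # Q-learning target computed within a single stream.
        target = w * r + self.gamma * max(table[s_next])
        table[s][a] = lam * table[s][a] + self.alpha * (target - table[s][a])

    def update(self, s, a, r, s_next):
        """Split the reward by sign and update each stream separately."""
        r_pos, r_neg = max(r, 0.0), min(r, 0.0)
        self._update_stream(self.q_pos, self.lam_pos, self.w_pos, s, a, r_pos, s_next)
        self._update_stream(self.q_neg, self.lam_neg, self.w_neg, s, a, r_neg, s_next)
```

With all four bias parameters set to 1, the two tables together behave like a sign-split version of ordinary Q-learning; setting, for example, `w_neg=0.0` yields an agent that ignores losses, which is the kind of reward-processing bias the abstract refers to.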
An Empirical Study of Human Behavioral Agents in Bandits, Contextual Bandits and Reinforcement Learning.
TLDR
Inspired by the known reward processing abnormalities of many mental disorders, clinically-inspired agents demonstrated interesting behavioral trajectories and comparable performance on simulated tasks with particular reward distributions, a real-world dataset capturing human decision-making in gambling tasks, and the PacMan game across different reward stationarities in a lifelong learning setting.
Unified Models of Human Behavioral Agents in Bandits, Contextual Bandits and RL
TLDR
Inspired by the known reward processing abnormalities of many mental disorders, clinically-inspired agents demonstrated interesting behavioral trajectories and comparable performance on simulated tasks with particular reward distributions, a real-world dataset capturing human decision-making in gambling tasks, and the PacMan game across different reward stationarities in a lifelong learning setting.
Predicting human decision making in psychological tasks with recurrent neural networks
TLDR
A recurrent neural network architecture based on long short-term memory (LSTM) networks is proposed to predict the time series of actions taken by human subjects at each step of their decision making, the first application of such methods in this research domain.
Online Semi-Supervised Learning in Contextual Bandits with Episodic Reward
  • Baihan Lin
  • Computer Science, Psychology
    Australasian Conference on Artificial Intelligence
  • 2020
We considered a novel practical problem of online learning with episodically revealed rewards, motivated by several real-world applications, where the contexts are nonstationary over different
Online Learning in Iterated Prisoner's Dilemma to Mimic Human Behavior
TLDR
This work studies the behavior of online learning algorithms in the Iterated Prisoner's Dilemma (IPD) game, exploring the full spectrum of reinforcement learning agents: multi-armed bandits, contextual bandits and reinforcement learning.
Contextual Bandit with Missing Rewards
TLDR
Unlike standard contextual bandit methods, this work leverages clustering to estimate missing rewards and is therefore able to learn from each incoming event, even those with missing rewards.
Regularity Normalization: Neuroscience-Inspired Unsupervised Attention across Neural Network Layers
TLDR
Regularity normalization, an unsupervised attention mechanism that computes the statistical regularity in the implicit space of neural networks under the Minimum Description Length (MDL) principle, outperforms existing normalization methods in tackling limited, imbalanced and non-stationary input distributions.
Deep Annotation of Therapeutic Working Alliance in Psychotherapy
The therapeutic working alliance is an important predictor of the outcome of psychotherapy treatment. In practice, the working alliance is estimated from a set of scoring questionnaires in an
Neural Topic Modeling of Psychotherapy Sessions
TLDR
This work compares different neural topic modeling methods in learning the topical propensities of different psychiatric conditions from psychotherapy session transcripts parsed from speech recordings, and suggests that this topic modeling framework can offer interpretable insights to help the therapist decide on a strategy and improve the effectiveness of psychotherapy.
Online learning with Corrupted context: Corrupted Contextual Bandits
TLDR
This work proposes to combine the standard contextual bandit approach with a classical multi-armed bandit mechanism to address the corrupted-context setting where the context used at each decision may be corrupted ("useless context").

References

SHOWING 1-10 OF 43 REFERENCES
Split Q Learning: Reinforcement Learning with Two-Stream Rewards
TLDR
A general parametric framework is proposed, which extends the standard Q-learning approach to incorporate a two-stream framework of reward processing with biases biologically associated with several neurological and psychiatric conditions, including Parkinson's and Alzheimer's diseases, attention-deficit/hyperactivity disorder (ADHD), addiction, and chronic pain.
Unified Models of Human Behavioral Agents in Bandits, Contextual Bandits and RL
TLDR
Inspired by the known reward processing abnormalities of many mental disorders, clinically-inspired agents demonstrated interesting behavioral trajectories and comparable performance on simulated tasks with particular reward distributions, a real-world dataset capturing human decision-making in gambling tasks, and the PacMan game across different reward stationarities in a lifelong learning setting.
Bandit Models of Human Behavior: Reward Processing in Mental Disorders
TLDR
A general parametric framework for the multi-armed bandit problem is proposed, which extends the standard Thompson Sampling approach to incorporate reward processing biases associated with several neurological and psychiatric conditions, including Parkinson's and Alzheimer's diseases, attention-deficit/hyperactivity disorder (ADHD), addiction, and chronic pain.
By Carrot or by Stick: Cognitive Reinforcement Learning in Parkinsonism
TLDR
It is shown, using two cognitive procedural learning tasks, that Parkinson's patients off medication are better at learning to avoid choices that lead to negative outcomes than they are at learning from positive outcomes.
Reconciling reinforcement learning models with behavioral extinction and renewal: implications for addiction, relapse, and problem gambling.
TLDR
The authors construct a TDRL model that can accommodate extinction and renewal through two simple processes: a TDRL process that learns the value of situation-action pairs and a situation recognition process that categorizes the observed cues into situations.
Parallel reward and punishment control in humans and robots: Safe reinforcement learning using the MaxPain algorithm
  • Stefan Elfwing, B. Seymour
  • Computer Science
    2017 Joint IEEE International Conference on Development and Learning and Epigenetic Robotics (ICDL-EpiRob)
  • 2017
TLDR
A modified RL scheme is introduced, built on a new algorithm that backs up worst-case predictions in parallel and then scales the two predictions in a multiattribute RL policy, illustrating the importance of independent punishment prediction in RL and providing a testable framework for better understanding punishment in humans.
Cognitive Mechanisms Underlying Risky Decision-Making in Chronic Cannabis Users.
Phasic Dopamine Release in the Rat Nucleus Accumbens Symmetrically Encodes a Reward Prediction Error Term
TLDR
This work uses fast-scan cyclic voltammetry to measure reward-evoked dopamine release at carbon fiber electrodes chronically implanted in the nucleus accumbens core of rats trained on a probabilistic decision-making task and demonstrates that dopamine concentrations transmit a bidirectional RPE signal with symmetrical encoding of positive and negative RPEs.
Reward processing in neurodegenerative disease
TLDR
This review presents the existing evidence of reward processing changes in neurodegenerative diseases including mild cognitive impairment (MCI), Alzheimer's disease, frontotemporal dementia, amyotrophic lateral sclerosis (ALS), Parkinson's disease, and Huntington's disease, as well as in healthy aging.