Split Q Learning: Reinforcement Learning with Two-Stream Rewards

@inproceedings{Lin2019SplitQL,
  title={Split Q Learning: Reinforcement Learning with Two-Stream Rewards},
  author={Baihan Lin and Djallel Bouneffouf and Guillermo A. Cecchi},
  booktitle={IJCAI},
  year={2019}
}
Drawing inspiration from behavioral studies of human decision making, we propose a general parametric framework for reinforcement learning that extends the standard Q-learning approach to incorporate a two-stream model of reward processing, with biases biologically associated with several neurological and psychiatric conditions, including Parkinson's and Alzheimer's diseases, attention-deficit/hyperactivity disorder (ADHD), addiction, and chronic pain. For AI community…
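The two-stream idea described above can be made concrete with a minimal tabular sketch: positive and negative rewards update separate Q-tables, and action selection combines the two streams with weights that can be biased. This is an illustration of the general idea, not the paper's exact parameterization; the parameter names (`w_pos`, `w_neg`, etc.) are illustrative.

```python
import random

class SplitQAgent:
    """Tabular Q-learning with separate streams for positive and negative
    rewards, combined at action-selection time.

    The split into q_pos/q_neg and the stream weights are a minimal
    illustration of the two-stream idea; the paper's exact
    parameterization may differ.
    """

    def __init__(self, n_states, n_actions, alpha=0.1, gamma=0.95,
                 epsilon=0.1, w_pos=1.0, w_neg=1.0):
        self.n_actions = n_actions
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.w_pos, self.w_neg = w_pos, w_neg  # stream weights (bias knobs)
        self.q_pos = [[0.0] * n_actions for _ in range(n_states)]
        self.q_neg = [[0.0] * n_actions for _ in range(n_states)]

    def value(self, s, a):
        # Effective value combines both reward streams.
        return self.w_pos * self.q_pos[s][a] + self.w_neg * self.q_neg[s][a]

    def act(self, s):
        if random.random() < self.epsilon:
            return random.randrange(self.n_actions)
        return max(range(self.n_actions), key=lambda a: self.value(s, a))

    def update(self, s, a, r, s_next):
        # Route the reward into the matching stream; each stream then does
        # a standard Q-learning update on its own share of the reward.
        a_best = max(range(self.n_actions),
                     key=lambda a2: self.value(s_next, a2))
        r_pos, r_neg = max(r, 0.0), min(r, 0.0)
        self.q_pos[s][a] += self.alpha * (
            r_pos + self.gamma * self.q_pos[s_next][a_best] - self.q_pos[s][a])
        self.q_neg[s][a] += self.alpha * (
            r_neg + self.gamma * self.q_neg[s_next][a_best] - self.q_neg[s][a])

# Toy single-state task: action 0 is risky (+1 or -2, negative on average),
# action 1 is a safe +0.4. A balanced agent should learn to prefer action 1;
# shrinking w_neg would make the agent underweight punishments.
random.seed(0)
agent = SplitQAgent(n_states=1, n_actions=2)
for _ in range(2000):
    a = agent.act(0)
    r = (1.0 if random.random() < 0.5 else -2.0) if a == 0 else 0.4
    agent.update(0, a, r, 0)
```

Varying `w_pos` and `w_neg` away from 1.0 is what lets the framework mimic condition-specific biases, e.g. discounting negative outcomes relative to positive ones.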
Contextual Bandit with Missing Rewards
TLDR
Unlike standard contextual bandit methods, this work leverages clustering to estimate missing rewards, and is thus able to learn from each incoming event, even those with missing rewards.
Predicting human decision making in psychological tasks with recurrent neural networks
TLDR
A recurrent neural network architecture based on long short-term memory (LSTM) networks is proposed to predict the time series of actions taken by human subjects at each step of their decision making; this is the first application of such methods in this research domain.
Regularity Normalization: Neuroscience-Inspired Unsupervised Attention across Neural Network Layers
  • Baihan Lin
  • Computer Science, Mathematics
    Entropy
  • 2021
TLDR
Regularity normalization, an unsupervised attention mechanism that computes the statistical regularity in the implicit space of neural networks under the Minimum Description Length (MDL) principle, outperforms existing normalization methods in tackling limited, imbalanced, and non-stationary input distributions.
How to Guide Humans Towards Skills Improvement in Physical Human-Robot Collaboration Using Reinforcement Learning?
TLDR
This work proposes a new hybrid approach that combines reinforcement learning with a symbolic approach based on an ontology to guide humans towards skills improvement, using solely internal robot data without any additional sensors.
AI 2020: Advances in Artificial Intelligence: 33rd Australasian Joint Conference, AI 2020, Canberra, ACT, Australia, November 29–30, 2020, Proceedings
TLDR
This work proposes multi-disease classification from chest X-rays using Federated Deep Learning (FDL), and finds that Momentum SGD yields better results than other optimizers.
An Empirical Study of Human Behavioral Agents in Bandits, Contextual Bandits and Reinforcement Learning.
TLDR
Inspired by the known reward processing abnormalities of many mental disorders, clinically-inspired agents demonstrated interesting behavioral trajectories and comparable performance on simulated tasks with particular reward distributions, a real-world dataset capturing human decision-making in gambling tasks, and the PacMan game across different reward stationarities in a lifelong learning setting.
Unified Models of Human Behavioral Agents in Bandits, Contextual Bandits and RL
TLDR
Inspired by the known reward processing abnormalities of many mental disorders, clinically-inspired agents demonstrated interesting behavioral trajectories and comparable performance on simulated tasks with particular reward distributions, a real-world dataset capturing human decision-making in gambling tasks, and the PacMan game across different reward stationarities in a lifelong learning setting.
A Story of Two Streams: Reinforcement Learning Models from Human Behavior and Neuropsychiatry
TLDR
The proposed Split-QL model and its clinically inspired variants consistently outperform standard Q-Learning and SARSA methods on simulated tasks with particular reward distributions, a real-world dataset capturing human decision-making in gambling tasks, and the Pac-Man game in a lifelong learning setting across different reward stationarities.
Online Learning in Iterated Prisoner's Dilemma to Mimic Human Behavior
TLDR
This work proposes to study online learning algorithm behavior in the Iterated Prisoner's Dilemma (IPD) game, exploring the full spectrum of reinforcement learning agents: multi-armed bandits, contextual bandits, and reinforcement learning.
Online Semi-Supervised Learning in Contextual Bandits with Episodic Reward
  • Baihan Lin
  • Computer Science, Mathematics
    Australasian Conference on Artificial Intelligence
  • 2020
We considered a novel practical problem of online learning with episodically revealed rewards, motivated by several real-world applications, where the contexts are nonstationary over different…

References

Bandit Models of Human Behavior: Reward Processing in Mental Disorders
TLDR
A general parametric framework for the multi-armed bandit problem is proposed, which extends the standard Thompson Sampling approach to incorporate reward-processing biases associated with several neurological and psychiatric conditions, including Parkinson's and Alzheimer's diseases, attention-deficit/hyperactivity disorder (ADHD), addiction, and chronic pain.
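A minimal sketch of the idea summarized above: standard Bernoulli Thompson Sampling keeps a Beta posterior per arm, and a reward-processing bias can be introduced by weighting how strongly rewarding versus punishing outcomes update that posterior. The weighting scheme and parameter names below are illustrative assumptions, not the paper's exact model.

```python
import random

class BiasedThompsonSampling:
    """Bernoulli Thompson Sampling with separate update weights for
    rewarding and punishing outcomes (illustrative bias parameters;
    w_reward == w_punish recovers standard Thompson Sampling)."""

    def __init__(self, n_arms, w_reward=1.0, w_punish=1.0):
        self.w_reward = w_reward     # how strongly wins update the posterior
        self.w_punish = w_punish     # how strongly losses update the posterior
        self.alpha = [1.0] * n_arms  # Beta prior successes
        self.beta = [1.0] * n_arms   # Beta prior failures

    def select(self):
        # Sample a success probability from each arm's posterior and
        # play the arm with the highest sample.
        samples = [random.betavariate(a, b)
                   for a, b in zip(self.alpha, self.beta)]
        return max(range(len(samples)), key=samples.__getitem__)

    def update(self, arm, reward):
        if reward > 0:
            self.alpha[arm] += self.w_reward
        else:
            self.beta[arm] += self.w_punish

# Unbiased run on a two-armed Bernoulli bandit: the better arm (p=0.7)
# should accumulate most of the pulls.
random.seed(1)
bandit = BiasedThompsonSampling(n_arms=2)
probs = [0.3, 0.7]
pulls = [0, 0]
for _ in range(2000):
    arm = bandit.select()
    pulls[arm] += 1
    bandit.update(arm, 1.0 if random.random() < probs[arm] else 0.0)
```

Setting `w_punish` below `w_reward` makes losses fade from the posterior more slowly than they should, one simple way to model punishment-insensitive reward processing.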
Reconciling reinforcement learning models with behavioral extinction and renewal: implications for addiction, relapse, and problem gambling.
TLDR
The authors construct a TDRL model that can accommodate extinction and renewal through two simple processes: a TDRL process that learns the value of situation-action pairs and a situation-recognition process that categorizes the observed cues into situations.
By Carrot or by Stick: Cognitive Reinforcement Learning in Parkinsonism
TLDR
It is shown, using two cognitive procedural learning tasks, that Parkinson's patients off medication are better at learning to avoid choices that lead to negative outcomes than they are at learning from positive outcomes.
Reward processing in neurodegenerative disease
TLDR
This review presents the existing evidence of reward processing changes in neurodegenerative diseases, including mild cognitive impairment (MCI), Alzheimer's disease, frontotemporal dementia, amyotrophic lateral sclerosis (ALS), Parkinson's disease, and Huntington's disease, as well as in healthy aging.
Apprenticeship learning via inverse reinforcement learning
TLDR
This work models the expert as trying to maximize a reward function expressible as a linear combination of known features, and gives an algorithm for learning the task demonstrated by the expert, based on using "inverse reinforcement learning" to recover the unknown reward function.
Gaussian Process Optimization in the Bandit Setting: No Regret and Experimental Design
TLDR
This work analyzes GP-UCB, an intuitive upper-confidence based algorithm, and bound its cumulative regret in terms of maximal information gain, establishing a novel connection between GP optimization and experimental design and obtaining explicit sublinear regret bounds for many commonly used covariance functions.