# Effective Warm Start for the Online Actor-Critic Reinforcement Learning based mHealth Intervention

Online reinforcement learning (RL) is increasingly popular for personalized mobile health (mHealth) interventions. It can personalize the type and dose of interventions according to a user's ongoing status and changing needs. However, at the beginning of online learning, there are usually too few samples to support the RL updates, which leads to poor performance. A delay in good performance of the online learning algorithms can be especially detrimental in mHealth, where users…

• Computer Science
BCB
• 2018
A network cohesion constrained (actor-critic) Reinforcement Learning (RL) method for mHealth to explore how to share information among similar users to better convert the limited user information into sharper learned policies.
• Computer Science
MICCAI
• 2018
This paper employs the K-means clustering method to group users based on their trajectory information similarity and learn a shared RL policy for each group, which can achieve clear gains over the state-of-the-art RL methods for mHealth.
• Computer Science
BCB
• 2018
The proposed algorithm is proved to sufficiently decrease the objective function value at each iteration and to converge after a finite number of iterations, and it significantly outperforms state-of-the-art methods on a heavily noised dataset with outliers across a variety of parameter settings.
• Computer Science
• 2018
This paper proposes a novel robust actor-critic contextual bandit method that can achieve almost identical results compared with the state-of-the-art methods on the dataset without outliers and dramatically outperform them on the datasets noised by outliers.
• Computer Science
2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
• 2022
A novel method, Confidence Constrained Learning for Warm-Start RL, that reduces degradation by balancing between the policy gradient and constrained learning according to a confidence measure of the Q-values, and a novel objective, Positive Q-value Distance (CCL-PQD).
• Engineering
IEEE Transactions on Transportation Electrification
• 2021
The results show that the proposed warm-start Q-learning requires 68.8% fewer iterations than cold-start Q-learning and improves MPG by 10%–16% compared with equivalent consumption minimization strategy control.
• Computer Science
ArXiv
• 2017
This paper proposes a novel robust actor-critic contextual bandit method that can achieve almost identical results compared with the state-of-the-art methods on the dataset without outliers and dramatically outperform them on the datasets noised by outliers.
• Psychology
AAAI Workshops
• 2018
The World Health Organisation (WHO) states that: “There is no health without mental health”. Health population studies show that the most common mental disorders are anxiety disorders. Nowadays,
• Engineering
ArXiv
• 2020
This study aims to reduce the learning iterations of Q-learning in HEV applications and improve fuel consumption in the initial learning phases by using warm-start methods; the results can be used to facilitate the deployment of RL in vehicle supervisory control applications.
• Computer Science
AAAI
• 2021
This work presents a novel reinforcement learning technique that allows for intelligent initialization of a neural network's weights and architecture, permits encoding domain knowledge directly into a neural decision tree, and improves upon that knowledge with policy gradient updates.

