• Corpus ID: 12783305

# Effective Warm Start for the Online Actor-Critic Reinforcement Learning based mHealth Intervention

@article{Zhu2017EffectiveWS,
title={Effective Warm Start for the Online Actor-Critic Reinforcement Learning based mHealth Intervention},
author={Feiyun Zhu and Peng Liao},
journal={ArXiv},
year={2017},
volume={abs/1704.04866}
}
• Published 17 April 2017
• Computer Science
• ArXiv
Online reinforcement learning (RL) is increasingly popular for personalized mobile health (mHealth) interventions. It can personalize the type and dose of interventions according to each user's ongoing status and changing needs. However, at the beginning of online learning there are usually too few samples to support the RL updates, which leads to poor performance. A delay in good performance of online learning algorithms can be especially detrimental in mHealth, where users…
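The core difficulty named in the abstract is that an online learner starts with too few samples. The sketch below illustrates why extra data helps. It is a minimal sketch, not the paper's method, and it rests on several assumptions. The critic is assumed to be linear in a single hypothetical context feature x (expected reward r ≈ w0 + w1·x) and fit by ridge regression. An earlier cohort is assumed to follow the same reward model as the new user. The names `TRUE_W`, `simulate`, `ridge_fit`, and `error`, as well as all numbers, are illustrative.

```python
import random

# Hypothetical "true" reward model r = w0 + w1 * x, assumed shared by all users.
TRUE_W = (1.0, 2.0)


def simulate(n, rng, noise=0.2):
    """Draw n (context, reward) pairs from the hypothetical reward model."""
    samples = []
    for _ in range(n):
        x = rng.random()
        samples.append((x, TRUE_W[0] + TRUE_W[1] * x + rng.gauss(0.0, noise)))
    return samples


def ridge_fit(samples, lam=1.0):
    """Ridge regression for r ~ w0 + w1 * x, solved in closed form (2x2)."""
    # Entries of X^T X + lam * I for the features (1, x) ...
    a11 = lam + len(samples)
    a12 = sum(x for x, _ in samples)
    a22 = lam + sum(x * x for x, _ in samples)
    # ... and of X^T r.
    b1 = sum(r for _, r in samples)
    b2 = sum(x * r for x, r in samples)
    det = a11 * a22 - a12 * a12
    return ((a22 * b1 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det)


def error(w):
    """Euclidean distance between an estimate and TRUE_W."""
    return ((w[0] - TRUE_W[0]) ** 2 + (w[1] - TRUE_W[1]) ** 2) ** 0.5


rng = random.Random(0)
prior_data = simulate(400, rng)  # hypothetical data from an earlier study
new_user = simulate(3, rng)      # first few online samples of a new user

cold = ridge_fit(new_user)               # cold start: new user's data only
warm = ridge_fit(prior_data + new_user)  # warm start: pooled data
print(f"cold-start error: {error(cold):.3f}")
print(f"warm-start error: {error(warm):.3f}")
```

With only three samples, the ridge penalty dominates the cold-start estimate. Pooled data, by contrast, gives an estimate close to the true weights from the first online step. Pooling is justified only if the earlier cohort behaves like the new user. If that is doubtful, a more cautious option is to use the prior estimate only as an initialization.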

## Citations

• Computer Science
BCB
• 2018
A network cohesion constrained (actor-critic) reinforcement learning (RL) method for mHealth that explores how to share information among similar users to better convert limited user information into sharper learned policies.
• Computer Science
MICCAI
• 2018
This paper employs the K-means clustering method to group users based on their trajectory information similarity and learn a shared RL policy for each group, which can achieve clear gains over the state-of-the-art RL methods for mHealth.
• Computer Science
BCB
• 2018
It is proved that the proposed algorithm sufficiently decreases the objective function value at each iteration and converges after a finite number of iterations. The method also significantly outperforms state-of-the-art methods on the heavily noised dataset with outliers across a variety of parameter settings.
• Computer Science
• 2018
This paper proposes a novel robust actor-critic contextual bandit method that achieves almost identical results to state-of-the-art methods on the dataset without outliers and dramatically outperforms them on datasets corrupted by outliers.
• Computer Science
2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
• 2022
A novel method, Confidence Constrained Learning for Warm-Start RL, that reduces degradation by balancing the policy gradient against constrained learning according to a confidence measure of the Q-values, together with a novel objective, Positive Q-value Distance (CCL-PQD).
• Engineering
IEEE Transactions on Transportation Electrification
• 2021
The results show that the proposed warm-start Q-learning requires 68.8% fewer iterations than cold-start Q-learning and improves MPG by 10%–16% compared with equivalent consumption minimization strategy control.
• Computer Science
ArXiv
• 2017
This paper proposes a novel robust actor-critic contextual bandit method that achieves almost identical results to state-of-the-art methods on the dataset without outliers and dramatically outperforms them on datasets corrupted by outliers.
• Psychology
AAAI Workshops
• 2018
The World Health Organisation (WHO) states: "There is no health without mental health". Health population studies show that the most common mental disorders are anxiety disorders. Nowadays,
• Engineering
ArXiv
• 2020
This study aims to use warm-start methods to reduce the number of Q-learning iterations in HEV applications and to improve fuel consumption in the initial learning phases. The approach can be used to facilitate the deployment of RL in vehicle supervisory control applications.
• Computer Science
AAAI
• 2021
This work presents a novel reinforcement learning technique that allows for intelligent initialization of neural network weights and architecture, permits encoding domain knowledge directly into a neural decision tree, and improves upon that knowledge with policy gradient updates.
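Two of the citing works above, the HEV energy-management studies, report that warm-starting Q-learning reduces the number of learning iterations. The toy sketch below illustrates only the mechanism and does not reproduce their setting or their numbers. It runs tabular Q-learning on a hypothetical six-state chain and compares two starting points. The warm start is a Q-table assumed to come from an earlier related task (the same chain with a different goal reward). The cold start is the usual all-zeros table. All names and hyperparameters are illustrative.

```python
import random

N_STATES = 6  # toy chain: states 0..5, state 5 is the terminal goal
GAMMA = 0.9


def step(s, a):
    """Deterministic chain: action 1 moves right, action 0 moves left."""
    return min(s + 1, N_STATES - 1) if a == 1 else max(s - 1, 0)


def greedy(q, s, rng):
    """Greedy action with random tie-breaking."""
    if q[s][0] == q[s][1]:
        return rng.randrange(2)
    return 0 if q[s][0] > q[s][1] else 1


def is_optimal(q):
    """The optimal policy moves right in every non-terminal state."""
    return all(q[s][1] > q[s][0] for s in range(N_STATES - 1))


def q_learning(q, goal_reward, rng, alpha=0.5, eps=0.3,
               max_episodes=1000, max_steps=200):
    """Epsilon-greedy Q-learning that updates q in place. Returns the number
    of episodes completed before the greedy policy first became optimal."""
    for episode in range(max_episodes):
        if is_optimal(q):
            return episode
        s = 0
        for _ in range(max_steps):
            a = rng.randrange(2) if rng.random() < eps else greedy(q, s, rng)
            s2 = step(s, a)
            done = s2 == N_STATES - 1
            target = goal_reward if done else GAMMA * max(q[s2])
            q[s][a] += alpha * (target - q[s][a])
            s = s2
            if done:
                break
    return max_episodes


def source_q(goal_reward):
    """Hypothetical Q-table from an earlier related task: the exact optimal
    Q-values of the same chain with a different goal reward."""
    v = [goal_reward * GAMMA ** (N_STATES - 2 - s) for s in range(N_STATES - 1)]
    q = [[GAMMA * v[max(s - 1, 0)], v[s]] for s in range(N_STATES - 1)]
    q.append([0.0, 0.0])  # terminal state, never updated
    return q


rng = random.Random(0)
cold = q_learning([[0.0, 0.0] for _ in range(N_STATES)], goal_reward=1.0, rng=rng)
warm = q_learning(source_q(goal_reward=0.5), goal_reward=1.0, rng=rng)
print("cold-start episodes:", cold)
print("warm-start episodes:", warm)
```

The source task shares the target's optimal policy, so the warm-started table is already greedy-optimal and needs no episodes. The cold start, by contrast, must first reach the goal through exploration before any value propagates back. In practice a warm start is only approximately right and still needs updates. Here, for example, the values would still have to be rescaled to the new goal reward.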

## References

Showing 1–10 of 22 references.

• Computer Science
ArXiv
• 2017
A network cohesion constrained (actor-critic) reinforcement learning (RL) method for mHealth that explores how to share information among similar users to better convert limited user information into sharper learned policies.
This dissertation provides an online actor-critic algorithm that guides the construction and refinement of a JITAI. It also develops and tests asymptotic properties of the actor-critic algorithm, including consistency, the asymptotic distribution, and the regret bound of the optimal JITAI parameters.
• Computer Science
• 2014
This article formulates the task of tailoring interventions in real-time as a contextual bandit problem differently from existing formulations intended for web applications such as ad or news article placement, and provides an online actor-critic algorithm that guides the construction and refinement of a JITAI.
• Computer Science
IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews)
• 2012
The workings of the natural gradient, which has made its way into many actor-critic algorithms over the past few years, are described, and several standard and natural actor-critic algorithms are reviewed.
• Psychology
PLoS ONE
• 2013
The results indicated that the three daily activity smartphone applications were sufficiently robust to significantly improve regular moderate-to-vigorous intensity physical activity and decrease leisure-time sitting during the 8-week behavioral adoption period.
• Computer Science
J. Mach. Learn. Res.
• 2003
The new algorithm, least-squares policy iteration (LSPI), learns the state-action value function which allows for action selection without a model and for incremental policy improvement within a policy-iteration framework.
• Computer Science
IEEE Transactions on Neural Networks
• 2005
This book provides a clear and simple account of the key ideas and algorithms of reinforcement learning, which ranges from the history of the field's intellectual foundations to the most recent developments and applications.
• Medicine, Psychology
Journal of Medical Internet Research
• 2009
Text messages might prove to be a productive channel of communication to promote behaviors that support weight loss in overweight adults.
• Psychology, Medicine
Journal of Consulting and Clinical Psychology
• 2012
The aftercare intervention was efficacious in enhancing treatment outcome after discharge from inpatient treatment. Its impact on the utilization of outpatient treatment during follow-up was also investigated.
• Computer Science
IEEE Transactions on Neural Networks and Learning Systems
• 2013
This survey reviews state-of-the-art methods for (parametric) value function approximation by grouping them into three main categories: bootstrapping, residual, and projected fixed-point approaches.
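The LSPI entry above learns a state-action value function without a model and improves the policy inside a policy-iteration loop. Its evaluation step, LSTD-Q, solves the linear system A w = b for the weights of Q^π, where

- A = Σ φ(s,a)(φ(s,a) - γ φ(s', π(s')))ᵀ
- b = Σ φ(s,a) r

The sketch below covers only the tabular special case. It assumes one-hot features over (state, action) pairs on a hypothetical three-state chain. The original algorithm works with arbitrary basis functions. The names `solve`, `lstdq`, `lspi`, and `SAMPLES` are illustrative and are not taken from the paper.

```python
GAMMA = 0.9
N_STATES, N_ACTIONS = 3, 2  # non-terminal states 0..2; action 0 = left, 1 = right


def idx(s, a):
    """Position of the one-hot feature for (s, a)."""
    return s * N_ACTIONS + a


def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


# One batch of samples (s, a, r, s_next, done) from a deterministic chain:
# moving right from state 2 reaches a terminal goal and earns reward 1.
SAMPLES = []
for s in range(N_STATES):
    for a in range(N_ACTIONS):
        s_next = s + 1 if a == 1 else max(s - 1, 0)
        done = s_next == N_STATES
        SAMPLES.append((s, a, 1.0 if done else 0.0, s_next, done))


def lstdq(samples, policy):
    """LSTD-Q with one-hot features: solve A w = b for the weights of Q^policy."""
    k = N_STATES * N_ACTIONS
    A = [[0.0] * k for _ in range(k)]
    b = [0.0] * k
    for s, a, r, s_next, done in samples:
        i = idx(s, a)
        A[i][i] += 1.0  # phi(s, a) phi(s, a)^T
        if not done:    # - gamma * phi(s, a) phi(s', policy(s'))^T
            A[i][idx(s_next, policy[s_next])] -= GAMMA
        b[i] += r       # phi(s, a) * r
    return solve(A, b)


def lspi(samples, max_iters=20):
    """Alternate LSTD-Q evaluation and greedy improvement until the policy is stable."""
    policy = [0] * N_STATES  # start from "always move left"
    for it in range(1, max_iters + 1):
        w = lstdq(samples, policy)
        # Greedy improvement; ties go to the lower action index.
        new_policy = [max(range(N_ACTIONS), key=lambda a: w[idx(s, a)])
                      for s in range(N_STATES)]
        if new_policy == policy:
            return policy, w, it
        policy = new_policy
    return policy, w, max_iters


policy, w, iters = lspi(SAMPLES)
print("policy:", policy)
print("Q(0, right):", round(w[idx(0, 1)], 3))
```

In this sketch every (state, action) pair is sampled once and the features are one-hot. Under those conditions LSTD-Q recovers Q^π exactly, so the loop reduces to ordinary policy iteration on the sampled model. With fewer features than state-action pairs, the same code structure would yield an approximation instead.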