Effective Warm Start for the Online Actor-Critic Reinforcement Learning based mHealth Intervention
@article{Zhu2017EffectiveWS,
  title   = {Effective Warm Start for the Online Actor-Critic Reinforcement Learning based mHealth Intervention},
  author  = {Feiyun Zhu and Peng Liao},
  journal = {ArXiv},
  year    = {2017},
  volume  = {abs/1704.04866}
}
Online reinforcement learning (RL) is increasingly popular for personalized mobile health (mHealth) interventions. It can personalize the type and dose of interventions according to users' ongoing statuses and changing needs. However, at the beginning of online learning, there are usually too few samples to support the RL updates, which leads to poor performance. A delay in good performance of online learning algorithms can be especially detrimental in mHealth, where users…
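The warm-start idea described above can be sketched as follows. This is a minimal, illustrative example rather than the paper's actual algorithm: a linear critic is first fit to a hypothetical pooled batch of historical data (rather than starting from all-zero weights), and then refined incrementally online. All names and data here are made up.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical pooled batch of historical (state-feature, reward) pairs from
# prior users; used only to warm-start the critic before online learning.
d = 5
true_w = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
X_hist = rng.normal(size=(200, d))
y_hist = X_hist @ true_w + 0.1 * rng.normal(size=200)

# Warm start: ridge-regression fit of a linear critic on the pooled batch.
lam = 1e-3
w = np.linalg.solve(X_hist.T @ X_hist + lam * np.eye(d), X_hist.T @ y_hist)

# Online phase: incremental least-mean-squares refinement from the warm start,
# one sample at a time, instead of learning from scratch.
alpha = 0.01
for _ in range(100):
    x = rng.normal(size=d)
    r = float(x @ true_w)
    w += alpha * (r - x @ w) * x  # one-step correction toward the new sample
```

The point of the sketch is only the ordering: the online loop begins from an informed estimate, so early decisions are made with a critic that is already close to useful.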
15 Citations
Cohesion-driven Online Actor-Critic Reinforcement Learning for mHealth Intervention
- Computer Science, BCB
- 2018
A network-cohesion-constrained actor-critic reinforcement learning (RL) method for mHealth that explores how to share information among similar users to better convert limited user information into sharper learned policies.
Group-driven Reinforcement Learning for Personalized mHealth Intervention
- Computer Science, MICCAI
- 2018
This paper employs K-means clustering to group users based on the similarity of their trajectory information and learns a shared RL policy for each group, achieving clear gains over state-of-the-art RL methods for mHealth.
Robust Actor-Critic Contextual Bandit for Mobile Health (mHealth) Interventions
- Computer Science, BCB
- 2018
It is proved that the proposed algorithm sufficiently decreases the objective function value at each iteration and converges after a finite number of iterations, and it significantly outperforms state-of-the-art methods on datasets badly corrupted by outliers across a variety of parameter settings.
Robust Contextual Bandit via the Capped-$\ell_{2}$ norm
- Computer Science
- 2018
This paper proposes a novel robust actor-critic contextual bandit method that achieves almost identical results to state-of-the-art methods on datasets without outliers and dramatically outperforms them on datasets corrupted by outliers.
Analyzing and Overcoming Degradation in Warm-Start Reinforcement Learning
- Computer Science, 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
- 2022
A novel method, Confidence Constrained Learning for Warm-Start RL, that reduces degradation by balancing the policy gradient and constrained learning according to a confidence measure of the Q-values, together with a novel objective, the Positive Q-value Distance (CCL-PQD).
Learning Time Reduction Using Warm-Start Methods for a Reinforcement Learning-Based Supervisory Control in Hybrid Electric Vehicle Applications
- Engineering, IEEE Transactions on Transportation Electrification
- 2021
The results show that the proposed warm-start Q-learning requires 68.8% fewer iterations than cold-start Q-learning and improves MPG by 10%–16% compared with the equivalent consumption minimization strategy control.
Robust Contextual Bandit via the Capped-$\ell_{2}$ norm
- Computer Science, ArXiv
- 2017
This paper proposes a novel robust actor-critic contextual bandit method that achieves almost identical results to state-of-the-art methods on datasets without outliers and dramatically outperforms them on datasets corrupted by outliers.
Personalisation of Exercises in VRET
- Psychology, AAAI Workshops
- 2018
The World Health Organisation (WHO) states that: “There is no health without mental health”. Health population studies show that the most common mental disorders are anxiety disorders. Nowadays,…
Learning Time Reduction Using Warm Start Methods for a Reinforcement Learning Based Supervisory Control in Hybrid Electric Vehicle Applications
- Engineering, ArXiv
- 2020
This study aims to reduce the learning iterations of Q-learning in HEV applications and improve fuel consumption in the initial learning phases by utilizing warm-start methods, which can facilitate the deployment of RL in vehicle supervisory control applications.
Encoding Human Domain Knowledge to Warm Start Reinforcement Learning
- Computer Science, AAAI
- 2021
This work presents a novel reinforcement learning technique that allows for intelligent initialization of a neural network's weights and architecture, permits encoding domain knowledge directly into a neural decision tree, and improves upon that knowledge with policy gradient updates.
References
Showing 1–10 of 22 references
Cohesion-based Online Actor-Critic Reinforcement Learning for mHealth Intervention
- Computer Science, ArXiv
- 2017
A network-cohesion-constrained actor-critic reinforcement learning (RL) method for mHealth that explores how to share information among similar users to better convert limited user information into sharper learned policies.
An Online Actor Critic Algorithm and a Statistical Decision Procedure for Personalizing Intervention.
- Computer Science
- 2016
This dissertation provides an online actor critic algorithm that guides the construction and refinement of a JITAI, and develops and tests asymptotic properties of the actor critic algorithm, including consistency, asymptotic distribution, and a regret bound for the optimal JITAI parameters.
An Actor-Critic Contextual Bandit Algorithm for Personalized Interventions using Mobile Devices
- Computer Science
- 2014
This article formulates the task of tailoring interventions in real time as a contextual bandit problem, differently from existing formulations intended for web applications such as ad or news article placement, and provides an online actor-critic algorithm that guides the construction and refinement of a JITAI.
A Survey of Actor-Critic Reinforcement Learning: Standard and Natural Policy Gradients
- Computer Science, IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews)
- 2012
The workings of the natural gradient, which has made its way into many actor-critic algorithms over the past few years, are described, and a review of several standard and natural actor-critic algorithms is given.
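For context, the natural policy gradient surveyed there preconditions the vanilla gradient with the inverse Fisher information matrix of the policy, which makes the update invariant to the policy's parameterization:

```latex
\tilde{\nabla}_\theta J(\theta) = F(\theta)^{-1}\, \nabla_\theta J(\theta),
\qquad
F(\theta) = \mathbb{E}_{s,a \sim \pi_\theta}\!\left[
  \nabla_\theta \log \pi_\theta(a \mid s)\,
  \nabla_\theta \log \pi_\theta(a \mid s)^{\top}
\right]
```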
Harnessing Different Motivational Frames via Mobile Phones to Promote Daily Physical Activity and Reduce Sedentary Behavior in Aging Adults
- Psychology, PLoS ONE
- 2013
The results indicated that the three daily activity smartphone applications were sufficiently robust to significantly improve regular moderate-to-vigorous intensity physical activity and decrease leisure-time sitting during the 8-week behavioral adoption period.
Least-Squares Policy Iteration
- Computer Science, J. Mach. Learn. Res.
- 2003
The new algorithm, least-squares policy iteration (LSPI), learns the state-action value function which allows for action selection without a model and for incremental policy improvement within a policy-iteration framework.
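As a rough illustration of the LSTDQ step inside LSPI, here is a toy sketch on a made-up two-state, two-action MDP with one-hot state-action features; this is not the reference implementation, and every name and number below is illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy MDP: action 0 stays in place, action 1 flips the state, and only
# (state 1, action 0) yields reward. One-hot state-action features.
n_s, n_a, gamma = 2, 2, 0.9

def phi(s, a):
    f = np.zeros(n_s * n_a)
    f[s * n_a + a] = 1.0
    return f

def step(s, a):
    s2 = s if a == 0 else 1 - s
    r = 1.0 if (s == 1 and a == 0) else 0.0
    return r, s2

# Batch of transitions collected under a uniformly random behavior policy.
D, s = [], 0
for _ in range(500):
    a = int(rng.integers(n_a))
    r, s2 = step(s, a)
    D.append((s, a, r, s2))
    s = s2

# LSPI: repeatedly run LSTDQ (solve A w = b) for the greedy policy induced
# by the current weights; policy improvement is the argmax over actions,
# and no model of the MDP is needed -- only the stored samples.
w = np.zeros(n_s * n_a)
for _ in range(10):
    A = 1e-6 * np.eye(n_s * n_a)   # tiny ridge term for invertibility
    b = np.zeros(n_s * n_a)
    for (s0, a0, r, s2) in D:
        a2 = int(np.argmax([phi(s2, x) @ w for x in range(n_a)]))
        f = phi(s0, a0)
        A += np.outer(f, f - gamma * phi(s2, a2))
        b += r * f
    w = np.linalg.solve(A, b)

greedy = [int(np.argmax([phi(st, x) @ w for x in range(n_a)])) for st in range(n_s)]
```

On this toy problem the greedy policy converges to "flip from state 0, stay in state 1", and with one-hot features the solved weights are the exact action values.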
Reinforcement Learning: An Introduction
- Computer Science, IEEE Transactions on Neural Networks
- 2005
This book provides a clear and simple account of the key ideas and algorithms of reinforcement learning, which ranges from the history of the field's intellectual foundations to the most recent developments and applications.
A Text Message–Based Intervention for Weight Loss: Randomized Controlled Trial
- Medicine, Psychology, Journal of Medical Internet Research
- 2009
Text messages might prove to be a productive channel of communication to promote behaviors that support weight loss in overweight adults.
Technology-enhanced maintenance of treatment gains in eating disorders: efficacy of an intervention delivered via text messaging.
- Psychology, Medicine, Journal of Consulting and Clinical Psychology
- 2012
The aftercare intervention was efficacious in enhancing treatment outcome after discharge from inpatient treatment and its impact on the utilization of outpatient treatment during follow-up was investigated.
Algorithmic Survey of Parametric Value Function Approximation
- Computer Science, IEEE Transactions on Neural Networks and Learning Systems
- 2013
This survey reviews state-of-the-art methods for (parametric) value function approximation by grouping them into three main categories: bootstrapping, residual, and projected fixed-point approaches.
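To make that taxonomy concrete, the standard distinction between the last two categories can be stated in one line: residual approaches minimize the Bellman residual directly, while projected fixed-point approaches (e.g. LSTD) seek the fixed point of the Bellman operator composed with a projection onto the feature span:

```latex
\min_{w}\; \lVert T V_w - V_w \rVert^{2}
\quad \text{(residual)},
\qquad
V_w = \Pi\, T V_w
\quad \text{(projected fixed point)}
```

Here $T$ denotes the Bellman operator and $\Pi$ the projection onto the span of the chosen features.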