From Ads to Interventions: Contextual Bandits in Mobile Health

@inproceedings{Tewari2017FromAT,
  title={From Ads to Interventions: Contextual Bandits in Mobile Health},
  author={Ambuj Tewari and Susan A. Murphy},
  booktitle={Mobile Health - Sensors, Analytic Methods, and Applications},
  year={2017}
}
The first paper on contextual bandits was written by Michael Woodroofe in 1979 (Journal of the American Statistical Association, 74(368), 799–806), but the term “contextual bandits” was coined only recently, in 2008, by Langford and Zhang (Advances in Neural Information Processing Systems, pages 817–824, 2008). Woodroofe’s motivating application was clinical trials, whereas modern interest in this problem has been driven to a great extent by problems on the internet, such as online ad and online…
Robust Actor-Critic Contextual Bandit for Mobile Health (mHealth) Interventions
TLDR
It is proved that the proposed algorithm sufficiently decreases the objective function value at each iteration and converges after a finite number of iterations; it significantly outperforms state-of-the-art methods, across a variety of parameter settings, on datasets badly corrupted by outliers.
Post-Contextual-Bandit Inference
TLDR
The Contextual Adaptive Doubly Robust (CADR) estimator is proposed, a novel estimator for policy value that is asymptotically normal under contextual adaptive data collection; extensive numerical experiments demonstrate that confidence intervals based on CADR uniquely provide correct coverage.
Robust Contextual Bandit via the Capped-ℓ₂ Norm for Mobile Health Intervention
TLDR
This paper is the first to propose a novel robust actor-critic contextual bandit method for mHealth interventions; it achieves almost identical results to state-of-the-art methods on datasets without outliers and dramatically outperforms them on datasets corrupted by outliers.
contextual: Evaluating Contextual Multi-Armed Bandit Problems in R
TLDR
A user-friendly framework, easily extensible through its object-oriented structure, is introduced that facilitates parallelized comparison of contextual and context-free bandit policies through both simulation and offline analysis.
Robust Contextual Bandit via the Capped-ℓ₂ Norm
This paper considers the actor-critic contextual bandit for the mobile health (mHealth) intervention. The state-of-the-art decision-making methods in mHealth generally assume that the noise in the…
Robust Contextual Bandit via the Capped-ℓ₂ Norm
TLDR
This paper proposes a novel robust actor-critic contextual bandit method that achieves almost identical results to state-of-the-art methods on datasets without outliers and dramatically outperforms them on datasets corrupted by outliers.
Offline Contextual Multi-armed Bandits for Mobile Health Interventions: A Case Study on Emotion Regulation
TLDR
The first development of a treatment recommender system for emotion regulation is presented, using real-world historical mobile digital data from n = 114 highly socially anxious participants to test the usefulness of new emotion regulation strategies.
Personalized Policy Learning Using Longitudinal Mobile Health Data
TLDR
Aiming to optimize immediate rewards, this work proposes using a generalized linear mixed modeling framework where population features and individual features are modeled as fixed and random effects, respectively, and synthesized to form the personalized policy.
Action Centered Contextual Bandits
TLDR
This work provides an extension of the linear model for contextual bandits with two parts, a baseline reward and a treatment effect, that is plausible for mobile health applications and leads to algorithms with strong performance guarantees as in the linear model setting, while still allowing for complex nonlinear baseline modeling.
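A minimal sketch may make the two-part model concrete. Under assumed binary actions and a linear treatment effect, centering the action by its randomization probability makes the baseline drop out of the estimating equation; every name below is illustrative, not the paper's code.

import numpy as np

# Assumed two-part reward model: r = baseline(x) + a * (theta . x) + noise.
# Regressing reward on the centered feature (a - pi) * x, where
# pi = P(a = 1 | x) at decision time, removes the baseline term
# because E[a - pi | x] = 0.
def centered_design(X, a, pi):
    """X: (n, d) contexts; a: (n,) 0/1 actions; pi: (n,) randomization probs."""
    return (a - pi)[:, None] * X

def estimate_treatment_effect(X, a, pi, r):
    Z = centered_design(X, a, pi)
    theta_hat, *_ = np.linalg.lstsq(Z, r, rcond=None)  # OLS on centered features
    return theta_hat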
Mobile Health
TLDR
Six chapters are described that demonstrate the novel utility of mHealth, present design lessons in developing mHealth applications, and describe tools for managing mHealth data collection studies.

References

Showing 1–10 of 73 references
An Online Actor Critic Algorithm and a Statistical Decision Procedure for Personalizing Intervention.
TLDR
This dissertation provides an online actor-critic algorithm that guides the construction and refinement of a JITAI, and develops and tests asymptotic properties of the actor-critic algorithm, including consistency, asymptotic distribution, and a regret bound for the optimal JITAI parameters.
Just-in-Time Adaptive Interventions (JITAIs) in Mobile Health: Key Components and Design Principles for Ongoing Health Behavior Support
TLDR
It is critical that researchers develop sophisticated and nuanced health behavior theories capable of guiding the construction of JITAIs, and particular attention must be given to better understanding the implications of providing timely and ecologically sound support for intervention adherence and retention.
An unbiased offline evaluation of contextual bandit algorithms with generalized linear models
TLDR
This paper argues for the wide use of this technique as standard practice when comparing bandit algorithms in real-life problems, and compares and validates a number of new algorithms based on generalized linear models.
Online Decision-Making with High-Dimensional Covariates
TLDR
This work formulates this problem as a multi-armed bandit with high-dimensional covariates and presents a new efficient bandit algorithm based on the LASSO estimator that outperforms existing bandit methods, as well as physicians, at correctly dosing a majority of patients.
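As a rough illustration of the idea, the sketch below fits a per-arm Lasso on that arm's logged observations and acts greedily, with occasional forced exploration; the exploration schedule, regularization weight, and class name are simplifying assumptions rather than the paper's exact algorithm.

import numpy as np
from sklearn.linear_model import Lasso

class LassoBandit:
    """Greedy per-arm Lasso with periodic forced exploration (a sketch)."""
    def __init__(self, n_arms, alpha=0.1, explore_every=20):
        self.n_arms = n_arms
        self.alpha = alpha
        self.explore_every = explore_every
        self.X = [[] for _ in range(n_arms)]  # per-arm contexts
        self.y = [[] for _ in range(n_arms)]  # per-arm rewards
        self.t = 0

    def select(self, x):
        self.t += 1
        # Forced sampling keeps every arm's design matrix populated.
        if self.t % self.explore_every == 0 or min(len(y) for y in self.y) < 2:
            return self.t % self.n_arms
        preds = []
        for X_a, y_a in zip(self.X, self.y):
            model = Lasso(alpha=self.alpha).fit(np.array(X_a), np.array(y_a))
            preds.append(model.predict(x.reshape(1, -1))[0])
        return int(np.argmax(preds))

    def update(self, arm, x, reward):
        self.X[arm].append(x)
        self.y[arm].append(reward)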
A contextual-bandit approach to personalized news article recommendation
TLDR
This work models personalized recommendation of news articles as a contextual bandit problem, a principled approach in which a learning algorithm sequentially selects articles to serve users based on contextual information about the users and articles, while simultaneously adapting its article-selection strategy based on user-click feedback to maximize total user clicks.
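For concreteness, a minimal disjoint LinUCB-style learner in the spirit of this approach is sketched below; the parameter alpha and the per-arm ridge statistics follow the standard construction, and this is an illustrative reimplementation, not the authors' code.

import numpy as np

class LinUCB:
    """Disjoint linear UCB: one ridge regression per arm (a sketch)."""
    def __init__(self, n_arms, dim, alpha=1.0):
        self.alpha = alpha
        self.A = [np.eye(dim) for _ in range(n_arms)]    # X'X + I per arm
        self.b = [np.zeros(dim) for _ in range(n_arms)]  # X'r per arm

    def select(self, x):
        scores = []
        for A, b in zip(self.A, self.b):
            A_inv = np.linalg.inv(A)
            theta = A_inv @ b
            # Point estimate plus exploration bonus: theta'x + alpha*sqrt(x'A^{-1}x).
            scores.append(theta @ x + self.alpha * np.sqrt(x @ A_inv @ x))
        return int(np.argmax(scores))

    def update(self, arm, x, reward):
        self.A[arm] += np.outer(x, x)
        self.b[arm] += reward * x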
Contextual Multi-Armed Bandits
TLDR
A lower bound is proved for the regret of any algorithm in terms of the packing dimensions of the query space and the ad space, respectively, giving almost matching upper and lower bounds for finite spaces or convex bounded subsets of Euclidean spaces.
Dynamic influences on smoking relapse process. (S. Shiffman, Journal of Personality, 2005)
TLDR
A program of research applying Ecological Momentary Assessment methods to study relapse to cigarette smoking, with a particular focus on the role of negative affect (NA) and self-efficacy (SE), highlights dynamic changes both in background conditions and in immediate states as important influences on lapses and relapse.
Doubly Robust Policy Evaluation and Learning
TLDR
It is proved that the doubly robust approach uniformly improves over existing techniques, achieving both lower variance in value estimation and better policies, and is expected to become common practice.
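The standard doubly robust construction behind this result combines a direct reward model with an inverse-propensity correction, so the value estimate remains consistent if either component is accurate. A hedged sketch follows, with an assumed log format (context, action, reward, logging probability) and a user-supplied reward model rhat.

def dr_value(logs, target_policy, rhat, n_actions):
    """Doubly robust off-policy value estimate (a sketch).
    logs: iterable of (x, a, r, p_log) tuples from the logging policy.
    target_policy(x, a) -> probability the evaluated policy plays a in x.
    rhat(x, a) -> estimated mean reward (the direct model)."""
    total = 0.0
    n = 0
    for x, a, r, p_log in logs:
        # Direct-model term: expected reward of the target policy under rhat.
        dm = sum(target_policy(x, b) * rhat(x, b) for b in range(n_actions))
        # Importance-weighted residual corrects the direct model's bias.
        w = target_policy(x, a) / p_log
        total += dm + w * (r - rhat(x, a))
        n += 1
    return total / n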
Thompson Sampling for Contextual Bandits with Linear Payoffs
TLDR
A generalization of the Thompson Sampling algorithm is designed and analyzed for the stochastic contextual multi-armed bandit problem with linear payoff functions, where the contexts may be provided by an adaptive adversary.
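A minimal sketch of the linear-payoff case: keep Gaussian posterior statistics for a shared payoff parameter, draw one sample per round, and play the arm whose feature vector scores highest under the draw. The prior scale v and the shared-parameter formulation are illustrative assumptions.

import numpy as np

class LinearTS:
    """Thompson Sampling with linear payoffs (a sketch)."""
    def __init__(self, dim, v=1.0):
        self.v = v
        self.B = np.eye(dim)    # posterior precision
        self.f = np.zeros(dim)  # sum of reward-weighted contexts

    def select(self, arm_contexts):
        B_inv = np.linalg.inv(self.B)
        mu = B_inv @ self.f
        # One posterior draw per round drives exploration.
        theta = np.random.multivariate_normal(mu, self.v ** 2 * B_inv)
        return int(np.argmax([x @ theta for x in arm_contexts]))

    def update(self, x, reward):
        self.B += np.outer(x, x)
        self.f += reward * x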
Randomized Allocation with Nonparametric Estimation for a Multi-Armed Bandit Problem with Covariates
We study a multi-armed bandit problem in a setting where covariates are available. We take a nonparametric approach to estimate the functional relationship between the response (reward) and the covariates…