Corpus ID: 218517042

DTR Bandit: Learning to Make Response-Adaptive Decisions With Low Regret

@article{Hu2020DTRBL,
  title={DTR Bandit: Learning to Make Response-Adaptive Decisions With Low Regret},
  author={Yichun Hu and Nathan Kallus},
  journal={ArXiv},
  year={2020},
  volume={abs/2005.02791}
}
Dynamic treatment regimes (DTRs) are personalized, adaptive, multi-stage treatment plans that adapt treatment decisions both to an individual's initial features and to intermediate outcomes and features at each subsequent stage, which are themselves affected by decisions in prior stages. Examples include personalized first- and second-line treatments of chronic conditions like diabetes, cancer, and depression, which adapt to patient response to first-line treatment, disease progression, and individual…
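To make the multi-stage structure concrete, here is a minimal sketch of a two-stage decision process in the DTR spirit: the second-stage action may depend on the baseline feature, the first-stage action, and the intermediate response. The linear-response model, thresholds, and policy functions below are purely illustrative assumptions, not the estimators or regret-minimizing algorithm from the paper.

```python
import random

random.seed(0)

def stage1_policy(x):
    """Choose a first-line treatment (0 or 1) from baseline feature x.
    The 0.5 threshold is an arbitrary illustrative choice."""
    return 1 if x > 0.5 else 0

def stage2_policy(x, a1, y1):
    """Choose a second-line treatment from the baseline feature x,
    the first-line action a1, and the intermediate response y1.
    Switching treatments after a poor intermediate response mimics
    response-adaptive second-line decisions."""
    return 1 - a1 if y1 < 0.3 else a1

def run_episode():
    x = random.random()                      # baseline patient feature
    a1 = stage1_policy(x)
    y1 = random.random() * (0.5 + 0.5 * a1)  # intermediate outcome, shaped by a1
    a2 = stage2_policy(x, a1, y1)
    y2 = y1 + 0.2 * a2                       # final outcome (toy additive model)
    return a1, a2, y2

trajectory = run_episode()
print(trajectory)
```

The key point the sketch captures is that `stage2_policy` sees information (`a1`, `y1`) that did not exist when the first decision was made; a bandit algorithm for this setting must therefore learn a policy for each stage rather than a single one-shot treatment rule.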

Figures and Tables from this paper

References

Showing 1-10 of 72 references
Augmented outcome-weighted learning for estimating optimal dynamic treatment regimens.
New Statistical Learning Methods for Estimating Optimal Dynamic Treatment Regimes
Q-learning for estimating optimal dynamic treatment rules from observational data.
Multi-armed Bandit Models for the Optimal Design of Clinical Trials: Benefits and Challenges. (S. Villar, J. Bowden, J. Wason. Statistical Science, 2015.)
Reinforcement learning design for cancer clinical trials.
Q-Learning: Flexible Learning About Useful Utilities
Balanced Policy Evaluation and Learning
Online Decision-Making with High-Dimensional Covariates
A Linear Response Bandit Problem
...