Corpus ID: 240354494

Efficient Inference Without Trading-off Regret in Bandits: An Allocation Probability Test for Thompson Sampling

@article{Deliu2021EfficientIW,
  title={Efficient Inference Without Trading-off Regret in Bandits: An Allocation Probability Test for Thompson Sampling},
  author={Nina Deliu and Joseph Jay Williams and Sof{\'i}a S. Villar},
  journal={ArXiv},
  year={2021},
  volume={abs/2111.00137}
}
Using bandit algorithms to conduct adaptive randomised experiments can minimise regret, but it poses major challenges for statistical inference (e.g., biased estimators, inflated type-I error, and reduced power). Recent attempts to address these challenges typically impose restrictions on the exploitative nature of the bandit algorithm (trading off regret) and require large sample sizes to ensure asymptotic guarantees. However, large experiments generally follow a successful pilot study, which is…
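The abstract refers to the allocation probabilities that Thompson Sampling induces from its posterior. As an illustrative sketch only (not the authors' proposed test, whose details are not given here), the following Python runs a two-armed Beta-Bernoulli Thompson sampler and then estimates by Monte Carlo the probability that the next participant would be allocated to arm 1; the arm labels, prior, and horizon are all assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def allocation_prob(alpha, beta, n_draws=10_000):
    """Monte Carlo estimate of the Thompson Sampling allocation probability
    for arm 1, i.e. P(theta_1 > theta_0 | data) under Beta posteriors."""
    draws = rng.beta(alpha, beta, size=(n_draws, 2))  # posterior samples, one column per arm
    return float(np.mean(draws[:, 1] > draws[:, 0]))

# Two-armed Bernoulli bandit with a true difference between arms (assumed values).
true_means = [0.4, 0.6]
alpha = np.ones(2)  # Beta(1, 1) priors on each arm's success probability
beta = np.ones(2)

for t in range(200):
    theta = rng.beta(alpha, beta)   # one posterior draw per arm
    arm = int(np.argmax(theta))     # Thompson Sampling: play the arm with the larger draw
    reward = rng.binomial(1, true_means[arm])
    alpha[arm] += reward            # conjugate Beta-Bernoulli update
    beta[arm] += 1 - reward

p1 = allocation_prob(alpha, beta)
print(f"allocation probability for arm 1 after 200 steps: {p1:.3f}")
```

As data accumulate on a genuinely better arm, this allocation probability drifts towards 1, which is what makes it a natural summary statistic of the evidence the algorithm has gathered.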

Response-adaptive randomization in clinical trials: from myths to practical considerations

TLDR
This work aims to address a persistent gap in the understanding of response-adaptive randomization (RAR) by providing a critical, balanced and updated review of methodological and practical issues to consider when debating the use of RAR in clinical trials.

Reinforcement Learning in Modern Biostatistics: Constructing Optimal Adaptive Interventions

TLDR
This work provides the first unified instructive survey on RL methods for building adaptive interventions (AIs), encompassing both dynamic treatment regimes (DTRs) and just-in-time adaptive interventions in mobile health (mHealth).

Multi-disciplinary fairness considerations in machine learning for clinical trials

TLDR
This work examines potential sources of unfairness in clinical trials, providing concrete examples, and discusses the role machine learning might play in either mitigating potential biases or exacerbating them when applied without care.

References


Further Optimal Regret Bounds for Thompson Sampling

TLDR
A novel regret analysis for Thompson Sampling is provided that proves the first near-optimal problem-independent bound of O(√(NT ln T)) on the expected regret of this algorithm, and simultaneously provides the optimal problem-dependent bound.

Analysis of Thompson Sampling for the Multi-armed Bandit Problem

TLDR
For the first time, it is shown that the Thompson Sampling algorithm achieves logarithmic expected regret for the stochastic multi-armed bandit problem.