# Batched Bandit Problems

@inproceedings{Perchet2015BatchedBP,
title={Batched Bandit Problems},
author={Vianney Perchet and Philippe Rigollet and Sylvain Chassang and Erik Snowberg},
booktitle={COLT},
year={2015}
}
• Published in COLT 2 May 2015
• Computer Science
Motivated by practical applications, chiefly clinical trials, we study the regret achievable for stochastic bandits under the constraint that the employed policy must split trials into a small number of batches. We propose a simple policy, and show that a very small number of batches gives close to minimax optimal regret bounds. As a byproduct, we derive optimal policies with low switching cost for stochastic bandits.
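As an illustrative sketch only (not the policy analyzed in the paper), the batching constraint in the abstract can be simulated for two Bernoulli arms. The function names `geometric_grid` and `batched_elimination`, the geometric batch schedule, and the confidence width `sqrt(2 log T / n)` are all assumptions chosen for this example. The point it shows is that feedback is consulted only at batch boundaries.

```python
import math
import random


def geometric_grid(horizon, num_batches):
    """Hypothetical batch schedule: end times near horizon**(k / num_batches)."""
    ends = {max(2, round(horizon ** (k / num_batches)))
            for k in range(1, num_batches)}
    ends = {e for e in ends if e < horizon}
    # The final batch always ends at the horizon.
    return sorted(ends) + [horizon]


def batched_elimination(means, horizon, batch_ends, seed=0):
    """Two-armed elimination policy that only updates between batches.

    means      -- true Bernoulli means (unknown to the policy, used to simulate)
    horizon    -- total number of pulls T
    batch_ends -- increasing batch end times, the last equal to horizon
    Returns (pseudo_regret, pulls_per_arm).
    """
    if batch_ends[0] < 2 or batch_ends[-1] != horizon:
        raise ValueError("first batch needs >= 2 pulls; last must end at horizon")
    rng = random.Random(seed)
    active = [0, 1]
    pulls = [0, 0]
    sums = [0.0, 0.0]
    t = 0
    for end in batch_ends:
        # Allocation within a batch is fixed in advance: alternate active arms.
        for i in range(end - t):
            arm = active[i % len(active)]
            reward = 1.0 if rng.random() < means[arm] else 0.0
            pulls[arm] += 1
            sums[arm] += reward
        t = end
        # Rewards are inspected only here, at the end of the batch.
        if len(active) == 2:
            n = min(pulls)
            width = math.sqrt(2.0 * math.log(horizon) / n)  # assumed width
            est = [sums[a] / pulls[a] for a in (0, 1)]
            if abs(est[0] - est[1]) > width:
                active = [0 if est[0] > est[1] else 1]
    best = max(means)
    regret = sum(pulls[a] * (best - means[a]) for a in (0, 1))
    return regret, pulls


if __name__ == "__main__":
    grid = geometric_grid(10_000, 4)
    regret, pulls = batched_elimination((0.9, 0.1), 10_000, grid)
    print("batch ends:", grid)
    print("pulls per arm:", pulls, "pseudo-regret:", regret)
```

With a large gap between the arms, the weaker arm is typically dropped after the second or third batch, so most of the horizon is spent on the better arm even though the policy changes at most a handful of times.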
## 114 Citations

Batched Thompson Sampling for Multi-Armed Bandits
• Computer Science
ArXiv
• 2021
This work analyzes Thompson Sampling algorithms for stochastic multiarmed bandits in the batched setting and obtains almost tight regret-batches tradeoffs for the two-arm case.
Batched Neural Bandits
• Computer Science
ArXiv
• 2021
This work proposes the BatchNeuralUCB algorithm which combines neural networks with optimism to address the exploration-exploitation tradeoff while keeping the total number of batches limited and proves that it achieves the same regret as the fully sequential version while reducing the number of policy updates considerably.
The Impact of Batch Learning in Stochastic Bandits
• Computer Science
ArXiv
• 2021
This work provides a policy-agnostic regret analysis and demonstrates upper and lower bounds for the regret of a candidate policy, and shows that the impact of batch learning can be measured in terms of online behavior.
Batched Multi-armed Bandits Problem
• Computer Science
NeurIPS
• 2019
The BaSE (batched successive elimination) policy is proposed to achieve the rate-optimal regrets (within logarithmic factors) for batched multi-armed bandits, with matching lower bounds even if the batch sizes are determined in an adaptive manner.
The Impact of Batch Learning in Stochastic Linear Bandits
• Computer Science
• 2022
This work provides a policy-agnostic regret analysis, demonstrates upper and lower bounds for the regret of a candidate policy, and provides a more robust result for the 2-armed bandit problem as an important insight.
Invariant description of UCB strategy for multi-armed bandits for batch processing scenario
• S. Garbar
• Computer Science
2020 24th International Conference on Circuits, Systems, Communications and Computers (CSCC)
• 2020
In this work, a set of Monte Carlo simulations is performed for different horizon sizes, strategy parameters, and batch sizes to determine the maximum regret for two-armed bandits.
A Sharp Memory-Regret Trade-Off for Multi-Pass Streaming Bandits
• Computer Science
ArXiv
• 2022
The main technical contribution is the lower bound which requires the use of information-theoretic techniques as well as ideas from round elimination to show that the residual problem remains challenging over subsequent passes.
Anytime optimal algorithms in stochastic multi-armed bandits
• Computer Science
ICML
• 2016
We introduce an anytime algorithm for stochastic multi-armed bandit with optimal distribution free and distribution dependent bounds (for a specific family of parameters). The performances of this
Fast Rates for Bandit Optimization with Upper-Confidence Frank-Wolfe
• Computer Science
NIPS
• 2017
The Upper-Confidence Frank-Wolfe algorithm is analyzed, inspired by techniques for bandits and convex optimization, and theoretical guarantees for the performance of this algorithm over various classes of functions are given.
Almost Optimal Anytime Algorithm for Batched Multi-Armed Bandits
• Computer Science
ICML
• 2021
An anytime algorithm is proposed that achieves the asymptotically optimal regret for exponential families of reward distributions with $O(\log\log T \cdot \mathrm{ilog}^{\alpha}(T))$ batches, where $\alpha \in O_T(1)$.

## References

The multi-armed bandit problem with covariates
• Computer Science
ArXiv
• 2011
This work introduces a policy called Adaptively Binned Successive Elimination (abse) that adaptively decomposes the global problem into suitably “localized” static bandit problems and introduces a nonparametric model where the expected rewards are smooth functions of the covariate and the hardness of the problem is captured by a margin parameter.
Bounded regret in stochastic multi-armed bandits
• Computer Science, Mathematics
COLT
• 2013
A new randomized policy is proposed that attains a regret uniformly bounded over time in this setting, and several lower bounds are proved, which show in particular that bounded regret is not possible if one only knows $\Delta$, and bounded regret of order $1/\Delta$ is not possible.
Finite-time Analysis of the Multiarmed Bandit Problem
• Computer Science
Machine Learning
• 2004
This work shows that the optimal logarithmic regret is also achievable uniformly over time, with simple and efficient policies, and for all reward distributions with bounded support.
UCB revisited: Improved regret bounds for the stochastic multi-armed bandit problem
• Computer Science
Period. Math. Hung.
• 2010
For this modified UCB algorithm, an improved bound on the regret is given with respect to the optimal reward for K-armed bandits after T trials.
Asymptotically optimal multistage tests of simple hypotheses
A family of variable stage size multistage tests of simple hypotheses is described, based on efficient multistage sampling procedures. Using a loss function that is a linear combination of sampling
Regret Bounds and Minimax Policies under Partial Monitoring
• Computer Science
• 2010
The stochastic bandit game is considered, and it is proved that an appropriate modification of the upper confidence bound policy UCB1 (Auer et al., 2002a) achieves the distribution-free optimal rate while still having a distribution-dependent rate logarithmic in the number of plays.
A Model for Selecting One of Two Medical Treatments
A simple cost function approach is proposed for designing an optimal clinical trial when a total of N patients with a disease are to be treated with one of two medical treatments. The cost
Kullback–Leibler upper confidence bounds for optimal sequential allocation
• Computer Science
• 2013
The main contribution is a unified finite-time analysis of the regret of these algorithms that asymptotically matches the lower bounds of Lai and Robbins (1985) and Burnetas and Katehakis (1996), respectively.
Sequential Experimentation in Clinical Trials: Design and Analysis
• Mathematics
• 2012
The results suggest that the design of Sequential Testing Theory and Stochastic Optimization over Time in Clinical Trials with Failure-Time Endpoints is a good guide for designing Sequential Methods for Vaccine Safety Evaluation and Surveillance in Public Health.
Randomized Allocation of Treatments in Sequential Experiments
Since the idea of sequential allocation was first studied, in a version of what is now called the multi-armed bandit problem, the results of many investigations have shown that, even when an