# Batched Bandit Problems

@inproceedings{Perchet2015BatchedBP, title={Batched Bandit Problems}, author={Vianney Perchet and Philippe Rigollet and Sylvain Chassang and Erik Snowberg}, booktitle={COLT}, year={2015} }

Motivated by practical applications, chiefly clinical trials, we study the regret achievable for stochastic bandits under the constraint that the employed policy must split trials into a small number of batches. We propose a simple policy, and show that a very small number of batches gives close to minimax optimal regret bounds. As a byproduct, we derive optimal policies with low switching cost for stochastic bandits.
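To make the batch constraint concrete: a batched policy only observes rewards at batch end-points $t_1 < \dots < t_M = T$. The sketch below computes one such grid. It assumes the recursion $t_1 = a$, $t_j = \lfloor a\sqrt{t_{j-1}} \rfloor$ with $a = T^{1/(2-2^{1-M})}$, so early batches are short and later ones grow quickly. The helper `minimax_grid` is hypothetical and illustrative, not the authors' implementation.

```python
import math


def minimax_grid(T: int, M: int) -> list[int]:
    """Return batch end-points t_1 < ... < t_M = T (illustrative sketch).

    Assumes the recursion t_1 = a, t_j = floor(a * sqrt(t_{j-1})) with
    a = T ** (1 / (2 - 2 ** (1 - M))). Without flooring, the recursion
    gives t_j = a ** (2 - 2 ** (1 - j)), so t_M would equal T exactly.
    """
    if M < 1 or T < 1:
        raise ValueError("need T >= 1 and M >= 1")
    if M == 1:
        return [T]  # a single batch: no feedback before the horizon
    a = T ** (1.0 / (2.0 - 2.0 ** (1 - M)))
    grid = [math.floor(a)]
    for _ in range(M - 2):
        grid.append(math.floor(a * math.sqrt(grid[-1])))
    grid.append(T)  # the last batch always ends at the horizon
    return grid
```

For example, with $T = 10{,}000$ and $M = 3$ the exponent is $1/(2 - 1/4) = 4/7$, so $a = T^{4/7} \approx 193$: the first batch covers under 2% of the horizon.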

## 114 Citations

Batched Thompson Sampling for Multi-Armed Bandits

- Computer Science · arXiv
- 2021

This work analyzes Thompson Sampling algorithms for stochastic multi-armed bandits in the batched setting and obtains almost tight regret-batches tradeoffs for the two-arm case.
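A rough illustration of what "batched" means for Thompson Sampling: arms are still chosen by posterior sampling, but the posterior is only updated when a batch closes. The sketch assumes Bernoulli rewards, Beta(1, 1) priors, and a fixed batch grid; `batched_thompson` is a hypothetical helper, and the cited work's exact variant may differ.

```python
import random


def batched_thompson(means, grid, seed=0):
    """Beta-Bernoulli Thompson sampling with posterior updates only at
    batch end-points (illustrative sketch).

    means: true Bernoulli means, used only to simulate rewards.
    grid: increasing batch end-points; grid[-1] is the horizon T.
    Returns the number of pulls of each arm.
    """
    rng = random.Random(seed)
    K = len(means)
    alpha = [1] * K  # Beta(1, 1) uniform priors
    beta = [1] * K
    pulls = [0] * K
    t = 0
    for end in grid:
        batch_succ = [0] * K  # feedback collected during this batch,
        batch_fail = [0] * K  # revealed to the policy only at its end
        while t < end:
            # Sample from the posterior frozen at the start of the batch.
            samples = [rng.betavariate(alpha[k], beta[k]) for k in range(K)]
            arm = max(range(K), key=samples.__getitem__)
            if rng.random() < means[arm]:
                batch_succ[arm] += 1
            else:
                batch_fail[arm] += 1
            pulls[arm] += 1
            t += 1
        # Batch closes: fold its feedback into the posterior.
        for k in range(K):
            alpha[k] += batch_succ[k]
            beta[k] += batch_fail[k]
    return pulls
```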

Batched Neural Bandits

- Computer Science · arXiv
- 2021

This work proposes BatchNeuralUCB, an algorithm that combines neural networks with optimism to handle the exploration-exploitation tradeoff while limiting the total number of batches, and proves that it achieves the same regret as the fully sequential version while considerably reducing the number of policy updates.

The Impact of Batch Learning in Stochastic Bandits

- Computer Science · arXiv
- 2021

This work provides a policy-agnostic regret analysis, establishes upper and lower bounds on the regret of a candidate policy, and shows that the impact of batch learning can be measured in terms of online behavior.

Batched Multi-armed Bandits Problem

- Computer Science · NeurIPS
- 2019

The BaSE (batched successive elimination) policy is proposed to achieve rate-optimal regret (within logarithmic factors) for batched multi-armed bandits, with matching lower bounds even if the batch sizes are determined adaptively.
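The idea behind batched successive elimination can be sketched as follows: within a batch, active arms are pulled round robin without looking at feedback; at each batch end, arms whose empirical mean trails the leader by more than a confidence radius are dropped. This is a minimal sketch assuming Bernoulli rewards and a radius of $\sqrt{\gamma \log(TK)/n}$; the function name and the constant `gamma` are hypothetical, and BaSE's exact grid and threshold may differ.

```python
import math
import random


def batched_successive_elimination(means, grid, gamma=1.0, seed=0):
    """Batched successive elimination on simulated Bernoulli arms (sketch).

    means: true Bernoulli means, used only to simulate rewards.
    grid: increasing batch end-points; grid[-1] is the horizon T.
    gamma: hypothetical positive constant in the elimination radius.
    Returns (pseudo-regret, arms still active at the end).
    """
    rng = random.Random(seed)
    K, T = len(means), grid[-1]
    best = max(means)
    active = list(range(K))
    sums, counts = [0.0] * K, [0] * K
    regret, t = 0.0, 0
    for end in grid:
        # Within a batch, pull active arms round robin; feedback is not used.
        i = 0
        while t < end:
            arm = active[i % len(active)]
            sums[arm] += 1.0 if rng.random() < means[arm] else 0.0
            counts[arm] += 1
            regret += best - means[arm]
            t, i = t + 1, i + 1
        # At the batch end, drop arms trailing the empirical leader by
        # at least the confidence radius (the leader itself is always kept).
        n = min(counts[k] for k in active)
        if len(active) > 1 and n > 0:
            radius = math.sqrt(gamma * math.log(T * K) / n)
            mean = {k: sums[k] / counts[k] for k in active}
            leader = max(mean.values())
            active = [k for k in active if leader - mean[k] < radius]
    return regret, active
```

With a gap as large as 0.8, the weaker arm is typically eliminated after the first batch, so the regret is essentially the cost of that single exploratory batch.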

The Impact of Batch Learning in Stochastic Linear Bandits

- Computer Science
- 2022

This work provides a policy-agnostic regret analysis, establishes upper and lower bounds on the regret of a candidate policy, and, as a key insight, gives a more robust result for the two-armed bandit problem.

Invariant description of UCB strategy for multi-armed bandits for batch processing scenario

- Computer Science · 2020 24th International Conference on Circuits, Systems, Communications and Computers (CSCC)
- 2020

In this work, Monte Carlo simulations are performed for different horizons, strategy parameters, and batch sizes to determine the maximum regret for two-armed bandits.

A Sharp Memory-Regret Trade-Off for Multi-Pass Streaming Bandits

- Computer Science · arXiv
- 2022

The main technical contribution is the lower bound, which requires information-theoretic techniques as well as ideas from round elimination to show that the residual problem remains challenging over subsequent passes.

Anytime optimal algorithms in stochastic multi-armed bandits

- Computer Science · ICML
- 2016

We introduce an anytime algorithm for stochastic multi-armed bandits with optimal distribution-free and distribution-dependent bounds (for a specific family of parameters). The performances of this…

Fast Rates for Bandit Optimization with Upper-Confidence Frank-Wolfe

- Computer Science · NIPS
- 2017

The Upper-Confidence Frank-Wolfe algorithm, inspired by techniques for bandits and convex optimization, is analyzed, and theoretical guarantees on its performance are given for various classes of functions.

Almost Optimal Anytime Algorithm for Batched Multi-Armed Bandits

- Computer Science · ICML
- 2021

An anytime algorithm is proposed that achieves the asymptotically optimal regret for exponential families of reward distributions with $O(\log\log T \cdot \mathrm{ilog}^{\alpha}(T))$ batches, where $\alpha \in O_T(1)$.

## References

Showing 1-10 of 62 references

The multi-armed bandit problem with covariates

- Computer Science · arXiv
- 2011

This work introduces a nonparametric model in which the expected rewards are smooth functions of the covariate and the hardness of the problem is captured by a margin parameter, and proposes a policy called Adaptively Binned Successive Elimination (ABSE) that adaptively decomposes the global problem into suitably “localized” static bandit problems.

Bounded regret in stochastic multi-armed bandits

- Computer Science, Mathematics · COLT
- 2013

A new randomized policy is proposed that attains a regret *uniformly bounded over time* in this setting, and several lower bounds are proved, showing in particular that bounded regret is not possible if one only knows $\Delta$, and that bounded regret of order $1/\Delta$ is not possible.

Finite-time Analysis of the Multiarmed Bandit Problem

- Computer Science · Machine Learning
- 2004

This work shows that the optimal logarithmic regret is also achievable uniformly over time, with simple and efficient policies, and for all reward distributions with bounded support.
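The simple policy analyzed here is usually referred to as UCB1: after one pull of each arm, it plays the arm maximizing the empirical mean plus $\sqrt{2 \ln n / n_k}$, where $n$ is the number of plays so far and $n_k$ the pulls of arm $k$. The sketch below assumes Bernoulli rewards (which have bounded support) and a simulated environment; `ucb1` is a hypothetical helper name.

```python
import math
import random


def ucb1(means, T, seed=0):
    """UCB1 on simulated Bernoulli arms (illustrative sketch).

    Pulls each arm once, then pulls the arm maximizing
    mean_k + sqrt(2 * ln(n) / n_k), with n = plays made so far.
    Returns the number of pulls of each arm.
    """
    rng = random.Random(seed)
    K = len(means)
    sums, counts = [0.0] * K, [0] * K
    for t in range(1, T + 1):
        if t <= K:
            arm = t - 1  # initialization: one pull per arm
        else:
            n = t - 1  # number of plays made before this round
            arm = max(
                range(K),
                key=lambda k: sums[k] / counts[k]
                + math.sqrt(2 * math.log(n) / counts[k]),
            )
        sums[arm] += 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
    return counts
```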

UCB revisited: Improved regret bounds for the stochastic multi-armed bandit problem

- Computer Science · Period. Math. Hung.
- 2010

For this modified UCB algorithm, an improved bound on the regret with respect to the optimal reward is given for K-armed bandits after T trials.

Asymptotically optimal multistage tests of simple hypotheses

- Mathematics
- 2007

A family of variable stage size multistage tests of simple hypotheses is described, based on efficient multistage sampling procedures. Using a loss function that is a linear combination of sampling…

Regret Bounds and Minimax Policies under Partial Monitoring

- Computer Science
- 2010

The stochastic bandit game is considered, and it is proved that an appropriate modification of the upper confidence bound policy UCB1 (Auer et al., 2002a) achieves the distribution-free optimal rate while still having a distribution-dependent rate logarithmic in the number of plays.
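We assume the modification of UCB1 described here is the MOSS index (Minimax Optimal Strategy in the Stochastic case), whose exploration bonus $\sqrt{\max(\ln(T/(K n_k)), 0)/n_k}$ depends on the horizon $T$ and the number of arms $K$ rather than on the current time. The sketch below only computes the index; `moss_index` is a hypothetical name.

```python
import math


def moss_index(mean_hat: float, n_pulls: int, T: int, K: int) -> float:
    """MOSS-style index (sketch): mean + sqrt(max(ln(T / (K n)), 0) / n).

    The bonus vanishes once an arm has at least T / K pulls, which is
    what removes the extra log factor from the distribution-free rate.
    """
    if n_pulls <= 0:
        return math.inf  # unpulled arms are tried first
    bonus = max(math.log(T / (K * n_pulls)), 0.0)
    return mean_hat + math.sqrt(bonus / n_pulls)
```

A policy would pull, at every round, the arm with the largest `moss_index`, exactly as UCB1 does with its own index.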

A Model for Selecting One of Two Medical Treatments

- Mathematics
- 1963

A simple cost function approach is proposed for designing an optimal clinical trial when a total of N patients with a disease are to be treated with one of two medical treatments. The cost…

Kullback–Leibler upper confidence bounds for optimal sequential allocation

- Computer Science
- 2013

The main contribution is a unified finite-time analysis of the regret of these algorithms that asymptotically matches the lower bounds of Lai and Robbins (1985) and Burnetas and Katehakis (1996), respectively.
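For Bernoulli rewards, a KL-UCB index is the largest $q \ge \hat{p}$ such that $n \cdot \mathrm{kl}(\hat{p}, q) \le \ln t$, where $\mathrm{kl}$ is the Bernoulli Kullback-Leibler divergence. Because $\mathrm{kl}(\hat{p}, q)$ increases in $q$ for $q \ge \hat{p}$, bisection finds it. This is a minimal sketch assuming Bernoulli rewards and the plain $\ln t$ threshold (any extra $c \ln\ln t$ term is omitted); the function names are hypothetical.

```python
import math


def bernoulli_kl(p: float, q: float, eps: float = 1e-12) -> float:
    """KL divergence between Bernoulli(p) and Bernoulli(q), clamped away from 0 and 1."""
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


def kl_ucb_index(mean_hat: float, n_pulls: int, t: int, iters: int = 50) -> float:
    """Largest q in [mean_hat, 1] with n_pulls * kl(mean_hat, q) <= ln t (bisection)."""
    threshold = math.log(t) / n_pulls
    lo, hi = mean_hat, 1.0  # invariant: kl(mean_hat, lo) <= threshold
    for _ in range(iters):
        mid = (lo + hi) / 2
        if bernoulli_kl(mean_hat, mid) <= threshold:
            lo = mid
        else:
            hi = mid
    return lo
```

More pulls shrink the threshold $\ln t / n$, so the index moves closer to the empirical mean as an arm is sampled.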

Sequential Experimentation in Clinical Trials: Design and Analysis

- Mathematics
- 2012

The topics covered include Sequential Testing Theory, Stochastic Optimization over Time in Clinical Trials with Failure-Time Endpoints, and Sequential Methods for Vaccine Safety Evaluation and Surveillance in Public Health.

Randomized Allocation of Treatments in Sequential Experiments

- Economics
- 1981

Since the idea of sequential allocation was first studied, in a version of what is now called the multi-armed bandit problem, the results of many investigations have shown that, even when an…