# Fixed-Confidence Guarantees for Bayesian Best-Arm Identification

@article{Shang2020FixedConfidenceGF, title={Fixed-Confidence Guarantees for Bayesian Best-Arm Identification}, author={Xuedong Shang and Rianne de Heide and Emilie Kaufmann and Pierre Ménard and Michal Valko}, journal={ArXiv}, year={2020}, volume={abs/1910.10945} }

We investigate and provide new insights on the sampling rule called Top-Two Thompson Sampling (TTTS). In particular, we justify its use for fixed-confidence best-arm identification. We further propose a variant of TTTS called Top-Two Transportation Cost (T3C), which disposes of the computational burden of TTTS. As our main contribution, we provide the first sample complexity analysis of TTTS and T3C when coupled with a very natural Bayesian stopping rule, for bandits with Gaussian rewards…
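The two sampling rules described above can be sketched concretely. Below is a minimal, hypothetical Python sketch of one round of TTTS and of T3C's challenger selection, assuming Gaussian rewards with known variance and an improper flat prior (so arm i's posterior is N(mean_i, σ²/nᵢ)); the function names, the `max_resample` fallback, and the exact transportation-cost form are illustrative assumptions, not the paper's precise specification.

```python
import numpy as np

rng = np.random.default_rng(0)

def ttts_choose_arm(counts, sums, beta=0.5, sigma=1.0, max_resample=100):
    """One round of Top-Two Thompson Sampling (TTTS).

    Flat-prior Gaussian posterior of arm i: N(sums[i]/counts[i], sigma**2/counts[i]).
    """
    means = sums / counts
    stds = sigma / np.sqrt(counts)
    leader = int(np.argmax(rng.normal(means, stds)))  # Thompson sample
    if rng.random() < beta:
        return leader
    # Otherwise resample the posterior until a *different* arm is on top;
    # this resampling loop is the computational burden that T3C removes.
    for _ in range(max_resample):
        challenger = int(np.argmax(rng.normal(means, stds)))
        if challenger != leader:
            return challenger
    return leader  # fallback when no distinct challenger appears

def t3c_challenger(counts, sums, leader, sigma=1.0):
    """T3C: pick the challenger minimizing the Gaussian transportation cost
    W(leader, j) = (mean_leader - mean_j)^2
                   / (2 * sigma^2 * (1/n_leader + 1/n_j)),  gap clipped at 0.
    """
    means = sums / counts
    costs = np.full(len(counts), np.inf)
    for j in range(len(counts)):
        if j == leader:
            continue
        gap = max(means[leader] - means[j], 0.0)
        costs[j] = gap ** 2 / (2 * sigma ** 2 * (1 / counts[leader] + 1 / counts[j]))
    return int(np.argmin(costs))
```

In this sketch the T3C leader is still drawn by Thompson sampling, but the challenger is the arm that is cheapest to swap with the leader under the transportation cost, so no resampling loop is needed.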

## 14 Citations

A Non-asymptotic Approach to Best-Arm Identification for Gaussian Bandits

- Mathematics
- 2021

We propose a new strategy for best-arm identification with fixed confidence of Gaussian variables with bounded means and unit variance. This strategy called Exploration-Biased Sampling is not only…

Policy Choice and Best Arm Identification: Asymptotic Analysis of Exploration Sampling

- Economics, Computer Science
- 2021

The "policy choice" problem, otherwise known as best-arm identification in the bandit literature, is considered, and Theorem 1 of Kasy and Sautmann (2021) provides three asymptotic results that give theoretical guarantees for exploration sampling developed for this setting.

Optimal Simple Regret in Bayesian Best Arm Identification

- Computer Science, Mathematics
- 2021

A simple and easy-to-compute algorithm whose leading factor matches the lower bound up to a constant factor is proposed, and simulation results support the theoretical findings.

Gamification of Pure Exploration for Linear Bandits

- Computer Science, Mathematics
- ICML
- 2020

This work designs the first asymptotically optimal algorithm for fixed-confidence pure exploration in linear bandits, which naturally bypasses the pitfall caused by a simple but difficult instance that most prior algorithms had to be engineered to deal with explicitly.

Adaptive Treatment Assignment in Experiments for Policy Choice

- SSRN Electronic Journal
- 2019

In a comment posted on arXiv on Sep 16, 2021, Ariu et al. (2021) point out some problems regarding the statement of item 3 of Theorem 1 in Kasy and Sautmann (2021) (KS hereafter). Ariu et al. (2021)…

Dealing With Misspecification In Fixed-Confidence Linear Top-m Identification

- Computer Science, Mathematics
- ArXiv
- 2021

This work derives a tractable lower bound on the sample complexity of any δ-correct algorithm for the general Top-m identification problem, and describes the first algorithm for this setting that is both practical and adaptive to the amount of misspecification.

Policy Choice and Best Arm Identification: Comments on "Adaptive Treatment Assignment in Experiments for Policy Choice"

- Computer Science
- ArXiv
- 2021

This paper connects the “policy choice” problem, proposed in Kasy and Sautmann (2021) as an instance of adaptive experimental design, to the frontiers of the bandit literature in machine learning to highlight the relevance to economic problems and stimulate methodological and theoretical developments in the econometric community.

Stochastic Bandits with Vector Losses: Minimizing 𝓁∞-Norm of Relative Losses

- Computer Science, Mathematics
- ArXiv
- 2020

This paper models the situation as a K-armed bandit problem with multiple losses, derives a problem-dependent sample complexity lower bound, and provides a regret lower bound of Ω(T) together with a matching algorithm.

Efficient Pure Exploration for Combinatorial Bandits with Semi-Bandit Feedback

- Computer Science, Mathematics
- ALT
- 2021

This work focuses on the pure-exploration problem of identifying the best arm with fixed confidence, as well as a more general setting where the structure of the answer set differs from that of the action set.

Best-Arm Identification in Correlated Multi-Armed Bandits

- Computer Science, Mathematics
- IEEE Journal on Selected Areas in Information Theory
- 2021

A novel correlated bandit framework is proposed that captures domain knowledge about correlation between arms in the form of upper bounds on the expected conditional reward of an arm, given a reward realization from another arm.

## References

Showing 1-10 of 30 references.

Optimal Best Arm Identification with Fixed Confidence

- Computer Science, Mathematics
- COLT
- 2016

A new, tight lower bound on the sample complexity of best-arm identification in one-parameter bandit problems is proved, and the 'Track-and-Stop' strategy, which is shown to be asymptotically optimal, is proposed.

Best Arm Identification: A Unified Approach to Fixed Budget and Fixed Confidence

- Computer Science, Mathematics
- NIPS
- 2012

A performance bound is proved for the two versions of the UGapE algorithm, showing that the two problems are characterized by the same notion of complexity.

Best Arm Identification in Multi-Armed Bandits

- Mathematics
- COLT
- 2010

We consider the problem of finding the best arm in a stochastic multi-armed bandit game. The regret of a forecaster is here defined by the gap between the mean reward of the optimal arm and the mean…

Kullback–Leibler upper confidence bounds for optimal sequential allocation

- Mathematics
- 2013

We consider optimal sequential allocation in the context of the so-called stochastic multi-armed bandit model. We describe a generic index policy, in the sense of Gittins (1979), based on upper…

Mixture Martingales Revisited with Applications to Sequential Tests and Confidence Intervals

- Computer Science, Mathematics
- ArXiv
- 2018

New deviation inequalities that are valid uniformly in time under adaptive sampling in a multi-armed bandit model are presented, allowing us to analyze stopping rules based on generalized likelihood ratios for a large class of sequential identification problems, and to construct tight confidence intervals for some functions of the means of the arms.

Simple Bayesian Algorithms for Best Arm Identification

- Computer Science, Mathematics
- COLT
- 2016

This paper proposes three simple and intuitive Bayesian algorithms for adaptively allocating measurement effort, and formalizes a sense in which these seemingly naive rules are the best possible.

Anytime Exploration for Multi-armed Bandits using Confidence Information

- Mathematics, Medicine
- ICML
- 2016

This work proposes AT-LUCB (AnyTime Lower and Upper Confidence Bound), the first nontrivial algorithm that provably solves anytime Explore-m, a pure exploration problem for multi-armed bandits that requires making a prediction of the top-m arms at every time step.

lil' UCB : An Optimal Exploration Algorithm for Multi-Armed Bandits

- Mathematics, Computer Science
- COLT
- 2014

It is proved that the lil' UCB procedure for identifying the arm with the largest mean in a multi-armed bandit game in the fixed-confidence setting, using a small number of total samples, is optimal up to constants; simulations further show that it provides superior performance with respect to the state of the art.

Learning the distribution with largest mean: two bandit frameworks

- Computer Science, Mathematics
- ArXiv
- 2017

This paper reviews two different sequential learning tasks that have been considered in the bandit literature; they can be formulated as (sequentially) learning which distribution has the highest mean among a set of distributions, with some constraints on the learning process.

Almost Optimal Exploration in Multi-Armed Bandits

- Mathematics, Computer Science
- ICML
- 2013

Two novel, parameter-free algorithms for identifying the best arm are presented, for two different settings: given a target confidence, and given a target budget of arm pulls. Upper bounds are proved whose gap from the lower bound is only doubly logarithmic in the problem parameters.