Corpus ID: 204852240

Fixed-Confidence Guarantees for Bayesian Best-Arm Identification

@article{Shang2020FixedConfidenceGF,
  title={Fixed-Confidence Guarantees for Bayesian Best-Arm Identification},
  author={Xuedong Shang and Rianne de Heide and Emilie Kaufmann and Pierre Ménard and Michal Valko},
  journal={ArXiv},
  year={2020},
  volume={abs/1910.10945}
}
We investigate and provide new insights on the sampling rule called Top-Two Thompson Sampling (TTTS). In particular, we justify its use for fixed-confidence best-arm identification. We further propose a variant of TTTS called Top-Two Transportation Cost (T3C), which disposes of the computational burden of TTTS. As our main contribution, we provide the first sample complexity analysis of TTTS and T3C when coupled with a very natural Bayesian stopping rule, for bandits with Gaussian rewards… 
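The two sampling rules are simple enough to sketch. Below is a minimal NumPy illustration of both, assuming unit-variance Gaussian rewards with flat priors and at least one pull per arm; the function names, the beta = 0.5 default, and the resampling cap are illustrative choices, not the authors' reference implementation.

# Sketch of the TTTS and T3C sampling rules for unit-variance Gaussian
# bandits with flat priors (illustrative, not the paper's reference code).
import numpy as np

rng = np.random.default_rng(0)

def posterior_sample(sums, counts):
    # With a flat prior and unit-variance rewards, each arm's posterior
    # mean is N(empirical mean, 1 / number of pulls).
    return rng.normal(sums / counts, 1.0 / np.sqrt(counts))

def ttts_arm(sums, counts, beta=0.5, max_resample=10_000):
    # Top-Two Thompson Sampling: play the Thompson leader with probability
    # beta; otherwise re-sample the posterior until a different arm comes
    # out on top. The re-sampling loop is TTTS's computational burden.
    leader = int(np.argmax(posterior_sample(sums, counts)))
    if rng.random() < beta:
        return leader
    for _ in range(max_resample):
        challenger = int(np.argmax(posterior_sample(sums, counts)))
        if challenger != leader:
            return challenger
    return leader  # fallback when one arm dominates the posterior

def t3c_arm(sums, counts, beta=0.5):
    # Top-Two Transportation Cost: same leader, but the challenger is the
    # arm minimizing the Gaussian transportation cost needed to overtake
    # the leader, computed in closed form instead of by re-sampling.
    means = sums / counts
    leader = int(np.argmax(posterior_sample(sums, counts)))
    if rng.random() < beta:
        return leader
    gaps = means[leader] - means
    costs = np.where(gaps > 0.0,
                     gaps ** 2 / (2.0 * (1.0 / counts[leader] + 1.0 / counts)),
                     0.0)
    costs[leader] = np.inf  # the leader cannot challenge itself
    return int(np.argmin(costs))

A driver would pull each arm once, then repeatedly call one of these functions, observe a reward, and update sums and counts until a stopping rule (such as the Bayesian stopping rule analyzed in the paper) fires. The contrast is visible in the code: TTTS may need many posterior re-draws to find a challenger once one arm dominates, while T3C computes its challenger directly.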
A Non-asymptotic Approach to Best-Arm Identification for Gaussian Bandits
We propose a new strategy for best-arm identification with fixed confidence for Gaussian variables with bounded means and unit variance. This strategy, called Exploration-Biased Sampling, is not only…
Policy Choice and Best Arm Identification: Asymptotic Analysis of Exploration Sampling
TLDR
The "policy choice" problem, otherwise known as best-arm identification in the bandit literature, is considered; Theorem 1 of Kasy and Sautmann (2021) provides three asymptotic results that give theoretical guarantees for exploration sampling developed for this setting.
Optimal Simple Regret in Bayesian Best Arm Identification
TLDR
A simple and easy-to-compute algorithm whose leading factor matches the lower bound up to a constant factor is proposed, and simulation results support the theoretical findings.
Gamification of Pure Exploration for Linear Bandits
TLDR
This work designs the first asymptotically optimal algorithm for fixed-confidence pure exploration in linear bandits, which naturally bypasses the pitfall caused by a simple but difficult instance that most prior algorithms had to be engineered to deal with explicitly.
Adaptive Treatment Assignment in Experiments for Policy Choice
In a comment posted on arXiv on Sep 16, 2021, Ariu et al. (2021) point out some problems regarding the statement of item 3 of Theorem 1 in Kasy and Sautmann (2021) (KS hereafter)…
Dealing With Misspecification In Fixed-Confidence Linear Top-m Identification
TLDR
This work derives a tractable lower bound on the sample complexity of any δ-correct algorithm for the general Top-m identification problem, and describes the first algorithm for this setting, which is both practical and adaptive to the amount of misspecification.
Policy Choice and Best Arm Identification: Comments on "Adaptive Treatment Assignment in Experiments for Policy Choice"
TLDR
This paper connects the “policy choice” problem, proposed in Kasy and Sautmann (2021) as an instance of adaptive experimental design, to the frontiers of the bandit literature in machine learning to highlight the relevance to economic problems and stimulate methodological and theoretical developments in the econometric community.
Stochastic Bandits with Vector Losses: Minimizing 𝓁∞-Norm of Relative Losses
TLDR
This paper models the situation as a K-armed bandit problem with multiple losses, derives a problem-dependent sample complexity lower bound, and provides a regret lower bound of Ω(T) and a matching algorithm.
Efficient Pure Exploration for Combinatorial Bandits with Semi-Bandit Feedback
TLDR
This work focuses on the pure-exploration problem of identifying the best arm with fixed confidence, as well as a more general setting where the structure of the answer set differs from that of the action set.
Best-Arm Identification in Correlated Multi-Armed Bandits
TLDR
A novel correlated bandit framework is proposed that captures domain knowledge about correlation between arms in the form of upper bounds on the expected conditional reward of an arm, given a reward realization from another arm.

References

SHOWING 1-10 OF 30 REFERENCES
Optimal Best Arm Identification with Fixed Confidence
TLDR
A new, tight lower bound on the sample complexity of best-arm identification in one-parameter bandit problems is proved, and the 'Track-and-Stop' strategy, which is shown to be asymptotically optimal, is proposed.
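For reference, the lower bound proved there has a compact form. In the notation of that paper (Σ_K the simplex over arms, Alt(μ) the set of instances whose best arm differs from that of μ, kl the binary relative entropy), any δ-correct strategy satisfies

\mathbb{E}_{\mu}[\tau_\delta] \;\ge\; T^*(\mu)\,\mathrm{kl}(\delta, 1-\delta),
\qquad
T^*(\mu)^{-1} \;=\; \sup_{w \in \Sigma_K} \inf_{\lambda \in \mathrm{Alt}(\mu)} \sum_{a=1}^{K} w_a\, \mathrm{KL}(\mu_a, \lambda_a),

so the optimal sample complexity scales as T^*(\mu)\log(1/\delta) as \delta \to 0.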
Best Arm Identification: A Unified Approach to Fixed Budget and Fixed Confidence
TLDR
A performance bound is proved for the two versions of the UGapE algorithm, showing that the two problems are characterized by the same notion of complexity.
Best Arm Identification in Multi-Armed Bandits
We consider the problem of finding the best arm in a stochastic multi-armed bandit game. The regret of a forecaster is here defined by the gap between the mean reward of the optimal arm and the mean reward of the ultimately chosen arm.
Kullback–Leibler upper confidence bounds for optimal sequential allocation
We consider optimal sequential allocation in the context of the so-called stochastic multi-armed bandit model. We describe a generic index policy, in the sense of Gittins (1979), based on upper confidence bounds of the arm rewards computed using the Kullback–Leibler divergence.
Mixture Martingales Revisited with Applications to Sequential Tests and Confidence Intervals
TLDR
New deviation inequalities that are valid uniformly in time under adaptive sampling in a multi-armed bandit model are presented, allowing us to analyze stopping rules based on generalized likelihood ratios for a large class of sequential identification problems, and to construct tight confidence intervals for some functions of the means of the arms.
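As a concrete instance, for unit-variance Gaussian rewards the generalized likelihood ratio (Chernoff) stopping rule that such deviation inequalities calibrate can be sketched as

\tau_\delta \;=\; \inf\left\{ t : \min_{b \neq \hat a_t} \frac{\big(\hat\mu_{\hat a_t}(t) - \hat\mu_b(t)\big)^2}{2\big(1/N_{\hat a_t}(t) + 1/N_b(t)\big)} \;>\; \beta(t, \delta) \right\},

where \hat a_t is the empirical best arm, N_a(t) the pull counts, and \beta(t, \delta) a threshold; calibrating \beta(t, \delta) so that the rule is δ-correct is exactly what the uniform-in-time deviation inequalities provide. This is a standard sketch of the rule, not the paper's full statement.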
Simple Bayesian Algorithms for Best Arm Identification
TLDR
This paper proposes three simple and intuitive Bayesian algorithms for adaptively allocating measurement effort, and formalizes a sense in which these seemingly naive rules are the best possible.
Anytime Exploration for Multi-armed Bandits using Confidence Information
TLDR
This work proposes AT-LUCB (AnyTime Lower and Upper Confidence Bound), the first nontrivial algorithm that provably solves anytime Explore-m, a pure exploration problem for multi-armed bandits that requires making a prediction of the top-m arms at every time step.
lil' UCB : An Optimal Exploration Algorithm for Multi-Armed Bandits
TLDR
The lil' UCB procedure for identifying the arm with the largest mean in a multi-armed bandit game in the fixed-confidence setting, using a small number of total samples, is proved to be optimal up to constants; simulations also show that it provides superior performance with respect to the state of the art.
Learning the distribution with largest mean: two bandit frameworks
TLDR
This paper reviews two different sequential learning tasks that have been considered in the bandit literature; they can be formulated as (sequentially) learning which distribution has the highest mean among a set of distributions, with some constraints on the learning process.
Almost Optimal Exploration in Multi-Armed Bandits
TLDR
Two novel, parameter-free algorithms for identifying the best arm are presented, in two different settings: given a target confidence and given a target budget of arm pulls. For both, upper bounds whose gap from the lower bound is only doubly logarithmic in the problem parameters are proved.