# Simple Bayesian Algorithms for Best Arm Identification

```bibtex
@inproceedings{Russo2016SimpleBA,
  title     = {Simple Bayesian Algorithms for Best Arm Identification},
  author    = {Daniel Russo},
  booktitle = {COLT},
  year      = {2016}
}
```

This paper considers the optimal adaptive allocation of measurement effort for identifying the best among a finite set of options or designs. An experimenter sequentially chooses designs to measure and observes noisy signals of their quality with the goal of confidently identifying the best design after a small number of measurements. This paper proposes three simple and intuitive Bayesian algorithms for adaptively allocating measurement effort, and formalizes a sense in which these seemingly…
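The best known of the paper's proposed rules is top-two Thompson sampling. A minimal sketch of one measurement decision under that idea, assuming Bernoulli arms with Beta(1, 1) priors (the function name, the resampling cap, and the Bernoulli/Beta setup are illustrative choices, not the paper's exact formulation):

```python
import numpy as np

def top_two_thompson_step(successes, failures, beta=0.5, rng=None, max_resamples=100):
    """One measurement choice of a top-two Thompson-sampling-style rule
    for Bernoulli arms with Beta(1, 1) priors."""
    rng = np.random.default_rng() if rng is None else rng
    # Draw a mean for each arm from its Beta posterior and find the leader.
    theta = rng.beta(successes + 1, failures + 1)
    leader = int(np.argmax(theta))
    # With probability beta, measure the leader; otherwise resample until
    # some other arm comes out on top, and measure that challenger.
    if rng.random() < beta:
        return leader
    for _ in range(max_resamples):
        challenger = int(np.argmax(rng.beta(successes + 1, failures + 1)))
        if challenger != leader:
            return challenger
    return leader  # fallback when one arm dominates every posterior draw
```

The tunable parameter `beta` controls how measurement effort splits between the current leader and its challengers; the paper studies how this split affects the asymptotic rate of posterior convergence.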

## 160 Citations

### Guaranteed Fixed-Confidence Best Arm Identification in Multi-Armed Bandits: Simple Sequential Elimination Algorithms

- Computer Science · ArXiv
- 2021

This work proposes to use the classical vector-at-a-time (VT) rule, which samples each remaining arm once in each round, and proposes and analyzes a variant of the classical play-the-winner (PW) algorithm.
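The VT rule pairs naturally with sequential elimination. A hedged sketch of a VT-style procedure, assuming unit-variance Gaussian rewards and a generic union-bound confidence radius (the radius constant and function name are illustrative, not the paper's):

```python
import math
import random

def vector_at_a_time(means, delta=0.05, rounds=2000, seed=0):
    """VT-style elimination sketch: sample every surviving arm once per
    round, then drop arms whose empirical mean trails the best surviving
    arm by more than twice a confidence radius."""
    rng = random.Random(seed)
    k = len(means)
    active = set(range(k))
    sums = [0.0] * k
    n = 0  # rounds completed; every active arm has exactly n samples
    for _ in range(rounds):
        if len(active) == 1:
            break
        n += 1
        for a in active:
            sums[a] += rng.gauss(means[a], 1.0)  # unit-variance noise
        # Union-bound style radius, shrinking as n grows.
        radius = math.sqrt(2 * math.log(4 * k * n * n / delta) / n)
        best = max(active, key=lambda a: sums[a] / n)
        active = {a for a in active
                  if sums[a] / n >= sums[best] / n - 2 * radius}
    return max(active, key=lambda a: sums[a] / n)
```

Because every surviving arm is sampled equally, all empirical means in a round share the same sample count, which keeps the elimination comparison simple.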

### Two-Armed Gaussian Bandits with Unknown Variances

- Mathematics, Computer Science
- 2022

This paper proposes a strategy comprising a sampling rule with randomized sampling following the estimated target allocation probabilities of arm draws and a recommendation rule using the augmented inverse probability weighting (AIPW) estimator, which is often used in the causal inference literature.
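The AIPW estimator combines inverse-probability weighting with a plug-in mean estimate. A minimal sketch for estimating one arm's mean from adaptively collected data (the function name and argument layout are mine; `plugin[t]` stands in for whatever mean estimate the strategy held before round `t`):

```python
def aipw_estimate(target, arms, rewards, probs, plugin):
    """Sketch of the augmented inverse-probability-weighting (AIPW)
    estimator of one arm's mean.
    arms[t]    : arm pulled at round t
    rewards[t] : observed reward at round t
    probs[t]   : probability the sampling rule assigned to `target` at round t
    plugin[t]  : plug-in estimate of target's mean held before round t
    """
    terms = []
    for a, x, p, m in zip(arms, rewards, probs, plugin):
        # IPW correction fires only on rounds where `target` was pulled.
        correction = (x - m) / p if a == target else 0.0
        terms.append(m + correction)
    return sum(terms) / len(terms)
```

With the plug-in held at zero this reduces to plain inverse-probability weighting; a good plug-in shrinks the variance of the correction term, which is why the estimator is favored with adaptively collected data.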

### Best Arm Identification with a Fixed Budget

- Computer Science, Mathematics
- 2022

This paper derives a tight problem-dependent lower bound, which characterizes the optimal allocation ratio that depends on the gap of the expected rewards and the Fisher information of the bandit model and proposes the “RS-AIPW” strategy, which is optimal in the sense that the performance guarantee achieves the derived lower bound under a small gap.

### Adaptive Multiple-Arm Identification

- Computer Science · ICML
- 2017

A new hardness parameter for characterizing the difficulty of any given instance is introduced, and a lower bound result is proved showing that the extra $\log(\epsilon^{-1})$ factor is necessary for instance-dependent algorithms using the introduced hardness parameter.

### Ordinal Optimization with Generalized Linear Model

- Mathematics, Computer Science · 2020 Winter Simulation Conference (WSC)
- 2020

An approximate solution for the optimal allocation is obtained, which is leveraged to design a sampling strategy that is near-optimal in a suitable asymptotic sense, and it is shown via numerical testing that it performs competitively even in the presence of model misspecification.

### Balancing Optimal Large Deviations in Sequential Selection

- Computer Science · Management Science
- 2022

A new methodology is proposed that can be guaranteed to adaptively learn the solution to these optimality conditions in a computationally efficient manner, without any tunable parameters, and under a wide variety of parametric sampling distributions.

### Ordinal Optimization with Generalized Linear Model

- Mathematics, Computer Science
- 2020

An approximate solution is obtained for the optimal allocation of a finite sampling budget, which is leveraged to design a sampling strategy that is near-optimal in a suitable asymptotic sense and performs competitively even in the presence of model misspecification.

### Information theory for ranking and selection

- Computer Science · Naval Research Logistics (NRL)
- 2020

It is proved that the IG policy is consistent, that is, as the sampling budget grows to infinity, the IG policy finds the true best alternative almost surely.

### Improving the Expected Improvement Algorithm

- Computer Science · NIPS
- 2017

A simple modification of the expected improvement algorithm is introduced, which results in an algorithm that is asymptotically optimal for Gaussian best-arm identification problems, and provably outperforms standard EI by an order of magnitude.
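The baseline being modified scores arms by expected improvement over the current best posterior mean. A sketch of that score under independent Gaussian posteriors (this is the standard EI formula, not the paper's modified rule; the function name is mine):

```python
import math

def expected_improvement(post_means, post_stds):
    """Per-arm expected improvement over the current best posterior mean,
    assuming independent Gaussian posteriors.
    EI_i = (m_i - best) * Phi(z) + s_i * phi(z), with z = (m_i - best) / s_i.
    """
    best = max(post_means)
    phi = lambda z: math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)  # N(0,1) pdf
    Phi = lambda z: 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))         # N(0,1) cdf
    scores = []
    for m, s in zip(post_means, post_stds):
        if s == 0.0:
            scores.append(max(m - best, 0.0))
            continue
        z = (m - best) / s
        scores.append((m - best) * Phi(z) + s * phi(z))
    return scores
```

Standard EI measures the next arm by the argmax of these scores; the cited paper's contribution is precisely that a small modification of this allocation restores asymptotic optimality for Gaussian best-arm identification.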

### Optimal best arm selection for general distributions

- Computer Science · ArXiv
- 2019

This paper proposes a $\delta$-correct algorithm that matches the lower bound as $\delta$ approaches zero, under the mild restriction that a known bound exists on the expectation of a non-negative, increasing convex function of the underlying random variables.

## References

Showing 1–10 of 86 references.

### Sequential Sampling to Myopically Maximize the Expected Value of Information

- Computer Science · INFORMS J. Comput.
- 2010

It is demonstrated empirically that the benefits of reducing the number of approximations in the previous algorithms are typically outweighed by the deleterious effects of a sequential one-step myopic allocation when more than a few dozen samples are allocated.

### Best-arm identification algorithms for multi-armed bandits in the fixed confidence setting

- Computer Science · 2014 48th Annual Conference on Information Sciences and Systems (CISS)
- 2014

It is shown that most best-arm algorithms can be described as variants of two recently proposed optimal algorithms that achieve the optimal sample complexity for the problem.

### Sequential Sampling with Economics of Selection Procedures

- Economics · Manag. Sci.
- 2012

New economically motivated, fully sequential sampling procedures for stochastic simulation problems, called economics-of-selection procedures, are presented; they are derived for comparing a known standard with one alternative whose unknown reward is inferred by sampling.

### Active Sequential Hypothesis Testing

- Computer Science · ArXiv
- 2012

Lower bounds for the optimal total cost are established using results in dynamic programming and the fundamental limits on the maximum achievable information acquisition rate and the optimal reliability are characterized.

### New Two-Stage and Sequential Procedures for Selecting the Best Simulated System

- Computer Science · Oper. Res.
- 2001

New two-stage and sequential selection procedures that integrate attractive features of both lines of research are presented, derived assuming that the simulation output is normally distributed with unknown mean and variance that may differ for each system.

### Best Arm Identification in Multi-Armed Bandits

- Computer Science · COLT
- 2010

This work proposes a highly exploring UCB policy and a new algorithm based on successive rejects that are essentially optimal since their regret decreases exponentially at a rate which is, up to a logarithmic factor, the best possible.

### Improving the Expected Improvement Algorithm

- Computer Science · NIPS
- 2017

A simple modification of the expected improvement algorithm is introduced, which results in an algorithm that is asymptotically optimal for Gaussian best-arm identification problems, and provably outperforms standard EI by an order of magnitude.

### Ordinal optimization - empirical large deviations rate estimators, and stochastic multi-armed bandits

- Mathematics, Computer Science
- 2015

A negative result is shown: when populations have unbounded support, any policy that asymptotically identifies the correct population with probability at least $1 - \delta$ for each problem instance requires more than $O(\log(1/\delta))$ samples to make such a determination in some problem instance, suggesting that some restrictions on the populations are essential to devising algorithms with correctness guarantees.

### Economic Analysis of Simulation Selection Problems

- Business · Manag. Sci.
- 2009

This paper frames the simulation selection problem as a “stoppable” version of a Bayesian bandit problem that treats the ability to simulate as a real option prior to project implementation and provides a framework for answering managerial questions.

### Analysis of Thompson Sampling for the Multi-armed Bandit Problem

- Computer Science · COLT
- 2012

For the first time, it is shown that Thompson Sampling algorithm achieves logarithmic expected regret for the stochastic multi-armed bandit problem.
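Thompson Sampling itself fits in a few lines. A minimal sketch for a Bernoulli bandit with Beta(1, 1) priors (the function name, horizon, and returned diagnostics are illustrative choices):

```python
import numpy as np

def thompson_sampling(true_means, horizon=2000, seed=0):
    """Sketch of Thompson Sampling on a Bernoulli bandit: sample a mean
    from each arm's Beta posterior, play the argmax, and update that
    arm's posterior with the observed reward."""
    rng = np.random.default_rng(seed)
    k = len(true_means)
    s = np.zeros(k)  # observed successes per arm
    f = np.zeros(k)  # observed failures per arm
    total = 0.0
    for _ in range(horizon):
        arm = int(np.argmax(rng.beta(s + 1, f + 1)))
        reward = float(rng.random() < true_means[arm])
        s[arm] += reward
        f[arm] += 1.0 - reward
        total += reward
    return total, s + f  # cumulative reward and per-arm pull counts
```

Note the contrast with best-arm identification: this regret-minimizing rule concentrates nearly all pulls on the apparent best arm, which is exactly the behavior the Russo paper's top-two modifications correct for the identification objective.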