# Simple Bayesian Algorithms for Best Arm Identification

```bibtex
@inproceedings{Russo2016SimpleBA,
  title     = {Simple Bayesian Algorithms for Best Arm Identification},
  author    = {Daniel Russo},
  booktitle = {COLT},
  year      = {2016}
}
```
• Daniel Russo
• Published in COLT, 26 February 2016
• Computer Science, Mathematics
This paper considers the optimal adaptive allocation of measurement effort for identifying the best among a finite set of options or designs. An experimenter sequentially chooses designs to measure and observes noisy signals of their quality with the goal of confidently identifying the best design after a small number of measurements. This paper proposes three simple and intuitive Bayesian algorithms for adaptively allocating measurement effort, and formalizes a sense in which these seemingly…
160 Citations
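One of the three allocation rules proposed in the paper is top-two Thompson sampling. A minimal sketch for Bernoulli arms with Beta posteriors is below; the function name, the Beta(1, 1) priors, and the default tuning parameter `beta = 0.5` are illustrative choices, not the paper's notation.

```python
import random

def top_two_thompson_sample(successes, failures, beta=0.5):
    """Choose an arm to measure via top-two Thompson sampling.

    successes/failures: per-arm counts defining Beta(1+s, 1+f) posteriors.
    beta: probability of playing the posterior sample's best arm; otherwise
    resample until a different arm looks best (the "challenger").
    """
    k = len(successes)
    # Draw a mean for each arm from its posterior and take the argmax.
    draws = [random.betavariate(1 + successes[i], 1 + failures[i])
             for i in range(k)]
    best = max(range(k), key=lambda i: draws[i])
    if random.random() < beta:
        return best
    # Otherwise, resample posteriors until the argmax changes.
    while True:
        draws = [random.betavariate(1 + successes[i], 1 + failures[i])
                 for i in range(k)]
        challenger = max(range(k), key=lambda i: draws[i])
        if challenger != best:
            return challenger
```

Splitting effort between the current best and a challenger is what drives the measurement allocation toward confidently separating the top arm from its closest competitor.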

## Citations

### Guaranteed Fixed-Confidence Best Arm Identification in Multi-Armed Bandits: Simple Sequential Elimination Algorithms

• Computer Science
ArXiv
• 2021
This work proposes using the classical vector-at-a-time (VT) rule, which samples each remaining arm once per round, and also proposes and analyzes a variant of the classical play-the-winner (PW) algorithm.

### Two-Armed Gaussian Bandits with Unknown Variances

• Mathematics, Computer Science
• 2022
This paper proposes a strategy comprising a sampling rule that randomizes arm draws according to estimated target allocation probabilities, and a recommendation rule based on the augmented inverse probability weighting (AIPW) estimator, which is widely used in the causal inference literature.

### Best Arm Identification with a Fixed Budget

• Computer Science, Mathematics
• 2022
This paper derives a tight problem-dependent lower bound, which characterizes the optimal allocation ratio in terms of the gap between the expected rewards and the Fisher information of the bandit model, and proposes the "RS-AIPW" strategy, which is optimal in the sense that its performance guarantee achieves the derived lower bound under a small gap.

• Computer Science
ICML
• 2017
A new hardness parameter for characterizing the difficulty of any given instance is introduced and a lower bound result is proved showing that the extra $\log(\epsilon^{-1})$ is necessary for instance-dependent algorithms using the introduced hardness parameter.

### Ordinal Optimization with Generalized Linear Model

• Mathematics, Computer Science
2020 Winter Simulation Conference (WSC)
• 2020
An approximate solution for the optimal allocation is obtained, which is leveraged to design a sampling strategy that is near-optimal in a suitable asymptotic sense, and it is shown via numerical testing that it performs competitively even in the presence of model misspecification.

### Balancing Optimal Large Deviations in Sequential Selection

• Computer Science
Management Science
• 2022
A new methodology is proposed that can be guaranteed to adaptively learn the solution to these optimality conditions in a computationally efficient manner, without any tunable parameters, and under a wide variety of parametric sampling distributions.

### Ordinal Optimization with Generalized Linear Model

• Mathematics, Computer Science
• 2020
An approximate solution is obtained for the optimal allocation of a finite sampling budget, which is leveraged to design a sampling strategy that is near-optimal in a suitable asymptotic sense and performs competitively even in the presence of model misspecification.

### Information theory for ranking and selection

• Computer Science
Naval Research Logistics (NRL)
• 2020
It is proved that the IG policy is consistent; that is, as the sampling budget grows to infinity, the IG policy finds the true best alternative almost surely.

### Improving the Expected Improvement Algorithm

• Computer Science
NIPS
• 2017
A simple modification of the expected improvement algorithm is introduced, which results in an algorithm that is asymptotically optimal for Gaussian best-arm identification problems, and provably outperforms standard EI by an order of magnitude.
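The cited paper's modification is not reproduced here; as background, the standard expected improvement (EI) score it modifies can be sketched for a Gaussian posterior as follows. The function name and arguments are illustrative.

```python
import math

def expected_improvement(mu, sigma, best_mean):
    """Standard EI of an arm with Gaussian posterior N(mu, sigma^2),
    measured against the current best posterior mean:
    EI = (mu - best) * Phi(z) + sigma * phi(z), with z = (mu - best)/sigma.
    """
    if sigma <= 0:
        # Degenerate posterior: improvement is deterministic.
        return max(mu - best_mean, 0.0)
    z = (mu - best_mean) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)  # phi(z)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2)))         # Phi(z)
    return (mu - best_mean) * cdf + sigma * pdf
```

Standard EI greedily samples the arm with the highest score; the cited work shows that a simple adjustment to this rule recovers asymptotic optimality for Gaussian best-arm identification.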

### Optimal best arm selection for general distributions

• Computer Science
ArXiv
• 2019
This paper proposes a delta-correct algorithm that matches the lower bound as delta tends to zero, under the mild restriction that a known bound exists on the expectation of a non-negative, increasing convex function of the underlying random variables.

## References

Showing 1-10 of 86 references

### Sequential Sampling to Myopically Maximize the Expected Value of Information

• Computer Science
INFORMS J. Comput.
• 2010
It is demonstrated empirically that the benefits of reducing the number of approximations in the previous algorithms are typically outweighed by the deleterious effects of a sequential one-step myopic allocation when more than a few dozen samples are allocated.

### Best-arm identification algorithms for multi-armed bandits in the fixed confidence setting

• Computer Science
2014 48th Annual Conference on Information Sciences and Systems (CISS)
• 2014
It is shown that most best-arm algorithms can be described as variants of two recently proposed optimal algorithms that achieve the optimal sample complexity for the problem.

### Sequential Sampling with Economics of Selection Procedures

• Economics
Manag. Sci.
• 2012
New economically motivated, fully sequential sampling procedures for stochastic simulation problems, called economics-of-selection procedures, are presented; they are derived for comparing a known standard against one alternative whose unknown reward is inferred through sampling.

### Active Sequential Hypothesis Testing

• Computer Science
ArXiv
• 2012
Lower bounds for the optimal total cost are established using results in dynamic programming and the fundamental limits on the maximum achievable information acquisition rate and the optimal reliability are characterized.

### New Two-Stage and Sequential Procedures for Selecting the Best Simulated System

• Computer Science
Oper. Res.
• 2001
New two-stage and sequential selection procedures that integrate attractive features of both lines of research are presented, derived assuming that the simulation output is normally distributed with unknown mean and variance that may differ for each system.

### Best Arm Identification in Multi-Armed Bandits

• Computer Science
COLT
• 2010
This work proposes a highly exploring UCB policy and a new algorithm based on successive rejects, both essentially optimal since their probability of error decreases exponentially at a rate which is, up to a logarithmic factor, the best possible.
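The successive-rejects idea splits a fixed budget of n pulls into k-1 phases and discards the empirically worst arm after each phase. A rough sketch of the phase budget schedule is below, assuming the standard log-bar normalization; the function name is illustrative.

```python
import math

def successive_rejects_schedule(n, k):
    """Per-phase pull targets for successive rejects with budget n and k arms.

    Returns, for phases 1..k-1, the cumulative number of pulls each
    surviving arm should have received by the end of that phase:
    n_j = ceil((n - k) / (log_bar(k) * (k + 1 - j))),
    where log_bar(k) = 1/2 + sum_{i=2}^{k} 1/i.
    """
    log_bar = 0.5 + sum(1.0 / i for i in range(2, k + 1))
    return [math.ceil((n - k) / (log_bar * (k + 1 - j)))
            for j in range(1, k)]
```

In each phase j, every surviving arm is pulled up to the phase target n_j and the arm with the lowest empirical mean is eliminated; the single arm left after phase k-1 is recommended.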

### Improving the Expected Improvement Algorithm

• Computer Science
NIPS
• 2017
A simple modification of the expected improvement algorithm is introduced, which results in an algorithm that is asymptotically optimal for Gaussian best-arm identification problems, and provably outperforms standard EI by an order of magnitude.

### Ordinal optimization - empirical large deviations rate estimators, and stochastic multi-armed bandits

• Mathematics, Computer Science
• 2015
A negative result is shown: when populations have unbounded support, any policy that asymptotically identifies the correct population with probability at least $1 - \delta$ for each problem instance requires more than $O(\log(1/\delta))$ samples in making such a determination in some problem instance, suggesting that some restrictions on the populations are essential to devise algorithms with correctness guarantees.