Simple Bayesian Algorithms for Best Arm Identification

@inproceedings{Russo2016SimpleBA,
  title={Simple Bayesian Algorithms for Best Arm Identification},
  author={Daniel Russo},
  booktitle={COLT},
  year={2016}
}
  • Daniel Russo
  • Published in COLT, 26 February 2016
This paper considers the optimal adaptive allocation of measurement effort for identifying the best among a finite set of options or designs. An experimenter sequentially chooses designs to measure and observes noisy signals of their quality with the goal of confidently identifying the best design after a small number of measurements. This paper proposes three simple and intuitive Bayesian algorithms for adaptively allocating measurement effort, and formalizes a sense in which these seemingly… 
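One of the paper's three proposed rules is top-two Thompson sampling (TTTS). As a rough illustration only (not the paper's pseudocode), a minimal sketch for Bernoulli arms with Beta posteriors might look like the following; the function name, the `beta` tuning parameter, and the resampling cap are illustrative choices:

```python
import random

def ttts_select(successes, failures, beta=0.5, max_resample=100):
    """One round of top-two Thompson sampling over Bernoulli arms.

    successes/failures: per-arm counts defining Beta(1 + s, 1 + f) posteriors.
    beta: probability of measuring the top sampled arm rather than a challenger.
    """
    k = len(successes)
    # Sample a mean for each arm from its posterior; the argmax is the "top" arm.
    theta = [random.betavariate(1 + successes[i], 1 + failures[i]) for i in range(k)]
    top = max(range(k), key=lambda i: theta[i])
    if random.random() < beta:
        return top
    # Otherwise resample until a different arm is the argmax (the "challenger"),
    # which forces measurement effort onto the best contenders.
    for _ in range(max_resample):
        theta = [random.betavariate(1 + successes[i], 1 + failures[i]) for i in range(k)]
        challenger = max(range(k), key=lambda i: theta[i])
        if challenger != top:
            return challenger
    return top  # fall back if the posterior is too concentrated to find a challenger
```

The resampling step is what distinguishes this from plain Thompson sampling, which would otherwise starve the runner-up arms of measurements.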


Guaranteed Fixed-Confidence Best Arm Identification in Multi-Armed Bandits: Simple Sequential Elimination Algorithms

TLDR
This work proposes to use the classical vector at a time (VT) rule, which samples each remaining arm once in each round, and proposes and analyzes a variant of the classical play the winner (PW) algorithm.

Two-Armed Gaussian Bandits with Unknown Variances

TLDR
This paper proposes a strategy comprising a sampling rule with randomized sampling following the estimated target allocation probabilities of arm draws and a recommendation rule using the augmented inverse probability weighting (AIPW) estimator, which is often used in the causal inference literature.
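The AIPW (augmented inverse probability weighting) estimator referenced here combines an inverse-propensity correction with a plug-in model estimate. A minimal sketch, assuming a scalar plug-in mean and known per-round sampling probabilities (the function name and argument layout are illustrative, not the paper's API):

```python
def aipw_mean(arms, rewards, probs, model_mean, target):
    """AIPW estimate of arm `target`'s mean reward.

    arms[t]:    the arm actually played at round t
    rewards[t]: the observed reward at round t
    probs[t]:   probability the sampling rule assigned to `target` at round t
    model_mean: a plug-in estimate of the target arm's mean (e.g. a sample mean)
    """
    total = 0.0
    for a, y, p in zip(arms, rewards, probs):
        indicator = 1.0 if a == target else 0.0
        # IPW correction term plus the plug-in prediction.
        total += indicator * (y - model_mean) / p + model_mean
    return total / len(arms)
```

In a balanced toy data set the estimate comes out the same for different plug-in values; in general, the correction term offsets the bias of an imperfect plug-in model.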

Best Arm Identification with a Fixed Budget

TLDR
This paper derives a tight problem-dependent lower bound, which characterizes the optimal allocation ratio in terms of the gaps between expected rewards and the Fisher information of the bandit model, and proposes the "RS-AIPW" strategy, which is optimal in the sense that its performance guarantee achieves the derived lower bound in the small-gap regime.

Adaptive Multiple-Arm Identification

TLDR
A new hardness parameter for characterizing the difficulty of any given instance is introduced, and a lower bound is proved showing that the extra $\log(\epsilon^{-1})$ factor is necessary for instance-dependent algorithms using the introduced hardness parameter.

Ordinal Optimization with Generalized Linear Model

TLDR
An approximate solution for the optimal allocation is obtained, which is leveraged to design a sampling strategy that is near-optimal in a suitable asymptotic sense, and it is shown via numerical testing that it performs competitively even in the presence of model misspecification.

Balancing Optimal Large Deviations in Sequential Selection

TLDR
A new methodology is proposed that can be guaranteed to adaptively learn the solution to these optimality conditions in a computationally efficient manner, without any tunable parameters, and under a wide variety of parametric sampling distributions.

Information theory for ranking and selection

TLDR
It is proved that the IG policy is consistent; that is, as the sampling budget grows to infinity, the IG policy finds the true best alternative almost surely.

Improving the Expected Improvement Algorithm

TLDR
A simple modification of the expected improvement algorithm is introduced, which results in an algorithm that is asymptotically optimal for Gaussian best-arm identification problems, and provably outperforms standard EI by an order of magnitude.
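For context, the standard expected-improvement score that this modification builds on has a closed form for Gaussian posteriors: $\mathrm{EI}_i = \sigma_i\,[z\,\Phi(z) + \varphi(z)]$ with $z = (\mu_i - \mu^\*)/\sigma_i$. A minimal stdlib-only sketch (function name illustrative; the incumbent $\mu^\*$ is taken to be the largest posterior mean):

```python
import math

def expected_improvement(mu, sigma):
    """EI scores for arms with Gaussian posteriors N(mu[i], sigma[i]^2)."""
    best = max(mu)  # incumbent value

    def pdf(z):  # standard normal density
        return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)

    def cdf(z):  # standard normal CDF via the error function
        return 0.5 * (1 + math.erf(z / math.sqrt(2)))

    scores = []
    for m, s in zip(mu, sigma):
        z = (m - best) / s
        scores.append(s * (z * cdf(z) + pdf(z)))
    return scores
```

Even the incumbent arm gets a positive score here (its posterior is uncertain), which is what lets plain EI keep sampling it; the cited modification changes how this allocation balances the top arm against the rest.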

Optimal best arm selection for general distributions

TLDR
This paper proposes a delta-correct algorithm that matches the lower bound as delta goes to zero, under the mild restriction that a known bound exists on the expectation of a non-negative, increasing, convex function of the underlying random variables.
...

References

Showing 1-10 of 86 references

Sequential Sampling to Myopically Maximize the Expected Value of Information

TLDR
It is demonstrated empirically that the benefits of reducing the number of approximations in the previous algorithms are typically outweighed by the deleterious effects of a sequential one-step myopic allocation when more than a few dozen samples are allocated.

Best-arm identification algorithms for multi-armed bandits in the fixed confidence setting

TLDR
It is shown that most best-arm algorithms can be described as variants of two recently proposed optimal algorithms that achieve the optimal sample complexity for the problem.

Sequential Sampling with Economics of Selection Procedures

TLDR
New economically motivated, fully sequential sampling procedures for stochastic simulation problems, called economics of selection procedures, are presented; they are derived for comparing a known standard with one alternative whose unknown reward is inferred through sampling.

Active Sequential Hypothesis Testing

TLDR
Lower bounds for the optimal total cost are established using results in dynamic programming and the fundamental limits on the maximum achievable information acquisition rate and the optimal reliability are characterized.

New Two-Stage and Sequential Procedures for Selecting the Best Simulated System

TLDR
New two-stage and sequential selection procedures that integrate attractive features of both lines of research are presented, derived assuming that the simulation output is normally distributed with unknown mean and variance that may differ for each system.

Best Arm Identification in Multi-Armed Bandits

TLDR
This work proposes a highly exploring UCB policy and a new algorithm based on successive rejects that are essentially optimal since their regret decreases exponentially at a rate which is, up to a logarithmic factor, the best possible.

Improving the Expected Improvement Algorithm

TLDR
A simple modification of the expected improvement algorithm is introduced, which results in an algorithm that is asymptotically optimal for Gaussian best-arm identification problems, and provably outperforms standard EI by an order of magnitude.

Ordinal optimization - empirical large deviations rate estimators, and stochastic multi-armed bandits

TLDR
A negative result is shown: when populations have unbounded support, any policy that asymptotically identifies the correct population with probability at least $1 - \delta$ on each problem instance requires more than $O(\log(1/\delta))$ samples to make that determination, suggesting that some restrictions on the populations are essential for devising algorithms with correctness guarantees.

Economic Analysis of Simulation Selection Problems

TLDR
This paper frames the simulation selection problem as a “stoppable” version of a Bayesian bandit problem that treats the ability to simulate as a real option prior to project implementation and provides a framework for answering managerial questions.

Analysis of Thompson Sampling for the Multi-armed Bandit Problem

TLDR
For the first time, it is shown that the Thompson Sampling algorithm achieves logarithmic expected regret for the stochastic multi-armed bandit problem.
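The cited regret result concerns the Bernoulli bandit. A minimal sketch of the algorithm with uniform Beta(1, 1) priors (function and variable names are illustrative):

```python
import random

def thompson_bernoulli(reward_fn, n_arms, horizon):
    """Thompson Sampling for the Bernoulli multi-armed bandit.

    Each arm keeps a Beta(1 + successes, 1 + failures) posterior; every
    round, play the arm whose posterior sample is largest, then update.
    Returns the pull counts, which concentrate on the best arm over time.
    """
    s = [0] * n_arms  # observed successes per arm
    f = [0] * n_arms  # observed failures per arm
    pulls = [0] * n_arms
    for _ in range(horizon):
        arm = max(range(n_arms),
                  key=lambda i: random.betavariate(1 + s[i], 1 + f[i]))
        if reward_fn(arm):  # reward_fn returns a truthy value on success
            s[arm] += 1
        else:
            f[arm] += 1
        pulls[arm] += 1
    return pulls
```

Because suboptimal arms are played only while their posteriors still overlap the best arm's, the number of such plays grows logarithmically in the horizon, matching the cited regret bound.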
...