Corpus ID: 5900495

# The Best of Both Worlds: Stochastic and Adversarial Bandits

```bibtex
@article{Bubeck2012TheBO,
  title={The Best of Both Worlds: Stochastic and Adversarial Bandits},
  author={S{\'e}bastien Bubeck and Aleksandrs Slivkins},
  journal={ArXiv},
  year={2012},
  volume={abs/1202.4473}
}
```
• Published 2012
• Mathematics, Computer Science
• ArXiv
We present a new bandit algorithm, SAO (Stochastic and Adversarial Optimal), whose regret is (essentially) optimal both for adversarial rewards and for stochastic rewards. Specifically, SAO combines the O(√n) worst-case regret of Exp3 (Auer et al., 2002b) and the (poly)logarithmic regret of UCB1 (Auer et al., 2002a) for stochastic rewards. Adversarial rewards and stochastic rewards are the two main settings in the literature on multi-armed bandits (MAB). Prior work on MAB treats them…
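As a rough illustration of the two building blocks the abstract names (not the paper's SAO algorithm itself), here is a minimal Python sketch of the UCB1 index rule and the Exp3 exponential-weights rule; the function names and constants are illustrative choices, not code from either cited paper.

```python
import math
import random

def ucb1_pick(counts, sums, t):
    """UCB1 (Auer et al., 2002a): play the arm maximizing the
    optimistic index  empirical mean + sqrt(2 ln t / n_i)."""
    for i, n in enumerate(counts):
        if n == 0:
            return i  # play every arm once before using the index
    return max(range(len(counts)),
               key=lambda i: sums[i] / counts[i]
                             + math.sqrt(2 * math.log(t) / counts[i]))

def exp3_pick(weights, gamma):
    """Exp3 (Auer et al., 2002b): sample an arm from a mixture of the
    exponential-weights distribution and uniform exploration."""
    k = len(weights)
    total = sum(weights)
    probs = [(1 - gamma) * w / total + gamma / k for w in weights]
    r, acc = random.random(), 0.0
    for i, p in enumerate(probs):
        acc += p
        if r <= acc:
            return i, probs
    return k - 1, probs

def exp3_update(weights, probs, arm, reward, gamma):
    """Importance-weighted update: only the played arm's weight changes."""
    k = len(weights)
    est = reward / probs[arm]  # unbiased estimate of the arm's reward
    weights[arm] *= math.exp(gamma * est / k)
```

UCB1's bonus shrinks as an arm is sampled more, which yields logarithmic regret for i.i.d. rewards, while Exp3's randomized play is what protects against an adversary; SAO's contribution is obtaining both guarantees with a single algorithm.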
#### 151 Citations


An Optimal Algorithm for Stochastic and Adversarial Bandits
• Computer Science, Mathematics
• AISTATS
• 2019
The proposed algorithm enjoys improved regret guarantees in two intermediate regimes: the moderately contaminated stochastic regime defined by Seldin and Slivkins (2014) and the stochastically constrained adversary studied by Wei and Luo (2018).
Beating Stochastic and Adversarial Semi-bandits Optimally and Simultaneously
• Computer Science, Mathematics
• ICML
• 2019
This work develops the first general semi-bandit algorithm that simultaneously achieves optimal regret for stochastic environments and adversarial environments without knowledge of the regime or the number of rounds $T$.
Best of both worlds: Stochastic & adversarial best-arm identification
• Computer Science
• COLT
• 2018
A lower bound is given that characterizes the optimal rate in stochastic problems when the strategy is constrained to be robust to adversarial rewards; a simple parameter-free algorithm is designed whose probability of error matches the lower bound in stochastic problems while remaining robust to adversarial rewards.
• Computer Science, Mathematics
• COLT
• 2018
The main idea of the algorithm is to apply optimism and adaptivity techniques to the well-known Online Mirror Descent framework with a special log-barrier regularizer, yielding appropriate optimistic predictions and correction terms in this framework.
Parameter-Free Multi-Armed Bandit Algorithms with Hybrid Data-Dependent Regret Bounds
• Shinji Ito
• Computer Science
• COLT
• 2021
This paper presents multi-armed bandit (MAB) algorithms that work well in adversarial environments and that offer improved performance by exploiting inherent structures in such environments…
An Algorithm for Stochastic and Adversarial Bandits with Switching Costs
• Computer Science, Mathematics
• ICML
• 2021
An algorithm for stochastic and adversarial multi-armed bandits with switching costs, where the algorithm pays a price λ every time it switches the arm being played; the algorithm is based on an adaptation of the Tsallis-INF algorithm.
One Practical Algorithm for Both Stochastic and Adversarial Bandits
• Computer Science, Engineering
• ICML
• 2014
The algorithm is based on augmenting the EXP3 algorithm with a new control lever in the form of exploration parameters that are tailored individually for each arm, and it retains a "logarithmic" regret guarantee in the stochastic regime even when some observations are contaminated by an adversary.
Upper Confidence Bounds for Combining Stochastic Bandits
• Computer Science, Mathematics
• ArXiv
• 2020
This approach provides an easy and intuitive alternative strategy to the CORRAL algorithm for adversarial bandits, without requiring the stability conditions that CORRAL imposes on the base algorithms.
• Computer Science, Mathematics
• ArXiv
• 2019
A first-order bound is proved for a modified variant of the INF strategy of Audibert and Bubeck [2009], without sacrificing worst-case optimality or modifying the loss estimators.
Unifying the stochastic and the adversarial Bandits with Knapsack
• Computer Science, Mathematics
• IJCAI
• 2019
This paper proposes EXP3.BwK, a novel algorithm that achieves order-optimal regret in the adversarial BwK setup and incurs an almost optimal expected regret, with an additional factor of $\log(B)$, in the stochastic BwK setup.

#### References

Showing 1–10 of 44 references
UCB revisited: Improved regret bounds for the stochastic multi-armed bandit problem
• Mathematics, Computer Science
• Period. Math. Hung.
• 2010
For this modified UCB algorithm, an improved bound on the regret with respect to the optimal reward is given for K-armed bandits after T trials.
The multi-armed bandit problem with covariates
• Mathematics, Computer Science
• ArXiv
• 2011
This work introduces a policy called Adaptively Binned Successive Elimination (ABSE) that adaptively decomposes the global problem into suitably "localized" static bandit problems, within a nonparametric model where the expected rewards are smooth functions of the covariate and the hardness of the problem is captured by a margin parameter.
Regret Bounds and Minimax Policies under Partial Monitoring
• Mathematics
• 2010
This work deals with four classical prediction settings, namely full information, bandit, label efficient, and bandit label efficient, as well as four different notions of regret: pseudo-regret, …
Better Algorithms for Benign Bandits
• Mathematics, Computer Science
• J. Mach. Learn. Res.
• 2011
A new algorithm is proposed for the bandit linear optimization problem that obtains a regret bound of O(√Q), where Q is the total variation in the cost functions, showing that it is possible to incur much less regret in a slowly changing environment even in the bandit setting.
Adaptive Bandits: Towards the best history-dependent strategy
• Mathematics, Computer Science
• AISTATS
• 2011
Tractable algorithms with regret bounded by T^(2/3) C^(1/3) log(|Theta|)^(1/2) are provided, as well as tractable algorithms achieving a tight regret bound of ~O(sqrt(TAC)), where C is the number of classes of Theta.
An Asymptotically Optimal Bandit Algorithm for Bounded Support Models
• Mathematics, Computer Science
• COLT
• 2010
The Deterministic Minimum Empirical Divergence (DMED) policy is proposed, and it is proved that DMED achieves the asymptotic bound; the index used in DMED for choosing an arm can be computed easily by a convex optimization technique.
Finite-time Analysis of the Multiarmed Bandit Problem
• Computer Science
• Machine Learning
• 2004
This work shows that the optimal logarithmic regret is also achievable uniformly over time, with simple and efficient policies, and for all reward distributions with bounded support.
Stochastic Linear Optimization under Bandit Feedback
• Mathematics, Computer Science
• COLT
• 2008
A nearly complete characterization of the classical stochastic k-armed bandit problem is given in terms of both upper and lower bounds on the regret, and two variants of an algorithm based on the idea of "upper confidence bounds" are presented.
The KL-UCB Algorithm for Bounded Stochastic Bandits and Beyond
• Computer Science, Mathematics
• COLT
• 2011
It is proved that, for arbitrary bounded rewards, the KL-UCB algorithm satisfies a uniformly better regret bound than UCB or UCB2, and that, in the special case of Bernoulli rewards, it reaches the lower bound of Lai and Robbins.
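The KL-UCB index can be computed by a simple bisection, since the Bernoulli KL divergence is convex and increasing in its second argument above the mean. The sketch below assumes Bernoulli rewards and uses a bare log(t) exploration level; it is illustrative, not the paper's exact algorithm (refinements such as the log log(t) term are omitted).

```python
import math

def bernoulli_kl(p, q):
    """KL divergence between Bernoulli(p) and Bernoulli(q),
    clamped away from 0 and 1 for numerical safety."""
    eps = 1e-12
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def klucb_index(mean, pulls, t, iters=50):
    """KL-UCB-style index: the largest q >= mean such that
    pulls * kl(mean, q) <= log(t), found by bisection on [mean, 1]."""
    target = math.log(t) / pulls
    lo, hi = mean, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if bernoulli_kl(mean, mid) <= target:
            lo = mid  # mid is still statistically plausible
        else:
            hi = mid
    return lo
```

Because the KL-based confidence region is tighter than the quadratic one behind UCB1's bonus, this index is never larger than UCB1's, which is the intuition behind the uniformly better regret bound.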
Exploration-exploitation tradeoff using variance estimates in multi-armed bandits
• Computer Science, Mathematics
• Theor. Comput. Sci.
• 2009
A variant of the basic algorithm for the stochastic multi-armed bandit problem is considered that takes into account the empirical variance of the different arms, providing the first analysis of the expected regret for such algorithms.
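A minimal sketch, assuming rewards bounded in [0, b], of a Bernstein-type index in the spirit of the variance-aware algorithm described above; the exact constants and exploration function are illustrative choices, not the paper's.

```python
import math

def variance_aware_index(rewards, t, b=1.0, c=1.0):
    """Empirical mean plus a Bernstein-type bonus: a variance term
    that shrinks for low-variance arms, plus a range-dependent term."""
    s = len(rewards)
    mean = sum(rewards) / s
    var = sum((x - mean) ** 2 for x in rewards) / s  # empirical variance
    bonus = math.sqrt(2 * var * math.log(t) / s) + c * 3 * b * math.log(t) / s
    return mean + bonus
```

The point of the variance term is that an arm whose observed rewards are nearly constant gets a much smaller exploration bonus than the worst-case sqrt(2 ln t / s) bonus of UCB1, so it is abandoned sooner when suboptimal.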