Corpus ID: 5900495

The Best of Both Worlds: Stochastic and Adversarial Bandits

@article{Bubeck2012TheBO,
  title={The Best of Both Worlds: Stochastic and Adversarial Bandits},
  author={S{\'e}bastien Bubeck and Aleksandrs Slivkins},
  journal={ArXiv},
  year={2012},
  volume={abs/1202.4473}
}
We present a new bandit algorithm, SAO (Stochastic and Adversarial Optimal), whose regret is (essentially) optimal both for adversarial rewards and for stochastic rewards. Specifically, SAO combines the O(√n) worst-case regret of Exp3 (Auer et al., 2002b) and the (poly)logarithmic regret of UCB1 (Auer et al., 2002a) for stochastic rewards. Adversarial rewards and stochastic rewards are the two main settings in the literature on multi-armed bandits (MAB). Prior work on MAB treats them…
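For orientation, the two baselines named in the abstract can be written down in a few lines. Below is a minimal, illustrative Python sketch of the UCB1 index rule (Auer et al., 2002a) and the Exp3 exponential-weights update (Auer et al., 2002b) for arms with rewards in [0, 1]; the function names and the textbook constants are ours and are not specific to SAO.

  import math

  def ucb1_choose(counts, means, t):
      """UCB1: play each arm once, then the arm maximizing mean + sqrt(2 ln t / n_a)."""
      for a, n in enumerate(counts):
          if n == 0:
              return a
      return max(range(len(counts)),
                 key=lambda a: means[a] + math.sqrt(2.0 * math.log(t) / counts[a]))

  def exp3_distribution(weights, gamma):
      """Exp3: mix normalized exponential weights with uniform exploration gamma."""
      total, k = sum(weights), len(weights)
      return [(1.0 - gamma) * w / total + gamma / k for w in weights]

  def exp3_update(weights, probs, arm, reward, gamma):
      """Exp3: importance-weighted reward estimate, then an exponential weight bump."""
      est = reward / probs[arm]          # unbiased estimate of the pulled arm's reward
      weights[arm] *= math.exp(gamma * est / len(weights))

SAO's contribution, per the abstract, is to obtain (essentially) the best of these two guarantees with a single algorithm, without knowing in advance whether the rewards are stochastic or adversarial.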
An Optimal Algorithm for Stochastic and Adversarial Bandits
TLDR
The proposed algorithm enjoys improved regret guarantees in two intermediate regimes: the moderately contaminated stochastic regime defined by Seldin and Slivkins (2014) and the stochastically constrained adversary studied by Wei and Luo (2018).
Beating Stochastic and Adversarial Semi-bandits Optimally and Simultaneously
TLDR
This work develops the first general semi-bandit algorithm that simultaneously achieves near-optimal regret in both stochastic and adversarial environments, without knowledge of the regime or the number of rounds T.
Best of both worlds: Stochastic & adversarial best-arm identification
TLDR
A lower bound is given that characterizes the optimal rate in stochastic problems when the strategy is constrained to be robust to adversarial rewards; a simple parameter-free algorithm is designed whose probability of error matches this lower bound in stochastic problems while also remaining robust to adversarial rewards.
More Adaptive Algorithms for Adversarial Bandits
TLDR
The main idea of the algorithm is to apply the optimism and adaptivity techniques to the well-known Online Mirror Descent framework with a special log-barrier regularizer to come up with appropriate optimistic predictions and correction terms in this framework.
Parameter-Free Multi-Armed Bandit Algorithms with Hybrid Data-Dependent Regret Bounds
  • Shinji Ito
  • COLT
  • 2021
This paper presents multi-armed bandit (MAB) algorithms that work well in adversarial environments and that offer improved performance by exploiting inherent structures in such environments, as…
An Algorithm for Stochastic and Adversarial Bandits with Switching Costs
TLDR
An algorithm is proposed for stochastic and adversarial multi-armed bandits with switching costs, where the algorithm pays a price λ every time it switches the arm being played, based on an adaptation of the Tsallis-INF algorithm.
One Practical Algorithm for Both Stochastic and Adversarial Bandits
TLDR
The algorithm is based on an augmentation of the EXP3 algorithm with a new control lever in the form of exploration parameters that are tailored individually for each arm, and it retains a "logarithmic" regret guarantee in the stochastic regime even when some observations are contaminated by an adversary.
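To make the "per-arm exploration" control lever concrete, here is a hedged Python sketch of the mixture structure such an augmentation can take: the usual Exp3 exponential-weights distribution is mixed with explicit, individually tuned exploration rates. The function name and the choice to pass the rates in as inputs are illustrative assumptions, not the paper's exact algorithm.

  def mixed_distribution(weights, eps):
      """Mix an Exp3-style distribution with per-arm exploration rates eps[a].

      Illustrative only: in the paper's spirit, eps[a] would shrink faster for
      arms that look clearly suboptimal; here the rates are simply given.
      """
      total = sum(weights)
      rho = [w / total for w in weights]      # exponential-weights part
      slack = 1.0 - sum(eps)                  # probability mass left after exploration
      return [slack * r + e for r, e in zip(rho, eps)]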
Upper Confidence Bounds for Combining Stochastic Bandits
TLDR
This approach provides an easy and intuitive alternative strategy to the CORRAL algorithm for adversarial bandits, without requiring the stability conditions imposed by CORRAL on the base algorithms.
Adaptivity, Variance and Separation for Adversarial Bandits
TLDR
A first-order bound is proved for a modified variant of the INF strategy by Audibert and Bubeck [2009], without sacrificing worst-case optimality or modifying the loss estimators.
Unifying the stochastic and the adversarial Bandits with Knapsack
TLDR
This paper proposes EXP3.BwK, a novel algorithm that achieves order-optimal regret in the adversarial BwK setup and incurs an almost optimal expected regret, up to an additional factor of log(B), in the stochastic BwK setup.

References

Showing 1-10 of 44 references
UCB revisited: Improved regret bounds for the stochastic multi-armed bandit problem
TLDR
For this modified UCB algorithm, an improved bound on the regret is given with respect to the optimal reward for K-armed bandits after T trials.
The multi-armed bandit problem with covariates
TLDR
This work introduces a policy called Adaptively Binned Successive Elimination (ABSE) that adaptively decomposes the global problem into suitably "localized" static bandit problems, in a nonparametric model where the expected rewards are smooth functions of the covariate and the hardness of the problem is captured by a margin parameter.
Regret Bounds and Minimax Policies under Partial Monitoring
This work deals with four classical prediction settings, namely full information, bandit, label efficient, and bandit label efficient, as well as four different notions of regret: pseudo-regret, …
Better Algorithms for Benign Bandits
TLDR
A new algorithm is proposed for the bandit linear optimization problem which obtains a regret bound of O(√Q), where Q is the total variation in the cost functions, showing that it is possible to incur much less regret in a slowly changing environment even in the bandit setting.
Adaptive Bandits: Towards the best history-dependent strategy
TLDR
Tractable algorithms with regret bounded by T^{2/3} C^{1/3} log(|Θ|)^{1/2} are provided, as well as tractable algorithms achieving a tight regret bound of Õ(√(TAC)), where C is the number of classes of θ.
An Asymptotically Optimal Bandit Algorithm for Bounded Support Models
TLDR
The Deterministic Minimum Empirical Divergence (DMED) policy is proposed and shown to achieve the asymptotic bound, and the index used in DMED for choosing an arm can be computed easily by a convex optimization technique.
Finite-time Analysis of the Multiarmed Bandit Problem
TLDR
This work shows that the optimal logarithmic regret is also achievable uniformly over time, with simple and efficient policies, and for all reward distributions with bounded support.
Stochastic Linear Optimization under Bandit Feedback
TLDR
A nearly complete characterization of the classical stochastic k-armed bandit problem in terms of both upper and lower bounds for the regret is given, and two variants of an algorithm based on the idea of "upper confidence bounds" are presented.
The KL-UCB Algorithm for Bounded Stochastic Bandits and Beyond
TLDR
First, it is proved that for arbitrary bounded rewards, the KL-UCB algorithm satisfies a uniformly better regret bound than UCB and UCB2; second, in the special case of Bernoulli rewards, it reaches the lower bound of Lai and Robbins.
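As a concrete illustration of the index this entry refers to, the sketch below computes the KL-UCB upper confidence index for Bernoulli rewards by bisection on the Bernoulli KL divergence. The log(t)/n threshold is the simplest variant of the exploration function, and the helper names are ours.

  import math

  def bernoulli_kl(p, q):
      """KL divergence between Bernoulli(p) and Bernoulli(q)."""
      eps = 1e-12
      p = min(max(p, eps), 1 - eps)
      q = min(max(q, eps), 1 - eps)
      return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

  def klucb_index(mean, n, t):
      """Largest q >= mean with n * KL(mean, q) <= log(t), found by bisection."""
      threshold = math.log(t) / n
      lo, hi = mean, 1.0
      for _ in range(50):                     # bisection to high precision
          mid = (lo + hi) / 2.0
          if bernoulli_kl(mean, mid) > threshold:
              hi = mid
          else:
              lo = mid
      return lo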
Exploration-exploitation tradeoff using variance estimates in multi-armed bandits
TLDR
A variant of the basic algorithm for the stochastic multi-armed bandit problem is considered that takes into account the empirical variance of the different arms, providing the first analysis of the expected regret for such algorithms.
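For context, a variance-aware index of the kind analyzed in this reference (UCB-V, for rewards in [0, 1]) can be sketched as follows; the exploration function log(t) and the constants follow the commonly quoted form and should be read as an illustrative choice rather than the paper's exact tuning.

  import math

  def ucbv_index(mean, var, n, t):
      """Empirical mean plus a Bernstein-style bonus using the empirical variance."""
      log_t = math.log(t)
      return mean + math.sqrt(2.0 * var * log_t / n) + 3.0 * log_t / n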