Corpus ID: 49904719

An Optimal Algorithm for Stochastic and Adversarial Bandits

@article{Zimmert2019AnOA,
  title={An Optimal Algorithm for Stochastic and Adversarial Bandits},
  author={Julian Zimmert and Yevgeny Seldin},
  journal={ArXiv},
  year={2019},
  volume={abs/1807.07623}
}
We provide an algorithm that achieves the optimal (up to constants) finite-time regret in both adversarial and stochastic multi-armed bandits without prior knowledge of the regime and time horizon. The result provides a negative answer to the open problem of whether an extra price has to be paid for the lack of information about the adversariality/stochasticity of the environment. We provide a complete characterization of online mirror descent algorithms based on Tsallis entropy and show that the …
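
To make the mirror-descent step concrete, here is a minimal Python sketch of the Tsallis-INF update for the 1/2-Tsallis entropy with importance-weighted loss estimates; the closed-form weights and the Newton-method normalization follow the standard derivation, while the learning-rate schedule, the iteration count, and the sample_loss environment callback are illustrative assumptions, not the authors' reference implementation.

import numpy as np

def tsallis_inf_weights(L_hat, eta, n_iter=50):
    # OMD step for the 1/2-Tsallis entropy: minimize <w, L_hat> - (2/eta)*sum(sqrt(w))
    # over the probability simplex. Stationarity gives w_i = 4 / (eta*(L_hat_i - x))^2,
    # where the normalizer x is found by Newton's method so that sum(w) = 1.
    x = L_hat.min() - 2.0 / eta  # start where sum(w) >= 1 and every L_hat_i - x > 0
    for _ in range(n_iter):
        w = 4.0 / (eta * (L_hat - x)) ** 2
        x -= (w.sum() - 1.0) / (eta * (w ** 1.5).sum())  # Newton step on sum(w) - 1 = 0
    w = 4.0 / (eta * (L_hat - x)) ** 2
    return w / w.sum()

def tsallis_inf(K, T, sample_loss, rng=None):
    # Minimal bandit loop. `sample_loss(t, arm)` is a hypothetical environment
    # callback returning the observed loss in [0, 1] for the played arm, and
    # eta = 2/sqrt(t) is one schedule discussed for the 1/2-Tsallis entropy.
    rng = rng or np.random.default_rng()
    L_hat = np.zeros(K)  # cumulative importance-weighted loss estimates
    for t in range(1, T + 1):
        w = tsallis_inf_weights(L_hat, eta=2.0 / np.sqrt(t))
        arm = rng.choice(K, p=w)
        L_hat[arm] += sample_loss(t, arm) / w[arm]  # unbiased IW estimator
    return L_hat

Because the root equation sum(w(x)) = 1 is increasing and convex in x on the feasible region, the Newton iteration started below min(L_hat) converges monotonically; the paper also analyzes a reduced-variance loss estimator, which this sketch omits.
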
Beating Stochastic and Adversarial Semi-bandits Optimally and Simultaneously
TLDR: This work develops the first general semi-bandit algorithm that simultaneously achieves $O(\log T)$ regret for stochastic environments and $O(\sqrt{T})$ regret for adversarial environments without knowledge of the regime or the number of rounds $T$.
An Algorithm for Stochastic and Adversarial Bandits with Switching Costs
TLDR: An algorithm for stochastic and adversarial multi-armed bandits with switching costs, where the algorithm pays a price λ every time it switches the arm being played, based on an adaptation of the Tsallis-INF algorithm.
Adaptivity, Variance and Separation for Adversarial Bandits
TLDR: A first-order bound is proved for a modified variant of the INF strategy by Audibert and Bubeck [2009], without sacrificing worst-case optimality or modifying the loss estimators.
On First-Order Bounds, Variance and Gap-Dependent Bounds for Adversarial Bandits
TLDR: A first-order bound is proved for a modified variant of the INF strategy by Audibert and Bubeck [2009], without sacrificing worst-case optimality or modifying the loss estimators.
Tsallis-INF for Decoupled Exploration and Exploitation in Multi-armed Bandits
TLDR: A new algorithm is derived using regularization by Tsallis entropy to achieve best-of-both-worlds guarantees in a variation of the multi-armed bandit problem, and it achieves the minimax optimal $O(\sqrt{KT})$ regret bound.
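
For context, the (negative) Tsallis entropy used as the regularizer in this line of work can be written in standard notation (a reference formula, not quoted from the paper):

$$\Psi_\alpha(w) \;=\; \frac{1}{1-\alpha}\Big(1 - \sum_{i=1}^{K} w_i^{\alpha}\Big), \qquad \alpha \in (0,1),$$

which recovers the negative Shannon entropy $\sum_i w_i \log w_i$ as $\alpha \to 1$ and, at $\alpha = 1/2$, reduces (up to a constant) to $-2\sum_i \sqrt{w_i}$, the regularizer behind INF and Tsallis-INF.
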
Parameter-Free Multi-Armed Bandit Algorithms with Hybrid Data-Dependent Regret Bounds
Shinji Ito. COLT, 2021.
This paper presents multi-armed bandit (MAB) algorithms that work well in adversarial environments and that offer improved performance by exploiting inherent structures in such environments, as …
Robust Stochastic Bandit Algorithms under Probabilistic Unbounded Adversarial Attack
TLDR: This paper investigates an attack model where an adversary attacks with a certain probability at each round and the attack value can be arbitrary and unbounded if it attacks, and shows that both proposed algorithms achieve $O(\log T)$ pseudo-regret (i.e., the optimal regret without attacks).
Scale Free Adversarial Multi Armed Bandits
TLDR: A Follow The Regularized Leader (FTRL) algorithm is designed that comes with the first scale-free regret guarantee for MAB, together with a new technique for obtaining local-norm lower bounds for Bregman divergences, which are crucial in bandit regret bounds.
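
As a reminder of the object in that last clause, the Bregman divergence induced by a differentiable convex regularizer $\Psi$ is, by the standard definition (not specific to this paper):

$$D_\Psi(x, y) \;=\; \Psi(x) - \Psi(y) - \langle \nabla \Psi(y),\, x - y \rangle,$$

and lower-bounding $D_\Psi$ in a local norm is what controls the stability term in FTRL/OMD regret analyses.
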
Improved Analysis of Robustness of the Tsallis-INF Algorithm to Adversarial Corruptions in Stochastic Multiarmed Bandits
TLDR: Improved regret bounds for the Tsallis-INF algorithm of Zimmert and Seldin are derived, improving the dependence on the corruption magnitude $C$ in both the adversarial regime with a self-bounding constraint and the stochastic regime with adversarial corruptions.
Stochastic Graphical Bandits with Adversarial Corruptions
TLDR: This paper proposes an online algorithm that can exploit the stochastic pattern while tolerating adversarial corruptions, attaining an $O(\alpha \ln K \ln T + \alpha C)$ regret, where $\alpha$ is the independence number of the feedback graph, $K$ is the number of arms, $T$ is the time horizon, and $C$ quantifies the total corruption introduced by the adversary.

References

Showing 1-10 of 38 references.
Beating Stochastic and Adversarial Semi-bandits Optimally and Simultaneously
TLDR: This work develops the first general semi-bandit algorithm that simultaneously achieves $O(\log T)$ regret for stochastic environments and $O(\sqrt{T})$ regret for adversarial environments without knowledge of the regime or the number of rounds $T$.
One Practical Algorithm for Both Stochastic and Adversarial Bandits
TLDR: The algorithm is based on augmenting the EXP3 algorithm with a new control lever in the form of exploration parameters that are tailored individually for each arm, and it retains a "logarithmic" regret guarantee in the stochastic regime even when some observations are contaminated by an adversary.
The Best of Both Worlds: Stochastic and Adversarial Bandits
TLDR: SAO (Stochastic and Adversarial Optimal) combines the $O(\sqrt{n})$ worst-case regret of Exp3 for adversarial rewards with the (poly)logarithmic regret of UCB1 for stochastic rewards.
More Adaptive Algorithms for Adversarial Bandits
TLDR: The main idea of the algorithm is to apply optimism and adaptivity techniques to the well-known Online Mirror Descent framework with a special log-barrier regularizer, in order to come up with appropriate optimistic predictions and correction terms in this framework.
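
The log-barrier regularizer referred to here is usually written as follows (standard form; the scaling is a common convention, not quoted from the paper):

$$\Psi(w) \;=\; -\frac{1}{\eta}\sum_{i=1}^{K} \log w_i,$$

whose Hessian $\nabla^2 \Psi(w) = \mathrm{diag}\big(1/(\eta w_i^2)\big)$ blows up as any $w_i \to 0$, giving the strong local curvature that makes the optimistic predictions and correction terms workable.
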
Minimax Policies for Adversarial and Stochastic Bandits
TLDR: This work fills a long-open gap in the characterization of the minimax rate for the multi-armed bandit problem and proposes a new family of randomized algorithms based on an implicit normalization, as well as a new analysis.
Best of both worlds: Stochastic & adversarial best-arm identification
TLDR: A lower bound is given that characterizes the optimal rate in stochastic problems when the strategy is constrained to be robust to adversarial rewards, and a simple parameter-free algorithm is designed whose probability of error matches the lower bound in stochastic problems while also being robust to adversarial rewards.
An Improved Parametrization and Analysis of the EXP3++ Algorithm for Stochastic and Adversarial Bandits
TLDR: A new strategy for gap estimation in randomized algorithms for multiarmed bandits is proposed and combined with the EXP3++ algorithm of Seldin and Slivkins (2014) to reduce the dependence of the regret on the time horizon and eliminate an additive factor of order …
An algorithm with nearly optimal pseudo-regret for both stochastic and adversarial bandits
TLDR: It is shown that no algorithm with $O(\log n)$ pseudo-regret against stochastic bandits can achieve $\tilde{O}(\sqrt{n})$ expected regret against adaptive adversarial bandits.
Fighting Bandits with a New Kind of Smoothness
TLDR: A novel family of algorithms with minimax optimal regret guarantees is defined using the notion of convex smoothing, and it is shown that a wide class of perturbation methods achieves near-optimal regret, as low as $O(\sqrt{NT \log N})$, as long as the perturbation distribution has a bounded hazard function.
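
For reference, the hazard function of a perturbation distribution with density $f$ and CDF $F$ is, by the standard definition (not quoted from the paper):

$$h(x) \;=\; \frac{f(x)}{1 - F(x)},$$

so a bounded hazard means $\sup_x h(x) < \infty$; the Gumbel distribution, whose perturbation recovers exponential-weights-style algorithms, is one member of this class.
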
Stochastic bandits robust to adversarial corruptions
We introduce a new model of stochastic bandits with adversarial corruptions which aims to capture settings where most of the input follows a stochastic pattern but some fraction of it can be …