Corpus ID: 49904719

An Optimal Algorithm for Stochastic and Adversarial Bandits

@article{Zimmert2019AnOA,
  title={An Optimal Algorithm for Stochastic and Adversarial Bandits},
  author={Julian Zimmert and Yevgeny Seldin},
  journal={ArXiv},
  year={2019},
  volume={abs/1807.07623}
}
We provide an algorithm that achieves the optimal (up to constants) finite-time regret in both adversarial and stochastic multi-armed bandits without prior knowledge of the regime or the time horizon. The result provides a negative answer to the open problem of whether an extra price has to be paid for the lack of information about the adversariality/stochasticity of the environment. We provide a complete characterization of online mirror descent algorithms based on Tsallis entropy and show that the…
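The algorithm behind this abstract, Tsallis-INF, is online mirror descent with the 1/2-Tsallis entropy as regularizer and importance-weighted loss estimates. Below is a minimal Python sketch of that scheme, assuming the OMD solution form $w_i = 4(\eta_t(\hat{L}_i - x))^{-2}$ with $x$ the normalizing constant; the Newton solver and the exact learning-rate constant are illustrative choices rather than the paper's verbatim specification, and `sample_loss` is a hypothetical environment callback.

```python
import numpy as np

def tsallis_inf_weights(L_hat, eta, iters=50, tol=1e-9):
    """Playing distribution of OMD with 1/2-Tsallis-entropy regularization.

    Solves w_i = 4 / (eta * (L_hat_i - x))**2 for the normalizing
    constant x (so that the weights sum to 1) by Newton's method.
    """
    x = np.min(L_hat) - 2.0 / eta  # ensures all terms L_hat_i - x > 0
    for _ in range(iters):
        w = 4.0 / (eta * (L_hat - x)) ** 2
        f = w.sum() - 1.0
        if abs(f) < tol:
            break
        df = (8.0 / eta ** 2) * ((L_hat - x) ** -3).sum()  # f'(x) > 0
        x -= f / df  # f is convex and increasing, so Newton converges
    return w / w.sum()

def tsallis_inf(K, T, sample_loss, seed=None):
    """Anytime Tsallis-INF with importance-weighted loss estimates."""
    rng = np.random.default_rng(seed)
    L_hat = np.zeros(K)  # cumulative importance-weighted loss estimates
    for t in range(1, T + 1):
        eta = 2.0 / np.sqrt(t)  # illustrative 1/sqrt(t) schedule
        w = tsallis_inf_weights(L_hat, eta)
        arm = rng.choice(K, p=w)
        loss = sample_loss(t, arm)   # hypothetical callback, loss in [0, 1]
        L_hat[arm] += loss / w[arm]  # importance-weighted estimator
    return L_hat
```

The point of the construction is that a single parameter-free update attains logarithmic regret in the stochastic regime while retaining the $O(\sqrt{KT})$ minimax rate against an adversary, which is exactly the best-of-both-worlds property the abstract claims.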

Citations

Beating Stochastic and Adversarial Semi-bandits Optimally and Simultaneously

TLDR
This work develops the first general semi-bandit algorithm that simultaneously achieves optimal regret in stochastic environments and in adversarial environments, without knowledge of the regime or the number of rounds $T$.

An Algorithm for Stochastic and Adversarial Bandits with Switching Costs

TLDR
An algorithm is given for stochastic and adversarial multi-armed bandits with switching costs, where the learner pays a price $\lambda$ every time it switches the arm being played; it is based on an adaptation of the Tsallis-INF algorithm.

Hybrid Regret Bounds for Combinatorial Semi-Bandits and Adversarial Linear Bandits

TLDR
An algorithm for combinatorial semi-bandits with a hybrid regret bound that includes a best-of-three-worlds guarantee and multiple data-dependent regret bounds is proposed, implying that the algorithm performs better as long as the environment is "easy" in terms of certain metrics.

Achieving Near Instance-Optimality and Minimax-Optimality in Stochastic and Adversarial Linear Bandits Simultaneously

TLDR
This work develops linear bandit algorithms that automatically adapt to different environments and additionally enjoy minimax-optimal regret in completely adversarial environments, which is the first result of this kind to the authors' knowledge.

Improved Analysis of the Tsallis-INF Algorithm in Stochastically Constrained Adversarial Bandits and Stochastic Bandits with Adversarial Corruptions

TLDR
It is shown that in adversarial regimes with a $(\Delta, C, T)$ self-bounding constraint the algorithm achieves the regret bound $O\Big(\big(\sum_{i \neq i^*} \tfrac{1}{\Delta_i}\big)\log_+\big(\tfrac{(K-1)T}{(\sum_{i \neq i^*} 1/\Delta_i)^2}\big) + \sqrt{C \big(\sum_{i \neq i^*} \tfrac{1}{\Delta_i}\big) \log_+\big(\tfrac{(K-1)T}{C \sum_{i \neq i^*} 1/\Delta_i}\big)}\Big)$, where $T$ is the time horizon, $K$ is the number of arms, $\Delta_i$ are the suboptimality gaps, $i^*$ is the best arm, $C$ is the corruption magnitude, and $\log_+(x) = \max(1, \log x)$.
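For context, the $(\Delta, C, T)$ self-bounding constraint referenced above can be paraphrased as follows (a restatement based on the Tsallis-INF line of work; the paper's exact formulation may differ in details): the environment must lower-bound its own pseudo-regret by the gap-weighted play probabilities, up to an additive corruption budget $C$,

$$\bar{R}_T \;\ge\; \mathbb{E}\!\left[\sum_{t=1}^{T} \sum_{i \neq i^*} \Delta_i \, w_{t,i}\right] - C,$$

where $w_{t,i}$ is the probability that the algorithm plays arm $i$ in round $t$. The stochastic regime corresponds to $C = 0$, while stochastic bandits with adversarial corruptions correspond to a positive corruption budget.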

Adaptivity, Variance and Separation for Adversarial Bandits

TLDR
A first-order bound is proved for a modified variant of the INF strategy by Audibert and Bubeck [2009], without sacrificing worst case optimality or modifying the loss estimators.

On First-Order Bounds, Variance and Gap-Dependent Bounds for Adversarial Bandits

TLDR
A first-order bound is proved for a modified variant of the INF strategy by Audibert and Bubeck [2009], without sacrificing worst case optimality or modifying the loss estimators.

Tsallis-INF for Decoupled Exploration and Exploitation in Multi-armed Bandits.

TLDR
A new algorithm using regularization by Tsallis entropy is derived to achieve best-of-both-worlds guarantees in a variation of the multi-armed bandit problem, attaining the minimax-optimal $O(\sqrt{KT})$ regret bound.

Nearly Optimal Algorithms for Linear Contextual Bandits with Adversarial Corruptions

TLDR
This paper proposes a new algorithm based on the principle of optimism in the face of uncertainty that achieves near-optimal regret for both corrupted and uncorrupted cases simultaneously, and shows that for both known and unknown corruption level $C$, the algorithm with a proper choice of hyperparameters achieves regret that nearly matches the lower bounds.

Robust Stochastic Bandit Algorithms under Probabilistic Unbounded Adversarial Attack

TLDR
This paper investigates the attack model in which an adversary attacks with a certain probability at each round and, if it attacks, its attack value can be arbitrary and unbounded; a high-probability guarantee of $O(\log T)$ regret is provided with respect to random rewards and the random occurrence of attacks.
...

References

SHOWING 1-10 OF 38 REFERENCES

Beating Stochastic and Adversarial Semi-bandits Optimally and Simultaneously

TLDR
This work develops the first general semi-bandit algorithm that simultaneously achieves optimal regret in stochastic environments and in adversarial environments, without knowledge of the regime or the number of rounds $T$.

One Practical Algorithm for Both Stochastic and Adversarial Bandits

TLDR
The algorithm is based on augmenting the EXP3 algorithm with a new control lever in the form of exploration parameters tailored individually to each arm, and it retains a "logarithmic" regret guarantee in the stochastic regime even when some observations are contaminated by an adversary.

The Best of Both Worlds: Stochastic and Adversarial Bandits

TLDR
SAO (Stochastic and Adversarial Optimal) combines the $O(\sqrt{n})$ worst-case regret of Exp3 for adversarial rewards with the (poly)logarithmic regret of UCB1 for stochastic rewards.

More Adaptive Algorithms for Adversarial Bandits

TLDR
The main idea of the algorithm is to apply optimism and adaptivity techniques to the well-known Online Mirror Descent framework with a special log-barrier regularizer, designing appropriate optimistic predictions and correction terms within this framework.
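For reference, the optimistic Online Mirror Descent template this summary refers to can be sketched as the standard two-step update (a generic restatement, not the paper's exact algorithm) with the log-barrier regularizer $\Psi(w) = \sum_{i=1}^{K} \frac{1}{\eta_i} \log \frac{1}{w_i}$:

$$w_t = \operatorname*{arg\,min}_{w \in \Delta_K} \; \langle w, m_t \rangle + D_{\Psi}(w, \tilde{w}_{t-1}), \qquad \tilde{w}_t = \operatorname*{arg\,min}_{w \in \Delta_K} \; \langle w, \hat{\ell}_t \rangle + D_{\Psi}(w, \tilde{w}_{t-1}),$$

where $m_t$ is an optimistic prediction of the loss estimate $\hat{\ell}_t$ and $D_{\Psi}$ is the Bregman divergence induced by $\Psi$.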

Minimax Policies for Adversarial and Stochastic Bandits.

TLDR
This work fills in a long-open gap in the characterization of the minimax rate for the multi-armed bandit problem and proposes a new family of randomized algorithms based on an implicit normalization, as well as a new analysis.

Best of both worlds: Stochastic & adversarial best-arm identification

TLDR
A lower bound is given that characterizes the optimal rate in stochastic problems when the strategy is constrained to be robust to adversarial rewards; a simple parameter-free algorithm is designed, and it is shown that its probability of error matches the lower bound in stochastic problems while it is also robust to adversarial rewards.

An algorithm with nearly optimal pseudo-regret for both stochastic and adversarial bandits

TLDR
It is shown that no algorithm with $O(\log n)$ pseudo-regret against stochastic bandits can achieve $\tilde{O}(\sqrt{n})$ expected regret against adaptive adversarial bandits.

Fighting Bandits with a New Kind of Smoothness

TLDR
A novel family of algorithms with minimax-optimal regret guarantees is defined using the notion of convex smoothing, and it is shown that a wide class of perturbation methods achieves near-optimal regret as low as $O(\sqrt{NT\log N})$, as long as the perturbation distribution has a bounded hazard function.

Stochastic bandits robust to adversarial corruptions

We introduce a new model of stochastic bandits with adversarial corruptions which aims to capture settings where most of the input follows a stochastic pattern but some fraction of it can be adversarially corrupted.

What Doubling Tricks Can and Can't Do for Multi-Armed Bandits

TLDR
It is proved that a geometric doubling trick can be used to conserve minimax bounds of the form $R_T = O(\sqrt{T})$ but cannot conserve distribution-dependent bounds of the form $R_T = O(\log T)$; insights are given as to why exponential doubling tricks may be better, as they conserve $R_T = O(\log T)$ bounds and are close to conserving $R_T = O(\sqrt{T})$ bounds.
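To make the distinction concrete, here is a minimal sketch (not from the paper) of wrapping a fixed-horizon bandit algorithm with either doubling trick. `run_fixed_horizon` is a hypothetical stand-in for any algorithm tuned for a known horizon, and the schedules $T_k = T_0 b^k$ (geometric) versus $T_k = T_0^{\,b^k}$ (exponential) are illustrative forms of the two families analyzed in the paper.

```python
def doubling_trick(run_fixed_horizon, total_rounds, T0=2, b=2, exponential=False):
    """Wrap a fixed-horizon algorithm into an anytime one by restarting.

    Geometric schedule: T_k = T0 * b**k; exponential: T_k = T0 ** (b**k).
    `run_fixed_horizon(horizon)` is a hypothetical stand-in that runs a
    fresh instance of the base algorithm for `horizon` rounds.
    """
    t, k = 0, 0
    while t < total_rounds:
        T_k = T0 ** (b ** k) if exponential else T0 * b ** k
        T_k = min(T_k, total_rounds - t)  # clip the last phase
        run_fixed_horizon(T_k)            # restart with fresh state
        t += T_k
        k += 1
```

Each restart discards learned statistics, which is exactly why the choice of schedule determines which regret bounds survive the wrapping.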