Corpus ID: 49904719

An Optimal Algorithm for Stochastic and Adversarial Bandits

@article{Zimmert2018AnOA,
  title={An Optimal Algorithm for Stochastic and Adversarial Bandits},
  author={Julian Zimmert and Yevgeny Seldin},
  journal={ArXiv},
  year={2018},
  volume={abs/1807.07623}
}
We provide an algorithm that achieves the optimal (up to constants) finite-time regret in both adversarial and stochastic multi-armed bandits without prior knowledge of the regime and time horizon. The result provides a negative answer to the open problem of whether an extra price has to be paid for the lack of information about the adversariality/stochasticity of the environment. We provide a complete characterization of online mirror descent algorithms based on Tsallis entropy and show that the power α = 1/2 achieves the goal.
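
The core update behind this result is online mirror descent with the 1/2-Tsallis-entropy regularizer and importance-weighted loss estimates (the algorithm known as Tsallis-INF). The sketch below is a minimal illustration of that update under our own assumptions, not the authors' reference implementation: the learning-rate schedule and constants are simplified, the paper's reduced-variance loss estimator is replaced by the plain importance-weighted one, and `loss_fn` is a hypothetical callback supplying the observed loss.

```python
import numpy as np

def tsallis_inf_weights(L_hat, eta, iters=60):
    """Solve the 1/2-Tsallis-entropy OMD step: find w on the simplex with
    w_i = 1 / (eta * (L_hat_i - nu))^2, where nu < min(L_hat) is the
    normalization constant, located here by bisection."""
    m, K = L_hat.min(), len(L_hat)
    lo = m - np.sqrt(K) / eta          # weights sum to at most 1 here
    hi = m - 1.0 / eta                 # weights sum to at least 1 here
    for _ in range(iters):
        nu = 0.5 * (lo + hi)
        s = np.sum(1.0 / (eta * (L_hat - nu)) ** 2)
        lo, hi = (nu, hi) if s < 1.0 else (lo, nu)
    w = 1.0 / (eta * (L_hat - nu)) ** 2
    return w / w.sum()                 # remove tiny bisection residual

def tsallis_inf(loss_fn, K, T, seed=0):
    """Minimal Tsallis-INF loop with plain importance-weighted estimates."""
    rng = np.random.default_rng(seed)
    L_hat = np.zeros(K)                # cumulative loss estimates
    for t in range(1, T + 1):
        eta = 1.0 / np.sqrt(t)         # anytime rate; paper's constants differ
        w = tsallis_inf_weights(L_hat, eta)
        arm = rng.choice(K, p=w)
        loss = loss_fn(t, arm)         # observed loss, assumed in [0, 1]
        L_hat[arm] += loss / w[arm]    # unbiased importance-weighted update
    return L_hat
```

The bisection solves for the Lagrange multiplier nu that puts the weights on the probability simplex; that one-dimensional search is the only nontrivial step of the update.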

Citations

Beating Stochastic and Adversarial Semi-bandits Optimally and Simultaneously

This work develops the first general semi-bandit algorithm that simultaneously achieves $O(\log T)$ regret for stochastic environments and $O(\sqrt{T})$ regret for adversarial environments, without knowledge of the regime or the number of rounds $T$.

Hybrid Regret Bounds for Combinatorial Semi-Bandits and Adversarial Linear Bandits

An algorithm for combinatorial semi-bandits with a hybrid regret bound that includes a best-of-three-worlds guarantee and multiple data-dependent regret bounds is proposed; the algorithm performs better whenever the environment is "easy" in terms of certain metrics.

Achieving Near Instance-Optimality and Minimax-Optimality in Stochastic and Adversarial Linear Bandits Simultaneously

This work develops linear bandit algorithms that automatically adapt to different environments and additionally enjoy minimax-optimal regret in completely adversarial environments; to the authors' knowledge, this is the first result of its kind.

On First-Order Bounds, Variance and Gap-Dependent Bounds for Adversarial Bandits

A first-order bound is proved for a modified variant of the INF strategy by Audibert and Bubeck [2009], without sacrificing worst case optimality or modifying the loss estimators.

An Algorithm for Stochastic and Adversarial Bandits with Switching Costs

An algorithm is proposed for stochastic and adversarial multi-armed bandits with switching costs, where the learner pays a price λ every time it switches the arm being played; it is based on an adaptation of the Tsallis-INF algorithm.
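
For concreteness, the cost model named in this summary is easy to write down; the helper below is a sketch of the objective only (loss plus a penalty of λ per switch), under our own naming, and not of the paper's algorithm, which adapts Tsallis-INF to keep the number of switches small.

```python
def total_cost(arms, losses, lam):
    """Cost in the switching-cost model: the learner pays the observed loss
    each round plus lam whenever the played arm differs from the previous one."""
    switches = sum(a != b for a, b in zip(arms, arms[1:]))
    return sum(losses) + lam * switches
```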

Tsallis-INF for Decoupled Exploration and Exploitation in Multi-armed Bandits

A new algorithm is derived using regularization by Tsallis entropy to achieve best-of-both-worlds guarantees; it achieves the minimax optimal $O(\sqrt{KT})$ regret bound, slightly improving on the result of Avner et al.

Parameter-Free Multi-Armed Bandit Algorithms with Hybrid Data-Dependent Regret Bounds

A new algorithm is provided with a new hybrid regret bound that implies logarithmic regret in the stochastic regime and multiple data-dependent regret bounds in the adversarial regime, including bounds depending on the cumulative loss, the total variation, and the path length of the loss sequence.

Nearly Optimal Algorithms for Linear Contextual Bandits with Adversarial Corruptions

This paper proposes a new algorithm, based on the principle of optimism in the face of uncertainty, that achieves near-optimal regret in the corrupted and uncorrupted cases simultaneously; for both the known-$C$ and unknown-$C$ cases, the algorithm with a proper choice of hyperparameters achieves regret that nearly matches the lower bounds.

Robust Stochastic Bandit Algorithms under Probabilistic Unbounded Adversarial Attack

This paper investigates an attack model in which an adversary attacks with a certain probability at each round, and its attack value can be arbitrary and unbounded when it attacks; a high-probability guarantee of $O(\log T)$ regret is provided with respect to the random rewards and the random occurrence of attacks.
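
The attack model summarized above is simple to state in code; the sketch below illustrates the reward channel only (names and signature are ours), not the robust algorithm itself.

```python
import random

def attacked_reward(true_reward, attack_prob, attack_value):
    """Probabilistic unbounded attack: with probability attack_prob the
    adversary replaces the stochastic reward with a value of its own
    choosing, which may be arbitrary and unbounded; otherwise the learner
    observes the true random reward."""
    return attack_value if random.random() < attack_prob else true_reward
```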
...

References

Showing 1-10 of 34 references

Beating Stochastic and Adversarial Semi-bandits Optimally and Simultaneously

This work develops the first general semi-bandit algorithm that simultaneously achieves $O(\log T)$ regret for stochastic environments and $O(\sqrt{T})$ regret for adversarial environments, without knowledge of the regime or the number of rounds $T$.

One Practical Algorithm for Both Stochastic and Adversarial Bandits

The algorithm is based on an augmentation of the EXP3 algorithm with a new control lever in the form of exploration parameters that are tailored individually for each arm, and it retains a "logarithmic" regret guarantee in the stochastic regime even when some observations are contaminated by an adversary.
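
The "control lever" described here can be sketched as an exponential-weights distribution mixed with per-arm exploration floors. This is a hedged illustration under our own naming: in the actual algorithm (EXP3++) the floors shrink over time based on empirical gap estimates, whereas here they are simply passed in.

```python
import numpy as np

def exp3_with_floors(L_hat, eta, eps):
    """EXP3-style sampling distribution with a per-arm exploration floor:
    mix the exponential-weights distribution with the floor vector eps,
    assumed to satisfy eps >= 0 and eps.sum() <= 1."""
    q = np.exp(-eta * (L_hat - L_hat.min()))   # shift for numerical stability
    q /= q.sum()
    return (1.0 - eps.sum()) * q + eps
```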

The Best of Both Worlds: Stochastic and Adversarial Bandits

SAO (Stochastic and Adversarial Optimal) combines the $O(\sqrt{n})$ worst-case regret of Exp3 for adversarial rewards with the (poly)logarithmic regret of UCB1 for stochastic rewards.

More Adaptive Algorithms for Adversarial Bandits

The main idea of the algorithm is to apply optimism and adaptivity techniques to the well-known Online Mirror Descent framework with a special log-barrier regularizer, coming up with appropriate optimistic predictions and correction terms within this framework.
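
A minimal sketch of one such optimistic mirror-descent step with the log-barrier regularizer is given below, assuming cumulative loss estimates L_hat and an optimistic prediction m of the coming loss vector; the correction terms and adaptive tuning that give the actual algorithm its data-dependent bounds are omitted.

```python
import numpy as np

def log_barrier_omd_step(L_hat, m, eta, iters=60):
    """One optimistic OMD step with the log-barrier regularizer: play
    w_i = 1 / (eta * (L_hat_i + m_i + mu)), where mu is the normalization
    constant putting w on the simplex, located here by bisection."""
    z = L_hat + m
    lo = -z.min() + 1e-12              # weights sum far above 1 here
    hi = -z.min() + len(z) / eta       # weights sum to at most 1 here
    for _ in range(iters):
        mu = 0.5 * (lo + hi)
        s = np.sum(1.0 / (eta * (z + mu)))
        lo, hi = (lo, mu) if s < 1.0 else (mu, hi)
    w = 1.0 / (eta * (z + mu))
    return w / w.sum()
```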

Minimax Policies for Adversarial and Stochastic Bandits

This work fills a long-open gap in the characterization of the minimax rate for the multi-armed bandit problem and proposes a new family of randomized algorithms based on an implicit normalization, as well as a new analysis.

Best of both worlds: Stochastic & adversarial best-arm identification

A lower bound is given that characterizes the optimal rate in stochastic problems when the strategy is constrained to be robust to adversarial rewards, and a simple parameter-free algorithm is designed whose probability of error matches the lower bound in stochastic problems and which is also robust to adversarial rewards.

An Improved Parametrization and Analysis of the EXP3++ Algorithm for Stochastic and Adversarial Bandits

A new strategy for gap estimation in randomized algorithms for multi-armed bandits is proposed and combined with the EXP3++ algorithm of Seldin and Slivkins (2014) to reduce the dependence of the regret on the time horizon and to eliminate an additive factor present in the earlier bound.

Fighting Bandits with a New Kind of Smoothness

A novel family of algorithms with minimax optimal regret guarantees is defined using the notion of convex smoothing, and it is shown that a wide class of perturbation methods achieves a near-optimal regret as low as $O(\sqrt{NT \log N})$, as long as the perturbation distribution has a bounded hazard function.

Stochastic bandits robust to adversarial corruptions

We introduce a new model of stochastic bandits with adversarial corruptions which aims to capture settings where most of the input follows a stochastic pattern but some fraction of it can be adversarially corrupted.

What Doubling Tricks Can and Can't Do for Multi-Armed Bandits

It is proved that a geometric doubling trick can be used to conserve minimax bounds of the form $R_T = O(\sqrt{T})$ but cannot conserve distribution-dependent bounds of the form $R_T = O(\log T)$, and insights are given as to why exponential doubling tricks may be better, as they conserve $R_T = O(\log T)$ bounds and are close to conserving $R_T = O(\sqrt{T})$ bounds.
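
As a hedged illustration of the tricks being compared, the wrapper below restarts a fixed-horizon algorithm on phases of growing length; make_algo and play_round are hypothetical placeholders, and the exponential variant squares the horizon between phases instead of doubling it.

```python
def doubling_trick(make_algo, play_round, T0=2, total=10**5, geometric=True):
    """Run a fixed-horizon algorithm in phases: instantiate it for horizon
    T_k, play T_k rounds, discard it, and grow the horizon. The geometric
    trick doubles T_k each phase; the exponential trick squares it."""
    t, T_k = 0, T0
    while t < total:
        algo = make_algo(T_k)                 # fresh instance tuned for T_k
        for _ in range(min(T_k, total - t)):  # last phase may be truncated
            play_round(algo)
            t += 1
        T_k = 2 * T_k if geometric else T_k * T_k
    return t
```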