# High-Probability Regret Bounds for Bandit Online Linear Optimization

```bibtex
@inproceedings{Bartlett2008HighProbabilityRB,
  title     = {High-Probability Regret Bounds for Bandit Online Linear Optimization},
  author    = {Peter L. Bartlett and Varsha Dani and Thomas P. Hayes and Sham M. Kakade and Alexander Rakhlin and Ambuj Tewari},
  booktitle = {COLT},
  year      = {2008}
}
```

We present a modification of the algorithm of Dani et al. [8] for the online linear optimization problem in the bandit setting, which with high probability has regret at most O*(√T) against an adaptive adversary. This improves on the previous algorithm [8], whose regret is bounded in expectation against an oblivious adversary. We obtain the same dependence on the dimension (n^{3/2}) as that exhibited by Dani et al. The results of this paper rest firmly on those of [8] and the remarkable…
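For concreteness, the guarantee stated in the abstract can be written out as follows. This is a sketch of the bound's shape for bandit linear optimization with loss vectors c_t over a decision set K ⊂ R^n; the O* notation hides the polylogarithmic factors (in T and the confidence parameter 1/δ) that the paper makes precise.

```latex
% With probability at least 1 - \delta, the regret after T rounds satisfies
\mathrm{Regret}_T
  \;=\; \sum_{t=1}^{T} \langle c_t, x_t \rangle
        \;-\; \min_{x \in \mathcal{K}} \sum_{t=1}^{T} \langle c_t, x \rangle
  \;\le\; O^{*}\!\bigl( n^{3/2} \sqrt{T} \bigr),
% where n is the dimension of the decision set \mathcal{K}, the c_t are chosen
% by an adaptive adversary, and the learner observes only the scalar loss
% \langle c_t, x_t \rangle (bandit feedback), not the vector c_t itself.
```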

## 101 Citations

The Price of Bandit Information for Online Optimization

- Computer Science, NIPS
- 2007

This paper presents an algorithm which achieves O*(n^{3/2} √T) regret and presents lower bounds showing that this gap is at least √n, which is conjectured to be the correct order.

Explore no more: Improved high-probability regret bounds for non-stochastic bandits

- Computer Science, NIPS
- 2015

This work addresses the problem of regret minimization in non-stochastic multi-armed bandit problems, focusing on performance guarantees that hold with high probability, using a simple and intuitive loss-estimation strategy called Implicit exploration (IX) that allows a remarkably clean analysis.

An Optimal Algorithm for Linear Bandits

- Computer Science, ArXiv
- 2011

Interestingly, these results show that bandit linear optimization with expert advice in d dimensions is no more difficult (in terms of the achievable regret) than the online d-armed bandit problem with expert advice (where EXP4 is optimal).

Explore no more: Simple and tight high-probability bounds for non-stochastic bandits

- Computer Science
- 2015

This paper shows that it is possible to prove high-probability regret bounds without this undesirable exploration component, and relies on a simple and intuitive loss-estimation strategy called Implicit eXploration (IX) that allows a very clean analysis that leads to the best known constant factors in the bounds.

Efficient Bandit Convex Optimization: Beyond Linear Losses

- Computer Science, COLT
- 2021

The algorithm is a bandit version of the classical regularized Newton's method, estimating gradients and Hessians of the loss functions from single-point feedback. It is the first efficiently implementable algorithm for bandit convex optimization with quadratic losses that achieves optimal regret guarantees.

Beating the adaptive bandit with high probability

- Computer Science, Mathematics, 2009 Information Theory and Applications Workshop
- 2009

We provide a principled way of proving Õ(√T) high-probability guarantees for partial-information (bandit) problems over arbitrary convex decision sets. First, we prove a regret guarantee for the…

Return of the bias: Almost minimax optimal high probability bounds for adversarial linear bandits

- Computer Science, Mathematics, COLT
- 2022

It is shown that follow-the-regularized-leader with the entropic barrier and suitable loss estimators has regret against an adaptive adversary of at most O(d^2 √T log(T)) and can be implemented in polynomial time, which improves on the best known bound for an efficient algorithm.

Bias no more: high-probability data-dependent regret bounds for adversarial bandits and MDPs

- Computer Science, NeurIPS
- 2020

The approach uses standard unbiased estimators and relies on a simple increasing learning rate schedule, together with logarithmically homogeneous self-concordant barriers and a strengthened Freedman's inequality, to obtain high-probability regret bounds for online learning with bandit feedback against an adaptive adversary.

Optimal Algorithms for Online Convex Optimization with Multi-Point Bandit Feedback

- Computer Science, COLT
- 2010

The multi-point bandit setting, in which the player can query each loss function at multiple points, is introduced, and regret bounds that closely resemble bounds for the full information case are proved.

## References

SHOWING 1-10 OF 25 REFERENCES

The Price of Bandit Information for Online Optimization

- Computer Science, NIPS
- 2007

This paper presents an algorithm which achieves O*(n^{3/2} √T) regret and presents lower bounds showing that this gap is at least √n, which is conjectured to be the correct order.

Robbing the bandit: less regret in online geometric optimization against an adaptive adversary

- Computer Science, SODA '06
- 2006

It is proved that, for a large class of full-information online optimization problems, the optimal regret against an adaptive adversary is the same as against a non-adaptive adversary.

Stochastic Linear Optimization under Bandit Feedback

- Computer Science, Mathematics, COLT
- 2008

A nearly complete characterization of the classical stochastic k-armed bandit problem in terms of both upper and lower bounds for the regret is given, and two variants of an algorithm based on the idea of “upper confidence bounds” are presented.

Competing in the Dark: An Efficient Algorithm for Bandit Linear Optimization

- Computer Science, COLT
- 2008

This work introduces an efficient algorithm for the problem of online linear optimization in the bandit setting which achieves the optimal O*(√T) regret and presents a novel connection between online learning and interior point methods.

The Nonstochastic Multiarmed Bandit Problem

- Computer Science, Economics, SIAM J. Comput.
- 2002

A solution to the bandit problem in which an adversary, rather than a well-behaved stochastic process, has complete control over the payoffs.

Online decision problems with large strategy sets

- Computer Science
- 2005

The theory of generalized multi-armed bandit problems is extended by supplying non-trivial algorithms and lower bounds for cases in which the strategy set is much larger (exponential or infinite) and the cost function class is structured, e.g. by constraining the cost functions to be linear or convex.

Adaptive routing with end-to-end feedback: distributed learning and geometric approaches

- Computer Science, STOC '04
- 2004

A second algorithm for online shortest paths is presented, which solves the shortest-path problem using a chain of online decision oracles, one at each node of the graph; this approach has several advantages over the online linear optimization approach.

The On-Line Shortest Path Problem Under Partial Monitoring

- Computer Science, J. Mach. Learn. Res.
- 2007

The on-line shortest path problem is considered under various models of partial monitoring, and a version of the multi-armed bandit setting for shortest path is discussed, where the decision maker learns only the total weight of the chosen path but not the weights of the individual edges on the path.

Improved Risk Tail Bounds for On-Line Algorithms

- Computer Science, IEEE Transactions on Information Theory
- 2008

Tight bounds are derived on the risk of models in the ensemble generated by incremental training of an arbitrary learning algorithm, based on uniform convergence arguments, improving on previous bounds published by the same authors.

Provably competitive adaptive routing

- Computer Science, Proceedings IEEE 24th Annual Joint Conference of the IEEE Computer and Communications Societies
- 2005

The routing protocol presented in this work attempts to provide throughput-competitive route selection against an adaptive adversary and a proof of the convergence time of the algorithm is presented as well as preliminary simulation results.