# The Price of Bandit Information for Online Optimization

```bibtex
@inproceedings{Dani2007ThePO,
  title     = {The Price of Bandit Information for Online Optimization},
  author    = {Varsha Dani and Thomas P. Hayes and S. Kakade},
  booktitle = {NIPS},
  year      = {2007}
}
```

In the online linear optimization problem, a learner must choose, in each round, a decision from a set D ⊂ ℝⁿ in order to minimize an (unknown and changing) linear cost function. We present sharp rates of convergence (with respect to additive regret) for both the full information setting (where the cost function is revealed at the end of each round) and the bandit setting (where only the scalar cost incurred is revealed). In particular, this paper is concerned with the price of bandit…
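The additive regret mentioned in the abstract can be written out explicitly; the following standard formulation (our notation, consistent with the problem statement above) compares the learner's cumulative cost to the best fixed decision in hindsight:

$$
\mathrm{Regret}_T \;=\; \sum_{t=1}^{T} c_t \cdot x_t \;-\; \min_{x \in D} \sum_{t=1}^{T} c_t \cdot x,
$$

where $x_t \in D$ is the decision chosen in round $t$ and $c_t \in \mathbb{R}^n$ is that round's cost vector. In the full-information setting the learner observes $c_t$ after round $t$; in the bandit setting it observes only the scalar $c_t \cdot x_t$.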

#### 189 Citations

Better Algorithms for Benign Bandits

- Mathematics, Computer Science
- J. Mach. Learn. Res.
- 2011

A new algorithm is proposed for the bandit linear optimization problem which obtains a regret bound of O(√Q), where Q is the total variation in the cost functions, showing that it is possible to incur much less regret in a slowly changing environment even in the bandit setting.

Stochastic Linear Optimization under Bandit Feedback

- Mathematics, Computer Science
- COLT
- 2008

A nearly complete characterization of the classical stochastic k-armed bandit problem in terms of both upper and lower bounds for the regret is given, and two variants of an algorithm based on the idea of “upper confidence bounds” are presented.
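The “upper confidence bounds” idea underlying that work can be sketched for the classical stochastic k-armed bandit. This is a minimal UCB1-style sketch for illustration only; the cited paper's variants for the linear setting use a more involved confidence construction, and all names below are our own:

```python
import math
import random

def ucb1(means, horizon, seed=0):
    """Minimal UCB1 sketch for a stochastic k-armed Bernoulli bandit.

    `means` holds the true arm means (unknown to the learner, used only
    to sample rewards). Returns per-arm pull counts.
    """
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k
    sums = [0.0] * k
    # Pull each arm once to initialize the empirical means.
    for a in range(k):
        sums[a] += float(rng.random() < means[a])
        counts[a] = 1
    for t in range(k, horizon):
        # Pick the arm maximizing empirical mean + confidence radius.
        ucb = [sums[a] / counts[a]
               + math.sqrt(2 * math.log(t + 1) / counts[a])
               for a in range(k)]
        a = max(range(k), key=lambda i: ucb[i])
        sums[a] += float(rng.random() < means[a])
        counts[a] += 1
    return counts

ucb_counts = ucb1([0.2, 0.5, 0.8], horizon=2000)
```

With well-separated arms, the confidence radius shrinks fastest on frequently pulled arms, so pulls concentrate on the best arm while suboptimal arms are sampled only logarithmically often.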

Regret in Online Combinatorial Optimization

- Computer Science, Mathematics
- Math. Oper. Res.
- 2014

This work addresses online linear optimization problems when the possible actions of the decision maker are represented by binary vectors and shows that the standard exponentially weighted average forecaster is a provably suboptimal strategy.

Asymptotically Optimal Bandits under Weighted Information

- Computer Science, Engineering
- ArXiv
- 2021

A Thompson-Sampling-based strategy is proposed, called Weighted Thompson Sampling (WTS), that designs the power profile as its posterior belief of each arm being the best arm, and shows that its upper bound matches the derived logarithmic lower bound.
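Standard Thompson Sampling, on which WTS builds, can be sketched for Bernoulli arms. This is illustrative only: the WTS-specific power profile from the abstract is not modeled, and the function below is our own naming:

```python
import random

def thompson_bernoulli(means, horizon, seed=1):
    """Standard Bernoulli Thompson Sampling with Beta(1,1) priors.

    Each round, sample one draw per arm from its posterior and play
    the arm with the highest sample. Returns per-arm pull counts.
    """
    rng = random.Random(seed)
    k = len(means)
    alpha = [1] * k  # Beta posterior successes + 1
    beta = [1] * k   # Beta posterior failures + 1
    counts = [0] * k
    for _ in range(horizon):
        samples = [rng.betavariate(alpha[a], beta[a]) for a in range(k)]
        a = max(range(k), key=lambda i: samples[i])
        reward = rng.random() < means[a]
        alpha[a] += int(reward)
        beta[a] += int(not reward)
        counts[a] += 1
    return counts

ts_counts = thompson_bernoulli([0.2, 0.8], horizon=1000)
```

As the posterior of the best arm concentrates, the probability of sampling a suboptimal arm's posterior above it vanishes, so exploration self-regulates.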

On the Complexity of Bandit Linear Optimization

- Computer Science, Mathematics
- COLT
- 2015

It is shown that the price of bandit information in this setting can be as large as $d$, disproving the well-known conjecture that the regret for bandit linear optimization is at most $\sqrt{d}$ times the full-information regret.

Online Stochastic Linear Optimization under One-bit Feedback

- Computer Science, Mathematics
- ICML
- 2016

This paper studies a special bandit setting of online stochastic linear optimization with one-bit feedback, and develops an efficient online learning algorithm that exploits particular structures of the observation model to minimize the regret defined by the unknown linear function.

Improved Regret Bounds for Projection-free Bandit Convex Optimization

- Computer Science, Mathematics
- AISTATS
- 2020

The challenge of designing online algorithms for the bandit convex optimization problem (BCO) is revisited, and the first such algorithm attaining the stated expected regret is presented, using only a bounded number of overall calls to the linear optimization oracle, in expectation, where T is the number of prediction rounds.

Towards Minimax Policies for Online Linear Optimization with Bandit Feedback

- Mathematics, Computer Science
- COLT
- 2012

This work provides an algorithm (based on exponential weights) with a regret of order $\sqrt{dn \log N}$ for any finite action set with $N$ actions, shaving off an extraneous $d$ factor compared to previous works, and gives a regret bound of order $d\sqrt{n}$ for any compact set of actions.
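The exponential-weights forecaster referenced above can be sketched in its full-information form. This is a minimal Hedge sketch with our own naming; the bandit variant in the cited work instead feeds the update unbiased loss estimates built from the scalar feedback:

```python
import math

def hedge(losses, eta):
    """Exponentially weighted average forecaster over N actions.

    `losses` is a list of rounds, each a list of N losses in [0, 1].
    Maintains a weight per action, plays the normalized weights as a
    distribution, and returns the expected cumulative loss.
    """
    n = len(losses[0])
    w = [1.0] * n
    total_loss = 0.0
    for loss_t in losses:
        z = sum(w)
        p = [wi / z for wi in w]  # current play distribution
        total_loss += sum(pi * li for pi, li in zip(p, loss_t))
        # Multiplicative update: down-weight lossy actions.
        w = [wi * math.exp(-eta * li) for wi, li in zip(w, loss_t)]
    return total_loss

# Action 0 always incurs loss 0, action 1 always loss 1.
total = hedge([[0.0, 1.0]] * 100, eta=0.5)
```

Here the weight of the bad action decays geometrically, so the forecaster's cumulative loss stays within a small additive term of the best fixed action's loss (zero in this toy run).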

On the Complexity of Bandit and Derivative-Free Stochastic Convex Optimization

- Computer Science, Mathematics
- COLT
- 2013

The attainable error/regret in the bandit and derivative-free settings, as a function of the dimension d and the available number of queries T is investigated, and a precise characterization of the attainable performance for strongly-convex and smooth functions is provided.

An Efficient Algorithm for Learning with Semi-bandit Feedback

- Computer Science
- ALT
- 2013

A learning algorithm is proposed based on combining the Follow-the-Perturbed-Leader prediction method with a novel loss estimation procedure called Geometric Resampling that can be efficiently implemented for any decision set where efficient offline combinatorial optimization is possible at all.
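The Follow-the-Perturbed-Leader prediction step can be sketched as follows. This is a generic full-information sketch under our own naming; the Geometric Resampling loss-estimation procedure of the cited paper is not reproduced here:

```python
import random

def fpl_choose(cumulative_loss, epsilon, rng):
    """Follow-the-Perturbed-Leader action selection.

    Pick the action minimizing cumulative loss minus an i.i.d.
    exponential perturbation with rate `epsilon`; the random
    perturbation supplies the exploration that plain
    follow-the-leader lacks.
    """
    n = len(cumulative_loss)
    perturbed = [cumulative_loss[a] - rng.expovariate(epsilon)
                 for a in range(n)]
    return min(range(n), key=lambda a: perturbed[a])

# Action 0 has far lower cumulative loss, so it is (almost surely) chosen.
choice = fpl_choose([0.0, 100.0], 1.0, random.Random(0))
```

Because selection reduces to a single offline minimization over the perturbed losses, the method stays efficient whenever offline optimization over the decision set is efficient, which is the point made in the snippet above.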

#### References

Showing 1–10 of 19 references

Stochastic Linear Optimization under Bandit Feedback

- Mathematics, Computer Science
- COLT
- 2008

A nearly complete characterization of the classical stochastic k-armed bandit problem in terms of both upper and lower bounds for the regret is given, and two variants of an algorithm based on the idea of “upper confidence bounds” are presented.

Robbing the bandit: less regret in online geometric optimization against an adaptive adversary

- Mathematics, Computer Science
- SODA '06
- 2006

It is proved that, for a large class of full-information online optimization problems, the optimal regret against an adaptive adversary is the same as against a non-adaptive adversary.

High-Probability Regret Bounds for Bandit Online Linear Optimization

- Mathematics, Computer Science
- COLT
- 2008

This paper eliminates the gap between the high-probability bounds obtained in the full-information vs bandit settings, and improves on the previous algorithm [8] whose regret is bounded in expectation against an oblivious adversary.

Online Geometric Optimization in the Bandit Setting Against an Adaptive Adversary

- Computer Science
- COLT
- 2004

This paper gives an algorithm for the bandit version of a very general online optimization problem considered by Kalai and Vempala, for the case of an adaptive adversary, achieving a regret bound of $\mathcal{O}(T^{3/4}\sqrt{\ln T})$.

Efficient algorithms for online decision problems

- Computer Science
- J. Comput. Syst. Sci.
- 2005

This work gives a simple approach for doing nearly as well as the best single decision, where the best is chosen with the benefit of hindsight, and these follow-the-leader style algorithms extend naturally to a large class of structured online problems for which the exponential algorithms are inefficient.

Gambling in a rigged casino: The adversarial multi-armed bandit problem

- Computer Science, Mathematics
- Proceedings of IEEE 36th Annual Foundations of Computer Science
- 1995

A solution to the bandit problem in which an adversary, rather than a well-behaved stochastic process, has complete control over the payoffs is given.

Adaptive routing with end-to-end feedback: distributed learning and geometric approaches

- Computer Science
- STOC '04
- 2004

A second algorithm for online shortest paths is presented, which solves the shortest-path problem using a chain of online decision oracles, one at each node of the graph, which has several advantages over the online linear optimization approach.

The On-Line Shortest Path Problem Under Partial Monitoring

- Mathematics
- 2007

The on-line shortest path problem is considered under various models of partial monitoring. Given a weighted directed acyclic graph whose edge weights can change in an arbitrary (adversarial) way, a…

A decision-theoretic generalization of on-line learning and an application to boosting

- Computer Science
- EuroCOLT
- 1995

The model studied can be interpreted as a broad, abstract extension of the well-studied on-line prediction model to a general decision-theoretic setting, and the multiplicative weight-update rule of Littlestone and Warmuth can be adapted to this model, yielding bounds that are slightly weaker in some cases but applicable to a considerably more general class of learning problems.

SOME ASPECTS OF THE SEQUENTIAL DESIGN OF EXPERIMENTS

- 2007

1. Introduction. Until recently, statistical theory has been restricted to the design and analysis of sampling experiments in which the size and composition of the samples are completely determined…