Corpus ID: 1028408

The Price of Bandit Information for Online Optimization

@inproceedings{Dani2007ThePO,
  title={The Price of Bandit Information for Online Optimization},
  author={Varsha Dani and Thomas P. Hayes and S. Kakade},
  booktitle={NIPS},
  year={2007}
}
In the online linear optimization problem, a learner must choose, in each round, a decision from a set D ⊂ ℝⁿ in order to minimize an (unknown and changing) linear cost function. We present sharp rates of convergence (with respect to additive regret) for both the full information setting (where the cost function is revealed at the end of each round) and the bandit setting (where only the scalar cost incurred is revealed). In particular, this paper is concerned with the price of bandit…
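To make the two feedback models concrete, here is a minimal sketch of the protocol in Python; the toy decision set, the random cost sequence, and the uniform-random learner are illustrative assumptions, not the paper's construction:

```python
import numpy as np

rng = np.random.default_rng(0)
T, n = 1000, 5
D = np.eye(n)                       # toy decision set: the n standard basis vectors
costs = rng.uniform(0, 1, (T, n))   # unknown, changing linear cost vectors

total = 0.0
for t in range(T):
    x = D[rng.integers(n)]          # the learner's decision (uniform play, for illustration)
    scalar_cost = costs[t] @ x      # cost actually incurred this round

    # Full information setting: the entire vector costs[t] is now revealed.
    # Bandit setting:           only scalar_cost is revealed.
    total += scalar_cost

# Additive regret: the learner's cost minus that of the best fixed decision in hindsight.
best_fixed = (costs.sum(axis=0) @ D.T).min()
print("regret:", total - best_fixed)
```

The two settings differ only in which feedback the learner may use; the paper quantifies how much that difference costs in regret.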
Better Algorithms for Benign Bandits
TLDR
A new algorithm is proposed for the bandit linear optimization problem that obtains a regret bound of O(√Q), where Q is the total variation in the cost functions, showing that it is possible to incur much less regret in a slowly changing environment even in the bandit setting.
Stochastic Linear Optimization under Bandit Feedback
TLDR
A nearly complete characterization of the classical stochastic k-armed bandit problem in terms of both upper and lower bounds for the regret is given, and two variants of an algorithm based on the idea of “upper confidence bounds” are presented.
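As a rough sketch of the "upper confidence bounds" idea in the linear setting: maintain a regularized least-squares estimate of the unknown cost vector and act optimistically within a confidence ellipsoid around it. The confidence radius beta, the finite decision set, and the noise model below are placeholder assumptions, not the paper's exact construction:

```python
import numpy as np

rng = np.random.default_rng(1)
n, T, beta, lam = 3, 500, 1.0, 1.0
D = rng.normal(size=(20, n))         # assumed finite decision set in R^n
theta_star = rng.normal(size=n)      # unknown cost vector (stochastic setting)

A = lam * np.eye(n)                  # regularized design matrix
b = np.zeros(n)
for t in range(T):
    theta_hat = np.linalg.solve(A, b)
    A_inv = np.linalg.inv(A)
    # Optimism: for each decision, use the lowest cost consistent with the
    # confidence ellipsoid around theta_hat, then play the minimizer.
    widths = np.sqrt(np.einsum('ij,jk,ik->i', D, A_inv, D))
    x = D[np.argmin(D @ theta_hat - beta * widths)]

    y = x @ theta_star + rng.normal(0, 0.1)   # only this noisy scalar is observed
    A += np.outer(x, x)                       # rank-one update of the design
    b += y * x
```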
Regret in Online Combinatorial Optimization
TLDR
This work addresses online linear optimization problems when the possible actions of the decision maker are represented by binary vectors and shows that the standard exponentially weighted average forecaster is a provably suboptimal strategy.
Asymptotically Optimal Bandits under Weighted Information
TLDR
A Thompson-Sampling-based strategy is proposed, called Weighted Thompson Sampling (WTS), that designs the power profile as its posterior belief of each arm being the best arm, and shows that its upper bound matches the derived logarithmic lower bound.
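Whatever WTS's exact power profile, the device the summary describes, allocating according to the posterior probability that each arm is best, can be sketched for Bernoulli arms with Beta posteriors; the arm means, priors, and Monte Carlo estimate here are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
means = np.array([0.3, 0.5, 0.7])        # assumed true Bernoulli arm means
K, T, M = len(means), 2000, 500          # M Monte Carlo posterior samples per round
a, b = np.ones(K), np.ones(K)            # Beta(1,1) priors over each arm's mean

for t in range(T):
    # Estimate P(arm k is best) by sampling each Beta posterior M times.
    samples = rng.beta(a, b, size=(M, K))
    p_best = np.bincount(samples.argmax(axis=1), minlength=K) / M

    arm = rng.choice(K, p=p_best)        # allocate by posterior belief of optimality
    reward = float(rng.random() < means[arm])
    a[arm] += reward
    b[arm] += 1.0 - reward
```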
On the Complexity of Bandit Linear Optimization
  • O. Shamir
  • Computer Science, Mathematics
  • COLT
  • 2015
TLDR
It is shown that the price of bandit information in this setting can be as large as d, disproving the well-known conjecture that the regret for bandit linear optimization is at most √d times the full-information regret.
Online Stochastic Linear Optimization under One-bit Feedback
TLDR
This paper develops an efficient online learning algorithm by exploiting particular structures of the observation model to minimize the regret defined by the unknown linear function in a special bandit setting of online stochastic linear optimization.
Improved Regret Bounds for Projection-free Bandit Convex Optimization
TLDR
The challenge of designing online algorithms for the bandit convex optimization problem (BCO) is revisited, and the first such algorithm is presented that attains Õ(T^{3/4}) expected regret using only O(T) overall calls to the linear optimization oracle, in expectation, where T is the number of prediction rounds.
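For context, the "linear optimization oracle" being counted is the projection-free (Frank–Wolfe-style) primitive: a single linear minimization over the decision set replaces a projection. A minimal sketch, assuming the set is the convex hull of a finite vertex list:

```python
import numpy as np

def linear_oracle(g, vertices):
    # One oracle call: argmin over the decision set of <g, x>, which for a
    # polytope is attained at a vertex.
    return vertices[np.argmin(vertices @ g)]

def frank_wolfe_step(x, g, vertices, gamma):
    v = linear_oracle(g, vertices)        # the only access to the decision set
    return (1 - gamma) * x + gamma * v    # convex combination stays feasible
```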
Towards Minimax Policies for Online Linear Optimization with Bandit Feedback
TLDR
This work provides an algorithm (based on exponential weights) with a regret of order √(dn log N) for any finite action set with N actions; this shaves off an extraneous d factor compared to previous works. It also gives a regret bound of order d√n for any compact set of actions.
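A rough sketch of the exponential-weights template for a finite action set: sample from a mixture of the weights and an exploration distribution, observe only the scalar cost, and feed an unbiased least-squares estimate of the full cost vector back into the weights. The uniform exploration, learning rate, and toy action set below are assumptions rather than the paper's tuned choices:

```python
import numpy as np

rng = np.random.default_rng(3)
N, n, T, eta, gamma = 8, 3, 1000, 0.05, 0.1
A = rng.normal(size=(N, n))               # assumed finite action set in R^n
thetas = rng.uniform(0, 1, (T, n))        # adversarial cost vectors (random here)

logw = np.zeros(N)                        # log-weights, for numerical stability
for t in range(T):
    w = np.exp(logw - logw.max())
    p = (1 - gamma) * w / w.sum() + gamma / N   # mix in uniform exploration
    i = rng.choice(N, p=p)
    c = A[i] @ thetas[t]                  # only this scalar cost is observed

    # Unbiased estimate of the cost vector: pinv(P) a_i c, where P is the
    # covariance of the play distribution, so E[theta_hat] = theta.
    P = (A * p[:, None]).T @ A
    theta_hat = np.linalg.pinv(P) @ A[i] * c
    logw -= eta * (A @ theta_hat)         # exponential weights update
```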
On the Complexity of Bandit and Derivative-Free Stochastic Convex Optimization
  • O. Shamir
  • Computer Science, Mathematics
  • COLT
  • 2013
TLDR
The attainable error/regret in the bandit and derivative-free settings, as a function of the dimension d and the available number of queries T, is investigated, and a precise characterization of the attainable performance for strongly-convex and smooth functions is provided.
An Efficient Algorithm for Learning with Semi-bandit Feedback
TLDR
A learning algorithm is proposed based on combining the Follow-the-Perturbed-Leader prediction method with a novel loss estimation procedure called Geometric Resampling that can be efficiently implemented for any decision set where efficient offline combinatorial optimization is possible at all.
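Geometric Resampling can be sketched in isolation: the probability p of having drawn an action under Follow-the-Perturbed-Leader has no closed form, so the algorithm re-runs the randomized draw until the same action reappears and uses the number of re-draws, a geometric variable with mean 1/p, in place of 1/p. The multi-armed (rather than semi-bandit) toy losses, perturbation scale, and cap M below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(4)
K, T, eta, M = 5, 1000, 0.1, 100
losses = rng.uniform(0, 1, (T, K))   # adversarial losses (random here)
Lhat = np.zeros(K)                   # cumulative loss estimates

def fpl_draw():
    # Follow the Perturbed Leader: argmin of estimated losses minus noise.
    return int(np.argmin(Lhat - rng.exponential(1 / eta, K)))

for t in range(T):
    i = fpl_draw()
    ell = losses[t, i]               # only the played action's loss is observed

    # Geometric Resampling: count re-draws until the rule picks i again
    # (capped at M); the count estimates 1/p_t(i) without ever computing p_t.
    k = next((s for s in range(1, M + 1) if fpl_draw() == i), M)
    Lhat[i] += k * ell               # importance-weighted loss estimate
```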

References

Showing 1–10 of 19 references
Stochastic Linear Optimization under Bandit Feedback
TLDR
A nearly complete characterization of the classical stochastic k-armed bandit problem in terms of both upper and lower bounds for the regret is given, and two variants of an algorithm based on the idea of “upper confidence bounds” are presented.
Robbing the bandit: less regret in online geometric optimization against an adaptive adversary
TLDR
It is proved that, for a large class of full-information online optimization problems, the optimal regret against an adaptive adversary is the same as against a non-adaptive adversary.
High-Probability Regret Bounds for Bandit Online Linear Optimization
TLDR
This paper eliminates the gap between the high-probability bounds obtained in the full-information vs. bandit settings, and improves on the previous algorithm [8], whose regret is bounded in expectation against an oblivious adversary.
Online Geometric Optimization in the Bandit Setting Against an Adaptive Adversary
TLDR
This paper gives an algorithm for the bandit version of a very general online optimization problem considered by Kalai and Vempala, for the case of an adaptive adversary, and achieves a regret bound of O(T^{3/4}√(ln T)).
Efficient algorithms for online decision problems
TLDR
This work gives a simple approach for doing nearly as well as the best single decision, where the best is chosen with the benefit of hindsight, and these follow-the-leader style algorithms extend naturally to a large class of structured online problems for which the exponential algorithms are inefficient.
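The follow-the-leader style rule itself is one line: play whatever has been best so far, after adding a random perturbation to blunt the adversary's ability to exploit determinism. This is the same perturbed-leader rule sketched earlier, now in the full-information setting, with the hindsight comparator made explicit; the perturbation scale and loss table are illustrative:

```python
import numpy as np

rng = np.random.default_rng(7)
K, T, eps = 4, 1000, 0.1
losses = rng.uniform(0, 1, (T, K))
cum = np.zeros(K)

total = 0.0
for t in range(T):
    i = np.argmin(cum - rng.exponential(1 / eps, K))  # follow the perturbed leader
    total += losses[t, i]
    cum += losses[t]                 # full information: every loss is revealed

# "Best single decision with the benefit of hindsight":
print("regret:", total - losses.sum(axis=0).min())
```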
Gambling in a rigged casino: The adversarial multi-armed bandit problem
TLDR
A solution to the bandit problem in which an adversary, rather than a well-behaved stochastic process, has complete control over the payoffs is given.
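This adversarial model is the one the Exp3 strategy was designed for; a minimal sketch (the exploration parameter and payoff table are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(5)
K, T, gamma = 4, 2000, 0.07
payoffs = rng.uniform(0, 1, (T, K))   # the adversary's payoff table (random here)

w = np.ones(K)
for t in range(T):
    p = (1 - gamma) * w / w.sum() + gamma / K   # mix in uniform exploration
    i = rng.choice(K, p=p)
    x_hat = payoffs[t, i] / p[i]                # importance-weighted payoff estimate
    w[i] *= np.exp(gamma * x_hat / K)           # Exp3's multiplicative update
    w /= w.max()                                # rescale to avoid overflow
```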
Adaptive routing with end-to-end feedback: distributed learning and geometric approaches
TLDR
A second algorithm for online shortest paths is presented, which solves the shortest-path problem using a chain of online decision oracles, one at each node of the graph; this approach has several advantages over the online linear optimization approach.
The On-Line Shortest Path Problem Under Partial Monitoring
The on-line shortest path problem is considered under various models of partial monitoring. Given a weighted directed acyclic graph whose edge weights can change in an arbitrary (adversarial) way, a…
A decision-theoretic generalization of on-line learning and an application to boosting
TLDR
The model studied can be interpreted as a broad, abstract extension of the well-studied on-line prediction model to a general decision-theoretic setting, and the multiplicative weight-update Littlestone–Warmuth rule can be adapted to this model, yielding bounds that are slightly weaker in some cases, but applicable to a considerably more general class of learning problems.
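The multiplicative weight-update rule it adapts is short enough to state directly; a minimal full-information sketch (learning rate and loss table are illustrative), shown here as the counterpart of the bandit updates sketched earlier:

```python
import numpy as np

rng = np.random.default_rng(6)
K, T, eta = 4, 1000, 0.1
losses = rng.uniform(0, 1, (T, K))    # every expert's loss is revealed each round

w = np.ones(K)
total = 0.0
for t in range(T):
    p = w / w.sum()                   # play a random expert drawn from p
    total += p @ losses[t]            # learner's expected loss this round
    w *= np.exp(-eta * losses[t])     # multiplicative weight update on all experts
```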
Some Aspects of the Sequential Design of Experiments
1. Introduction. Until recently, statistical theory has been restricted to the design and analysis of sampling experiments in which the size and composition of the samples are completely determined…