Robbing the bandit: less regret in online geometric optimization against an adaptive adversary

@inproceedings{Dani2006RobbingTB,
  title={Robbing the bandit: less regret in online geometric optimization against an adaptive adversary},
  author={Varsha Dani and Thomas P. Hayes},
  booktitle={SODA '06},
  year={2006}
}
We consider "online bandit geometric optimization," a problem of iterated decision making in a largely unknown and constantly changing environment. The goal is to minimize "regret," defined as the difference between the actual loss of an online decision-making procedure and that of the best single decision in hindsight. "Geometric optimization" refers to a generalization of the well-known multi-armed bandit problem, in which the decision space is some bounded subset of \(\mathbb{R}^d\), the adversary is…
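In the standard notation for this setting (a sketch of the usual definition; the symbols below are chosen for illustration and are not quoted from the paper), if the learner plays decisions \(x_1, \dots, x_T\) from a decision set \(\mathcal{D} \subset \mathbb{R}^d\) and the adversary selects loss functions \(\ell_1, \dots, \ell_T\), then

\[ \mathrm{Regret}_T \;=\; \sum_{t=1}^{T} \ell_t(x_t) \;-\; \min_{x \in \mathcal{D}} \sum_{t=1}^{T} \ell_t(x). \]

In the bandit version, only the scalar \(\ell_t(x_t)\) is revealed after each round, not the full loss function.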
Citations

Better Algorithms for Benign Bandits
TLDR
A new algorithm is proposed for the bandit linear optimization problem which obtains a regret bound of $O(\sqrt{Q})$, where $Q$ is the total variation in the cost functions, showing that it is possible to incur much less regret in a slowly changing environment even in the bandit setting.
The Price of Bandit Information for Online Optimization
TLDR
This paper presents an algorithm which achieves $O^*(n^{3/2}\sqrt{T})$ regret and presents lower bounds showing that this gap is at least $\sqrt{n}$, which is conjectured to be the correct order.
Online Geometric Optimization in the Bandit Setting Against an Adaptive Adversary
TLDR
This paper gives an algorithm for the bandit version of a very general online optimization problem considered by Kalai and Vempala, for the case of an adaptive adversary, and achieves a regret bound of \(\mathcal{O}(T^{3/4}\sqrt{\ln T})\).
Following the Perturbed Leader to Gamble at Multi-armed Bandits
TLDR
This work shows that the very straightforward and easy-to-implement algorithm Adaptive Bandit FPL can attain a regret of $O(\sqrt{T \ln T})$ against an adaptive adversary; this bound holds with respect to the best lever in hindsight and matches the previous best regret bounds.
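As a rough illustration of the bandit FPL idea, here is a minimal Python sketch assuming explicit uniform-exploration rounds are used to build unbiased loss estimates; it is not the exact algorithm of the cited paper, and the parameter values are illustrative, not tuned.

    import random

    def bandit_fpl(losses, K, T, gamma=0.1, eta=1.0):
        """Sketch of Follow-the-Perturbed-Leader for a K-armed adversarial
        bandit. `losses(t, i)` returns the loss in [0, 1] of arm i at round
        t (only the pulled arm's loss is ever revealed to the learner)."""
        est = [0.0] * K                # cumulative unbiased loss estimates
        total_loss = 0.0
        for t in range(T):
            if random.random() < gamma:
                # Exploration round: each arm is pulled with probability
                # gamma / K, so dividing by that keeps the estimate unbiased.
                i = random.randrange(K)
                loss = losses(t, i)
                est[i] += loss * K / gamma
            else:
                # Exploitation round: follow the leader after perturbing the
                # estimates with exponential noise (the classical Hannan /
                # Kalai-Vempala style perturbation).
                perturbed = [est[j] - eta * random.expovariate(1.0)
                             for j in range(K)]
                i = min(range(K), key=lambda j: perturbed[j])
                loss = losses(t, i)
            total_loss += loss
        return total_loss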
Approximation Algorithms Going Online
In an online linear optimization problem, on each period $t$, an online algorithm chooses $s_t \in S$ from a fixed (possibly infinite) set $S$ of feasible decisions. Nature (who may be adversarial) chooses a…
Playing games with approximation algorithms
TLDR
This work shows how to convert any offline approximation algorithm for a linear optimization problem into a corresponding online approximation algorithm, with a polynomial blowup in runtime, and combines Zinkevich's algorithm for convex optimization with a geometric transformation that can be applied to any approximation algorithm. Expand
High-Probability Regret Bounds for Bandit Online Linear Optimization
TLDR
This paper eliminates the gap between the high-probability bounds obtained in the full-information vs bandit settings, and improves on the previous algorithm [8] whose regret is bounded in expectation against an oblivious adversary. Expand
Optimal Algorithms for Online Convex Optimization with Multi-Point Bandit Feedback
TLDR
The multi-point bandit setting, in which the player can query each loss function at multiple points, is introduced, and regret bounds that closely resemble bounds for the full information case are proved. Expand
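The construction underlying multi-point feedback can be sketched directly: querying the loss at two symmetric perturbations of the play point yields a low-variance estimate whose expectation approximates the gradient of a smoothed version of the loss. A minimal sketch (function and parameter names are illustrative assumptions):

    import numpy as np

    def two_point_gradient(f, x, delta=1e-2, rng=None):
        """Two-point bandit gradient estimate: query f at x + delta*u and
        x - delta*u for a uniformly random unit direction u. `delta` is an
        illustrative smoothing radius."""
        rng = rng or np.random.default_rng()
        d = x.shape[0]
        u = rng.standard_normal(d)
        u /= np.linalg.norm(u)          # uniform direction on the unit sphere
        return (d / (2 * delta)) * (f(x + delta * u) - f(x - delta * u)) * u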
How to Beat the Adaptive Multi-Armed Bandit
TLDR
A new algorithm is presented for the multi-armed bandit problem, and nearly optimal guarantees for the regret against both non-adaptive and adaptive adversaries are proved; the dependence on $T$ is best possible and matches that of the full-information version of the problem.
Learning, Regret minimization, and Equilibria
Many situations involve repeatedly making decisions in an uncertain environment: for instance, deciding what route to drive to work each day, or repeated play of a game against an opponent with an…

References

Showing 1-10 of 24 references
Online Geometric Optimization in the Bandit Setting Against an Adaptive Adversary
TLDR
This paper gives an algorithm for the bandit version of a very general online optimization problem considered by Kalai and Vempala, for the case of an adaptive adversary, and achieves a regret bound of \(\mathcal{O}(T^{3/4}\sqrt{\ln T})\).
Efficient Algorithms for Online Decision Problems
TLDR
It is shown that a very simple idea, used in Hannan's seminal 1957 paper, gives efficient solutions to all of these problems, including a $(1+\epsilon)$-competitive algorithm as well as a lazy one that rarely switches between decisions.
Gambling in a rigged casino: The adversarial multi-armed bandit problem
TLDR
A solution to the bandit problem in which an adversary, rather than a well-behaved stochastic process, has complete control over the payoffs is given. Expand
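The algorithm from this line of work, Exp3, combines exponential weighting with uniform exploration. A minimal loss-based sketch (the original paper states it in terms of rewards, and the parameter choice here is illustrative rather than the tuned value from the analysis):

    import math
    import random

    def exp3(losses, K, T, gamma=0.1):
        """Minimal Exp3 sketch for a K-armed adversarial bandit with losses
        in [0, 1]. `losses(t, i)` reveals only the pulled arm's loss."""
        weights = [1.0] * K
        total_loss = 0.0
        for t in range(T):
            s = sum(weights)
            # Mix the exponential-weights distribution with uniform exploration.
            probs = [(1 - gamma) * w / s + gamma / K for w in weights]
            i = random.choices(range(K), weights=probs)[0]
            loss = losses(t, i)
            total_loss += loss
            est = loss / probs[i]           # importance-weighted loss estimate
            weights[i] *= math.exp(-gamma * est / K)
        return total_loss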
The non-stochastic multi-armed bandit problem
In the multi-armed bandit problem, a gambler must decide which arm of non-identical slot machines to play in a sequence of trials so as to maximize his reward. This classical problem has received…
The Nonstochastic Multiarmed Bandit Problem
TLDR
A solution to the bandit problem in which an adversary, rather than a well-behaved stochastic process, has complete control over the payoffs. Expand
Adaptive routing with end-to-end feedback: distributed learning and geometric approaches
TLDR
A second algorithm for online shortest paths is presented, which solves the shortest-path problem using a chain of online decision oracles, one at each node of the graph, which has several advantages over the online linear optimization approach. Expand
Online Convex Programming and Generalized Infinitesimal Gradient Ascent
TLDR
An algorithm for convex programming is introduced, and it is shown that it is really a generalization of infinitesimal gradient ascent; the results here imply that generalized infinitesimal gradient ascent (GIGA) is universally consistent.
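Zinkevich-style online gradient descent, which GIGA generalizes, is simple enough to sketch directly; the projection routine, interface names, and step-size scale below are illustrative placeholders, not the paper's exact presentation.

    import numpy as np

    def online_gradient_descent(grads, project, x0, T, eta0=1.0):
        """Sketch of online gradient descent over a convex feasible set.
        `grads(t, x)` returns the gradient of the round-t convex loss at x,
        and `project(x)` is Euclidean projection back onto the set."""
        x = x0
        plays = []
        for t in range(1, T + 1):
            plays.append(x)
            g = grads(t, x)
            # Gradient step with the classical 1/sqrt(t) step size, then project.
            x = project(x - (eta0 / np.sqrt(t)) * g)
        return plays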
A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting
TLDR
The model studied can be interpreted as a broad, abstract extension of the well-studied on-line prediction model to a general decision-theoretic setting, and it is shown that the multiplicative weight-update Littlestone-Warmuth rule can be adapted to this model, yielding bounds that are slightly weaker in some cases, but applicable to a considerably more general class of learning problems.
The Weighted Majority Algorithm
TLDR
The Weighted Majority algorithm is an efficient and robust method for selecting good predictions from a pool of algorithms; it is a powerful tool for obtaining upper bounds on learning problems while ignoring computational efficiency.
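A minimal sketch of the deterministic Weighted Majority scheme for binary prediction (the interface names and the penalty value are illustrative assumptions):

    def weighted_majority(predictions, outcomes, n, T, beta=0.5):
        """Sketch of Weighted Majority with n experts over T rounds.
        `predictions(t, i)` is expert i's binary prediction at round t,
        `outcomes(t)` is the true label, and `beta` in (0, 1) is the
        multiplicative penalty applied to a mistaken expert."""
        weights = [1.0] * n
        mistakes = 0
        for t in range(T):
            # Predict by weighted vote of the experts.
            vote1 = sum(w for i, w in enumerate(weights) if predictions(t, i) == 1)
            vote0 = sum(weights) - vote1
            guess = 1 if vote1 >= vote0 else 0
            y = outcomes(t)
            if guess != y:
                mistakes += 1
            # Penalize every expert that erred this round.
            for i in range(n):
                if predictions(t, i) != y:
                    weights[i] *= beta
        return mistakes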