• Publications
Finite-time Analysis of the Multiarmed Bandit Problem
This work shows that the optimal logarithmic regret is also achievable uniformly over time, with simple and efficient policies, and for all reward distributions with bounded support.
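For illustration only, below is a minimal Python sketch of an upper-confidence-bound index policy in the spirit of the UCB1 algorithm analyzed in this paper; the pull callback, arm count, horizon, and the exploration constant are hypothetical placeholders rather than the paper's exact formulation.

import math

def ucb1(pull, n_arms, horizon):
    # pull(arm) is a hypothetical callback returning a reward in [0, 1].
    counts = [0] * n_arms       # number of times each arm was played
    means = [0.0] * n_arms      # empirical mean reward of each arm
    # Play every arm once to initialize the estimates.
    for arm in range(n_arms):
        counts[arm] = 1
        means[arm] = pull(arm)
    for t in range(n_arms, horizon):
        # Pick the arm maximizing empirical mean plus an exploration bonus.
        arm = max(range(n_arms),
                  key=lambda a: means[a] + math.sqrt(2.0 * math.log(t) / counts[a]))
        reward = pull(arm)
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]
    return means, counts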
Prediction, learning, and games
This chapter discusses prediction with expert advice, efficient forecasters for large classes of experts, and randomized prediction for specific losses.
The Nonstochastic Multiarmed Bandit Problem
This work gives a solution to the bandit problem in which an adversary, rather than a well-behaved stochastic process, has complete control over the payoffs.
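As a rough illustration of this adversarial setting, the sketch below follows the general shape of an Exp3-style exponential-weighting strategy with importance-weighted reward estimates; the pull callback and the mixing parameter gamma are assumed placeholders, and the constants are only indicative.

import math
import random

def exp3(pull, n_arms, horizon, gamma=0.1):
    # pull(arm) is a hypothetical callback returning a reward in [0, 1].
    weights = [1.0] * n_arms
    for _ in range(horizon):
        total = sum(weights)
        # Mix the exponential weights with uniform exploration.
        probs = [(1.0 - gamma) * w / total + gamma / n_arms for w in weights]
        arm = random.choices(range(n_arms), weights=probs)[0]
        reward = pull(arm)
        # Importance-weighted estimate: only the played arm's reward is observed.
        estimate = reward / probs[arm]
        weights[arm] *= math.exp(gamma * estimate / n_arms)
    return weights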
Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems
The focus is on two extreme cases in which the analysis of regret is particularly simple and elegant: independent and identically distributed payoffs and adversarial payoffs.
How to use expert advice
This work analyzes algorithms that predict a binary value by combining the predictions of several prediction strategies, called "experts", and shows how this leads to certain kinds of pattern recognition/learning algorithms with performance bounds that improve on the best results currently known in this context.
Adaptive and Self-Confident On-Line Learning Algorithms
This paper shows that essentially the same optimized bounds can be obtained when the algorithms adaptively tune their learning rates as the examples in the sequence are progressively revealed, even though the optimal tuning depends on the whole sequence of examples.
Worst-Case Analysis of Selective Sampling for Linear Classification
This paper introduces a general technique for turning linear-threshold classification algorithms from the general additive family into randomized selective sampling algorithms, and shows that these semi-supervised algorithms can achieve, on average, the same accuracy as that of their fully supervised counterparts, but using fewer labels.
Online Learning with Feedback Graphs: Beyond Bandits
This work analyzes how the structure of the feedback graph controls the inherent difficulty of the induced T-round learning problem and shows how the regret is affected if the graphs are allowed to vary with time.
Improved second-order bounds for prediction with expert advice
New and sharper regret bounds are derived for the well-known exponentially weighted average forecaster and for a second forecaster with a different multiplicative update rule, expressed in terms of sums of squared payoffs, replacing larger first-order quantities appearing in previous bounds.
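To make the forecaster concrete, here is a minimal sketch of an exponentially weighted average forecaster with a fixed learning rate; the loss_rounds input and the eta parameter are assumed placeholders, and the paper's second-order tuning is not reproduced.

import math

def exp_weighted_forecaster(loss_rounds, eta=0.5):
    # loss_rounds: list of per-round loss vectors, one value in [0, 1] per expert.
    n_experts = len(loss_rounds[0])
    weights = [1.0] * n_experts
    mixtures = []
    for losses in loss_rounds:
        total = sum(weights)
        mixtures.append([w / total for w in weights])  # current mixture over experts
        # Multiplicative update: experts with larger loss are down-weighted.
        weights = [w * math.exp(-eta * l) for w, l in zip(weights, losses)]
    return mixtures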
A Gang of Bandits
A global recommendation strategy which allocates a bandit algorithm to each network node (user) and allows it to "share" signals (contexts and payoffs) with the neighboring nodes, and derives two more scalable variants of this strategy based on different ways of clustering the graph nodes.