Publications
Gambling in a rigged casino: The adversarial multi-armed bandit problem
TLDR
This paper gives a solution to the bandit problem in which an adversary, rather than a well-behaved stochastic process, has complete control over the payoffs.
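For concreteness, here is a minimal Python sketch of the Exp3-style strategy associated with this line of work: exponential weights mixed with uniform exploration, using importance-weighted reward estimates. The function name, parameters, and toy adversary are illustrative, not the paper's notation.

```python
import math
import random

def exp3(n_arms, n_rounds, reward_fn, gamma=0.1):
    """Exp3 sketch: exponential weights mixed with uniform exploration.
    reward_fn(t, arm) returns a reward in [0, 1] and may be adversarial."""
    weights = [1.0] * n_arms
    total = 0.0
    for t in range(n_rounds):
        w_sum = sum(weights)
        # Mix the weight distribution with uniform exploration.
        probs = [(1 - gamma) * w / w_sum + gamma / n_arms for w in weights]
        arm = random.choices(range(n_arms), weights=probs)[0]
        reward = reward_fn(t, arm)
        total += reward
        # Importance weighting keeps the reward estimate unbiased.
        weights[arm] *= math.exp(gamma * reward / (probs[arm] * n_arms))
        m = max(weights)
        weights = [w / m for w in weights]  # rescale to avoid overflow
    return total

# Toy adversary: the good arm alternates every round.
# exp3(2, 1000, lambda t, a: float(a == t % 2))
```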
On the generalization ability of on-line learning algorithms
TLDR
This paper proves tight data-dependent bounds for the risk of the hypothesis obtained from the on-line run, in terms of an easily computable statistic M_n associated with the on-line performance of the ensemble, and obtains risk tail bounds for kernel Perceptron algorithms in terms of the spectrum of the empirical kernel matrix.
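A sketch of the general online-to-batch pattern the paper studies: run an on-line learner once over the sample, record the mistake statistic, and output a single hypothesis. The averaging step below is one standard conversion, assumed here for illustration; the paper's own selection rule differs.

```python
import numpy as np

def online_to_batch_perceptron(X, y):
    """Run a Perceptron once over the sample, tracking the number of
    on-line mistakes (the statistic M_n), and return an averaged
    hypothesis as a simple online-to-batch conversion."""
    n, d = X.shape
    w = np.zeros(d)
    w_sum = np.zeros(d)
    mistakes = 0  # M_n: mistakes made during the on-line run
    for i in range(n):
        if y[i] * (w @ X[i]) <= 0:   # on-line mistake
            w = w + y[i] * X[i]      # Perceptron update
            mistakes += 1
        w_sum += w
    return w_sum / n, mistakes
```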
Scale-sensitive dimensions, uniform convergence, and learnability
TLDR
This paper gives a characterization of learnability in the probabilistic concept model, solving an open problem posed by Kearns and Schapire, and shows that the accuracy parameter plays a crucial role in determining the effective complexity of the learner's hypothesis class.
A Second-Order Perceptron Algorithm
TLDR
This paper describes an extension of the classical Perceptron algorithm, called the second-order Perceptron, and analyzes its performance within the mistake bound model of on-line learning.
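A minimal sketch of a second-order Perceptron update, assuming the standard whitened formulation: keep the correlation matrix of past mistake instances and predict with the current instance folded in. Parameter names are mine.

```python
import numpy as np

def second_order_perceptron(X, y, a=1.0):
    """Sketch: maintain A = a*I + sum of outer products of mistake
    instances and v = signed sum of mistake instances; predict with
    the current instance folded into A."""
    n, d = X.shape
    A = a * np.eye(d)   # regularized correlation of mistake instances
    v = np.zeros(d)     # signed sum of mistake instances
    mistakes = 0
    for t in range(n):
        x = X[t]
        w = np.linalg.solve(A + np.outer(x, x), v)  # whitened weights
        if y[t] * (w @ x) <= 0:                     # mistake: update
            A += np.outer(x, x)
            v += y[t] * x
            mistakes += 1
    return mistakes
```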
Bandits With Heavy Tail
TLDR
This paper examines the bandit problem under the weaker assumption that the reward distributions have moments of order 1 + ε, and derives matching lower bounds showing that the best achievable regret deteriorates when ε < 1.
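The key algorithmic idea is to replace the empirical mean inside a UCB index with a robust estimator. Below is a hedged sketch using median-of-means; the confidence-radius constants are illustrative, not the paper's.

```python
import math
import statistics

def median_of_means(samples, n_blocks):
    """Robust mean estimate: median of block means, which concentrates
    well even when only 1 + eps moments exist."""
    k = max(1, min(n_blocks, len(samples)))
    block = len(samples) // k
    means = [statistics.fmean(samples[i * block:(i + 1) * block])
             for i in range(k)]
    return statistics.median(means)

def robust_ucb(arms, n_rounds, eps=0.5, v=1.0):
    """UCB loop with a robust estimator and a confidence radius that
    shrinks at the (eps / (1 + eps)) rate; v bounds the (1+eps)-moment.
    arms[a]() draws one (possibly heavy-tailed) reward."""
    pulls = {a: [] for a in range(len(arms))}
    for t in range(1, n_rounds + 1):
        def index(a):
            s = len(pulls[a])
            if s == 0:
                return float("inf")   # pull each arm once
            k = max(1, math.ceil(8 * math.log(t)))
            radius = (v ** (1 / (1 + eps))) * \
                     (k * math.log(t) / s) ** (eps / (1 + eps))
            return median_of_means(pulls[a], k) + radius
        a = max(range(len(arms)), key=index)
        pulls[a].append(arms[a]())
    return pulls
```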
Combinatorial Bandits
TLDR
This paper introduces a variant of a strategy by Dani, Hayes, and Kakade achieving a regret bound that, for a variety of concrete choices of the action class S, is of order √(nd ln|S|), where n is the time horizon.
Tracking the best hyperplane with a simple budget Perceptron
TLDR
This paper introduces and analyzes a shifting Perceptron algorithm achieving the best known shifting bounds while using an unlimited budget, and then shows that a randomized variant, which discards a random support vector whenever the budget is exceeded, strikes the optimal trade-off between the budget B and the norm U of the largest classifier in the comparison sequence.
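A sketch of the randomized budget idea described above, with hypothetical names: on each mistake the example joins the support set, and a uniformly random support vector is discarded once the budget B is exceeded.

```python
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def randomized_budget_perceptron(stream, B):
    """Kernel-free sketch: the hyperplane is the signed sum of stored
    support examples; the budget is enforced by random eviction."""
    support = []                    # list of (label, example) pairs
    mistakes = 0
    for x, y in stream:
        margin = sum(yi * dot(xi, x) for yi, xi in support)
        if y * margin <= 0:         # mistake: store the example
            mistakes += 1
            support.append((y, x))
            if len(support) > B:    # budget exceeded: evict at random
                support.pop(random.randrange(len(support)))
    return support, mistakes
```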
Analysis of two gradient-based algorithms for on-line regression
TLDR
This paper gives a new analysis of two algorithms, Gradient Descent and Exponentiated Gradient, for solving on-line regression problems, shows general regression bounds for any convex loss function, and describes the connection between this approach and a general family of gradient-based algorithms.
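The two updates differ only in how the gradient is applied: additively for Gradient Descent, multiplicatively for Exponentiated Gradient. A minimal sketch with square loss (the loss choice and step size are illustrative):

```python
import numpy as np

def online_gd_vs_eg(X, y, eta=0.1):
    """Side-by-side on-line regression updates: GD takes an additive
    gradient step; EG takes a multiplicative step and renormalizes,
    keeping its weights on the probability simplex."""
    n, d = X.shape
    w_gd = np.zeros(d)
    w_eg = np.ones(d) / d
    for t in range(n):
        x = X[t]
        # Gradient of the square loss (w . x - y)^2 is 2 (w . x - y) x.
        w_gd -= eta * 2 * (w_gd @ x - y[t]) * x      # additive step
        g = 2 * (w_eg @ x - y[t]) * x
        w_eg *= np.exp(-eta * g)                     # multiplicative step
        w_eg /= w_eg.sum()
    return w_gd, w_eg
```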
From Bandits to Experts: A Tale of Domination and Independence
TLDR
This work characterizes regret in the directed observability model in terms of the domination and independence numbers of the observability graph (which must be accessible before an action is selected), and shows that in the undirected case the learner can achieve optimal regret without even accessing the observability graph before selecting an action.
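A sketch of an Exp3-style update under graph feedback, assuming a hypothetical observes[a] adjacency list (playing action a reveals the loss of every action in observes[a], and each action observes itself):

```python
import math
import random

def exp3_graph_feedback(n_actions, n_rounds, loss_fn, observes, eta=0.05):
    """Each observed loss is importance-weighted by the probability of
    observing it, i.e. the total probability of the actions whose play
    would reveal it; loss_fn(t, i) returns a loss in [0, 1]."""
    w = [1.0] * n_actions
    for t in range(n_rounds):
        s = sum(w)
        p = [wi / s for wi in w]
        a = random.choices(range(n_actions), weights=p)[0]
        for i in observes[a]:          # every loss revealed this round
            q_i = sum(p[j] for j in range(n_actions) if i in observes[j])
            est = loss_fn(t, i) / q_i  # unbiased loss estimate
            w[i] *= math.exp(-eta * est)
        m = max(w)
        w = [wi / m for wi in w]       # keep weights bounded
    return w
```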
Improved Second-Order Bounds for Prediction with Expert Advice
TLDR
This work derives a simple new forecasting strategy with regret at most of the order of Q*, the largest absolute value of any payoff, and devises a refined analysis of the weighted majority forecaster, which yields bounds of the same flavour.
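For reference, a minimal sketch of the weighted majority (exponential weights) forecaster whose analysis the paper refines; the payoff interface and learning rate are illustrative.

```python
import math
import random

def weighted_majority(n_experts, n_rounds, payoff_fn, eta=0.1):
    """Full-information forecaster: follow an expert drawn with
    probability proportional to weight; every expert's weight grows
    exponentially in its own payoff. payoff_fn(t, i) is expert i's
    payoff at round t."""
    w = [1.0] * n_experts
    total = 0.0
    for t in range(n_rounds):
        s = sum(w)
        i = random.choices(range(n_experts), weights=[wi / s for wi in w])[0]
        total += payoff_fn(t, i)
        for j in range(n_experts):   # full information: all payoffs seen
            w[j] *= math.exp(eta * payoff_fn(t, j))
        m = max(w)
        w = [wi / m for wi in w]     # rescale to avoid overflow
    return total
```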