• Publications
Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems
TLDR
The focus is on two extreme cases in which the analysis of regret is particularly simple and elegant: independent and identically distributed payoffs and adversarial payoffs.
Convex Optimization: Algorithms and Complexity
TLDR
This monograph presents the main complexity theorems in convex optimization and their corresponding algorithms and provides a gentle introduction to structural optimization with FISTA, saddle-point mirror prox, Nemirovski's alternative to Nesterov's smoothing, and a concise description of interior point methods.
Is Q-learning Provably Efficient?
Model-free reinforcement learning (RL) algorithms, such as Q-learning, directly parameterize and update value functions or policies without explicitly modeling the environment. They are typically ...
Optimal Algorithms for Smooth and Strongly Convex Distributed Optimization in Networks
TLDR
The efficiency of MSDA is verified against state-of-the-art methods on two problems: least-squares regression and classification by logistic regression.
Provably Robust Deep Learning via Adversarially Trained Smoothed Classifiers
TLDR
It is demonstrated through extensive experimentation that this method consistently outperforms all existing provably $\ell_2$-robust classifiers by a significant margin on ImageNet and CIFAR-10, establishing the state-of-the-art for provable $\ell_2$-defenses.
lil' UCB : An Optimal Exploration Algorithm for Multi-Armed Bandits
TLDR
It is proved that the UCB procedure for identifying the arm with the largest mean in a multi-armed bandit game in the fixed confidence setting using a small number of total samples is optimal up to constants, and it is shown through simulations that the procedure provides superior performance with respect to the state-of-the-art.
X-Armed Bandits
We consider a generalization of stochastic bandits where the set of arms, X, is allowed to be a generic measurable space and the mean-payoff function is "locally Lipschitz" with respect to a ...
Pure exploration in finitely-armed and continuous-armed bandits
TLDR
It is proved that the separable metric spaces are exactly the metric spaces on which these regrets can be minimized with respect to the family of all probability distributions with continuous mean-payoff functions.
Bandits With Heavy Tail
TLDR
This paper examines the bandit problem under the weaker assumption that the distributions have moments of order 1 + ε, and derives matching lower bounds that show that the best achievable regret deteriorates when ε < 1.
Regret in Online Combinatorial Optimization
TLDR
This work addresses online linear optimization problems when the possible actions of the decision maker are represented by binary vectors and shows that the standard exponentially weighted average forecaster is a provably suboptimal strategy.