Publications
Spectrally-normalized margin bounds for neural networks
TLDR
This bound is empirically investigated for a standard AlexNet network trained with SGD on the MNIST and CIFAR-10 datasets, with both original and random labels; the bound, the Lipschitz constants, and the excess risks are all in direct correlation, suggesting both that SGD selects predictors whose complexity scales with the difficulty of the learning task, and that the presented bound is sensitive to this complexity.
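The complexity measure this bound tracks can be illustrated directly. Below is a minimal sketch (not the paper's code; the layer shapes and weights are hypothetical) of the standard upper bound on a feed-forward network's Lipschitz constant as the product of per-layer spectral norms, which is the quantity the spectrally-normalized margin bound is built around.

```python
# Minimal sketch: for a feed-forward network with 1-Lipschitz activations,
# the Lipschitz constant is upper-bounded by the product of the layers'
# spectral norms (largest singular values).
import numpy as np

def spectral_lipschitz_upper_bound(weight_matrices):
    """Product of the largest singular values across layers."""
    bound = 1.0
    for W in weight_matrices:
        bound *= np.linalg.norm(W, ord=2)  # largest singular value of W
    return bound

# Hypothetical 3-layer network weights, for illustration only.
rng = np.random.default_rng(0)
weights = [
    rng.normal(size=(256, 784)),
    rng.normal(size=(128, 256)),
    rng.normal(size=(10, 128)),
]
print(spectral_lipschitz_upper_bound(weights))
```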
Naive Exploration is Optimal for Online LQR
TLDR
New upper and lower bounds are proved demonstrating that the optimal regret scales as $\widetilde{\Theta}(\sqrt{d_{\mathbf{u}}^2 d_{\mathbf{x}} T})$, where $T$ is the number of time steps, $d_{\mathbf{u}}$ is the dimension of the input space, and $d_{\mathbf{x}}$ is the dimension of the system state.
Lower Bounds for Non-Convex Stochastic Optimization
TLDR
It is proved that (in the worst case) any algorithm requires at least $\epsilon^{-4}$ queries to find an $\epsilon$-stationary point, which establishes that stochastic gradient descent is minimax optimal in this model.
Practical Contextual Bandits with Regression Oracles
TLDR
This work presents a new technique that has the empirical and computational advantages of realizability-based approaches combined with the flexibility of agnostic methods, and typically gives comparable or superior results.
Model selection for contextual bandits
TLDR
This paper introduces the problem of model selection for contextual bandits, where a learner must adapt to the complexity of the optimal policy while balancing exploration and exploitation, and designs an algorithm that achieves $\tilde{O}(T^{2/3} d^{1/3}_{m^\star})$ regret with no prior knowledge of the ideal dimension.
Beyond UCB: Optimal and Efficient Contextual Bandits with Regression Oracles
TLDR
This work describes the minimax rates for contextual bandits with general, potentially nonparametric function classes, and provides the first universal and optimal reduction from contextual bandits to online regression, which requires no distributional assumptions beyond realizability.
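To make the flavor of such a reduction concrete, here is a minimal, hedged sketch of an inverse-gap-weighting selection rule of the kind used to turn a regression oracle's reward predictions into an exploration distribution. The oracle, the `gamma` value, and the example predictions are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch of inverse gap weighting: given predicted rewards for K
# actions, put most mass on the greedy action and spread the rest in
# proportion to the inverse of each action's prediction gap.
import numpy as np

def inverse_gap_weighting(predictions, gamma):
    """Return a probability distribution over actions from predicted rewards."""
    predictions = np.asarray(predictions, dtype=float)
    K = len(predictions)
    best = int(np.argmax(predictions))
    # Non-greedy actions: probability shrinks as the prediction gap grows.
    probs = 1.0 / (K + gamma * (predictions[best] - predictions))
    probs[best] = 0.0
    probs[best] = 1.0 - probs.sum()  # remaining mass goes to the greedy action
    return probs

# Example usage with hypothetical oracle predictions and an illustrative gamma.
p = inverse_gap_weighting([0.2, 0.8, 0.5], gamma=10.0)
action = np.random.default_rng(1).choice(len(p), p=p)
```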
Logistic Regression: The Importance of Being Improper
TLDR
This work designs a new efficient improper learning algorithm for online logistic regression that circumvents the aforementioned lower bound, with a regret bound exhibiting a doubly-exponential improvement in dependence on the predictor norm, and shows that the improved dependence on the predictor norm is near-optimal.
Parameter-Free Online Learning via Model Selection
TLDR
A generic meta-algorithm framework that achieves online model selection oracle inequalities under minimal structural assumptions is proposed, and the first computationally efficient parameter-free algorithms that work in arbitrary Banach spaces under mild smoothness assumptions are given.
Learning in Games: Robustness of Fast Convergence
We show that learning algorithms satisfying a $\textit{low approximate regret}$ property experience fast convergence to approximate optimality in a large class of repeated games.
Adaptive Online Learning
TLDR
It is shown that modifications to recently introduced sequential complexity measures provide sufficient conditions under which adaptive rates can be achieved, answering whether some algorithm attains a given adaptive bound, and a new type of adaptive bound for online linear optimization, based on the spectral norm, is derived.