This bound is empirically investigated for a standard AlexNet network trained with SGD on the MNIST and CIFAR-10 datasets, with both original and random labels; the bound, the Lipschitz constants, and the excess risks are all in direct correlation, suggesting both that SGD selects predictors whose complexity scales with the difficulty of the learning task, and that the presented bound is sensitive to this complexity.

New upper and lower bounds are proved demonstrating that the optimal regret scales as $\widetilde{\Theta}(\sqrt{d_{\mathbf{u}}^2 d_{\mathbf{x}} T})$, where $T$ is the number of time steps, $d_{\mathbf{u}}$ is the dimension of the input space, and $d_{\mathbf{x}}$ is the dimension of the system state.

It is proved that (in the worst case) any algorithm requires at least $\epsilon^{-4}$ queries to find an $\epsilon$-stationary point, and it is established that stochastic gradient descent is minimax optimal in this model.
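As a toy illustration (not the paper's lower-bound construction), one can count the stochastic-gradient queries SGD spends before reaching an approximate stationary point. The objective, noise level, and step size below are all assumptions of this sketch:

```python
import random

def sgd_query_count(true_grad, noisy_grad, x0, eps, lr=0.01, max_queries=10**6):
    """Run one-dimensional SGD, counting stochastic-gradient queries until
    the true gradient magnitude drops below eps (toy sketch only)."""
    x, queries = x0, 0
    while queries < max_queries and abs(true_grad(x)) > eps:
        g = noisy_grad(x)   # one stochastic first-order query
        queries += 1
        x -= lr * g         # plain SGD step
    return x, queries

# Assumed toy objective f(x) = x^2 with Gaussian gradient noise.
grad = lambda x: 2.0 * x
noisy = lambda x: 2.0 * x + random.gauss(0.0, 0.1)
```

Shrinking `eps` makes the query count grow sharply, which is the quantity the $\epsilon^{-4}$ lower bound controls in the worst case.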

This work presents a new technique that has the empirical and computational advantages of realizability-based approaches combined with the flexibility of agnostic methods, and typically gives comparable or superior results.

This paper introduces the problem of model selection for contextual bandits, where a learner must adapt to the complexity of the optimal policy while balancing exploration and exploitation, and designs an algorithm that achieves regret $\tilde{O}(T^{2/3} d_{m^\star}^{1/3})$ with no prior knowledge of the ideal dimension $d_{m^\star}$.

This work describes the minimax rates for contextual bandits with general, potentially nonparametric function classes, and provides the first universal and optimal reduction from contextual bandits to online regression, which requires no distributional assumptions beyond realizability.
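One concrete way a regression-to-bandits reduction can be instantiated is inverse-gap weighting over the regressor's predictions. The reward orientation and the exact form below are assumptions of this sketch, not necessarily the paper's algorithm:

```python
def inverse_gap_weighting(preds, gamma):
    """Map predicted rewards for each arm to an exploration distribution:
    every suboptimal arm gets probability inversely proportional to its
    predicted gap, and the empirically best arm receives the leftover mass.
    (Sketch; gamma is an assumed exploration parameter.)"""
    n = len(preds)
    best = max(range(n), key=lambda a: preds[a])
    p = [0.0] * n
    for a in range(n):
        if a != best:
            p[a] = 1.0 / (n + gamma * (preds[best] - preds[a]))
    p[best] = 1.0 - sum(p)
    return p
```

Sampling an arm from `p` and feeding the observed reward back into the online regressor closes the loop; larger `gamma` concentrates mass on the predicted-best arm.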

This work designs a new efficient improper learning algorithm for online logistic regression that circumvents the aforementioned lower bound, with a regret bound exhibiting a doubly-exponential improvement in dependence on the predictor norm, and shows that this improved dependence is near-optimal.

A generic meta-algorithm framework that achieves online model selection oracle inequalities under minimal structural assumptions is proposed, and the first computationally efficient parameter-free algorithms that work in arbitrary Banach spaces under mild smoothness assumptions are given.

We show that learning algorithms satisfying a $\textit{low approximate regret}$ property experience fast convergence to approximate optimality in a large class of repeated games.
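Multiplicative weights (Hedge) is a standard example of the kind of no-regret learning rule such convergence results concern; whether it satisfies the paper's specific low-approximate-regret property is not asserted here, and this is a generic sketch:

```python
import math

def hedge(loss_rounds, eta=0.5):
    """Hedge over n experts: maintain exponential weights, play the
    normalized distribution each round, and suffer the expected loss.
    Returns (algorithm's total loss, best single expert's total loss)."""
    n = len(loss_rounds[0])
    w = [1.0] * n
    alg_loss = 0.0
    for losses in loss_rounds:
        total = sum(w)
        p = [wi / total for wi in w]
        alg_loss += sum(pi * li for pi, li in zip(p, losses))
        w = [wi * math.exp(-eta * li) for wi, li in zip(w, losses)]
    best = min(sum(r[i] for r in loss_rounds) for i in range(n))
    return alg_loss, best
```

When every player in a repeated game runs such a rule, the gap `alg_loss - best` stays bounded, which is the regret quantity that drives convergence to (approximate) equilibrium welfare.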

Modifications to recently introduced sequential complexity measures can be used to answer the question of whether any algorithm achieves a given adaptive bound, by providing sufficient conditions under which adaptive rates are achievable; a new type of adaptive bound for online linear optimization, based on the spectral norm, is also derived.