
Second-Order Kernel Online Convex Optimization with Adaptive Sketching

Daniele Calandriello, Alessandro Lazaric, Michal Valko
Kernel online convex optimization (KOCO) is a framework combining the expressiveness of non-parametric kernel models with the regret guarantees of online learning. First-order KOCO methods such as functional gradient descent require only $O(t)$ time and space per iteration, and, when the only information on the losses is their convexity, achieve a minimax optimal $O(\sqrt{T})$ regret. Nonetheless, many common losses in kernel problems, such as squared loss, logistic loss, and squared hinge loss… 
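The first-order KOCO update described above can be sketched as follows. This is a minimal illustration (the class and function names are my own, not the paper's), using the squared loss and a Gaussian kernel; it makes concrete why per-iteration time and space grow as $O(t)$: the model stores every past input as a support point.

```python
import numpy as np

def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian (RBF) kernel between two input vectors."""
    return np.exp(-np.sum((x - y) ** 2) / (2 * sigma ** 2))

class KernelOGD:
    """First-order KOCO via functional gradient descent on the squared loss.

    The predictor is f(x) = sum_i alpha_i k(x_i, x); every round appends one
    support point, which is exactly why per-iteration cost is O(t).
    """

    def __init__(self, eta=0.5, sigma=1.0):
        self.eta = eta            # learning rate
        self.sigma = sigma
        self.points = []          # stored inputs x_1, ..., x_t
        self.alphas = []          # dual coefficients

    def predict(self, x):
        return sum(a * gaussian_kernel(xi, x, self.sigma)
                   for a, xi in zip(self.alphas, self.points))

    def update(self, x, y):
        # The functional gradient of (f(x) - y)^2 / 2 is (f(x) - y) * k(x, .),
        # so the step appends x with coefficient -eta * (f(x) - y).
        residual = self.predict(x) - y
        self.points.append(x)
        self.alphas.append(-self.eta * residual)
        return residual
```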

Efficient Second-Order Online Kernel Learning with Adaptive Embedding

This paper proposes PROS-N-KONS, a method that combines Nyström sketching, which projects input points into a small but accurate embedded space, with efficient second-order updates in that space, achieving logarithmic regret.
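As a rough illustration of the Nyström embedding idea (not the PROS-N-KONS algorithm itself; landmark selection and the second-order updates are omitted, and all names here are illustrative), one can map points into a small space whose inner products approximate the kernel:

```python
import numpy as np

def nystrom_embed(X, landmarks, sigma=1.0):
    """Embed points X into R^m so that inner products of embeddings
    approximate the Gaussian kernel, given m landmark points."""
    def k(A, B):
        # pairwise Gaussian kernel matrix
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2 * sigma ** 2))

    K_mm = k(landmarks, landmarks)            # m x m landmark kernel matrix
    w, V = np.linalg.eigh(K_mm)               # eigendecomposition for K_mm^{-1/2}
    w = np.maximum(w, 1e-12)                  # guard against numerical negatives
    K_mm_inv_sqrt = V @ np.diag(w ** -0.5) @ V.T
    return k(X, landmarks) @ K_mm_inv_sqrt    # n x m embedding
```

With all points as landmarks the embedding reproduces the kernel exactly; the savings come from using far fewer landmarks than points.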

Projection-Free Online Optimization with Stochastic Gradient: From Convexity to Submodularity

Meta-Frank-Wolfe, the first online projection-free algorithm to use stochastic gradient estimates, is proposed, and a novel "lifting" framework for online discrete submodular maximization is developed; both outperform current state-of-the-art techniques in various experiments.

Dynamic Regret for Strongly Adaptive Methods and Optimality of Online KRR

It is demonstrated that Strongly Adaptive (SA) algorithms can be viewed as a principled way of controlling dynamic regret in terms of the path variation $V_T$ of the comparator sequence, without a priori knowledge of $V_T$.

Efficient online learning with kernels for adversarial large scale problems

The resulting algorithm is based on approximations of the Gaussian kernel through Taylor expansion and achieves, for $d$-dimensional inputs, a close-to-optimal regret of order $O((\log n)^{d+1})$ with low per-round time and space complexity.

Dynamic Online Learning via Frank-Wolfe Algorithm

This work proposes to study Frank-Wolfe (FW), which updates along directions aligned with the negative gradient while remaining feasible by construction, and establishes performance in terms of dynamic regret, which quantifies cost accumulation relative to the optimum at each individual time slot.
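A minimal sketch of the Frank-Wolfe update underlying this line of work (static setting, exact gradients; the function names and the $\ell_1$-ball example are illustrative, not from the paper):

```python
import numpy as np

def frank_wolfe_step(x, grad, lmo, t):
    """One Frank-Wolfe update: the linear minimization oracle (lmo) returns
    the feasible point most aligned with the negative gradient; moving by a
    convex combination keeps the iterate feasible without any projection."""
    s = lmo(grad)                    # argmin over the feasible set of <grad, s>
    gamma = 2.0 / (t + 2.0)          # classic step-size schedule
    return (1 - gamma) * x + gamma * s

def l1_ball_lmo(grad, radius=1.0):
    """LMO for the l1 ball: a signed, scaled coordinate vector."""
    i = np.argmax(np.abs(grad))
    s = np.zeros_like(grad)
    s[i] = -radius * np.sign(grad[i])
    return s
```

The design point is that the LMO (a linear problem) is often far cheaper than a projection onto the same set.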

Projection Free Dynamic Online Learning

A projection-free scheme based on Frank-Wolfe is proposed in which, instead of exact online gradients, the algorithm only requires noisy gradient estimates, i.e., partial feedback, and dynamic regret bounds are derived.

Gaussian Process Optimization with Adaptive Sketching: Scalable and No Regret

BKB (budgeted kernelized bandit) is proposed, a new approximate GP algorithm for optimization under bandit feedback that achieves near-optimal regret (and hence a near-optimal convergence rate) with near-constant per-iteration complexity and, remarkably, no assumptions on the input space or the covariance of the GP.

Efficient online learning with kernels for adversarial large scale problems

The algorithm is shown to be the first to achieve the optimal regret for a wide range of kernels with a per-round complexity of order $n^{\alpha}$ with $\alpha < 2$, improving the computational trade-off known for online kernel regression.


Sparse Representations of Positive Functions via First- and Second-Order Pseudo-Mirror Descent

First- and second-order variants of stochastic mirror descent employing pseudo-gradients and complexity-reducing projections are developed, establishing trade-offs between the radius of convergence of the expected sub-optimality and the projection budget parameter, as well as non-asymptotic bounds on the model complexity.

Logarithmic regret algorithms for online convex optimization

Several algorithms achieving logarithmic regret are proposed, which besides being more general are also much more efficient to implement, and give rise to an efficient algorithm based on the Newton method for optimization, a new tool in the field.
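A simplified sketch in the spirit of the Newton-based method referred to here (the Online Newton Step); the class name is mine, and the projection back onto a bounded feasible set that the full algorithm requires is omitted, so this assumes an effectively unconstrained domain:

```python
import numpy as np

class OnlineNewtonStep:
    """Maintain A_t = eps*I + sum_s g_s g_s^T and move along A_t^{-1} g_t,
    i.e., a Newton-like step built from outer products of past gradients."""

    def __init__(self, dim, eta=0.5, eps=1.0):
        self.x = np.zeros(dim)
        self.A = eps * np.eye(dim)
        self.eta = eta

    def step(self, grad):
        self.A += np.outer(grad, grad)                      # rank-one update
        self.x -= self.eta * np.linalg.solve(self.A, grad)  # preconditioned step
        return self.x
```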

Fast Randomized Kernel Methods With Statistical Guarantees

A version of this approach that comes with running-time guarantees as well as improved guarantees on its statistical performance is described, and a new notion of the statistical leverage of a data point is introduced that captures, in a fine-grained way, the difficulty of the original statistical learning problem.

Dual Space Gradient Descent for Online Learning

The Dual Space Gradient Descent (DualSGD) is presented, a novel framework that utilizes random features as an auxiliary space to maintain information from data points removed during budget maintenance while simultaneously mitigating the impact of the dimensionality issue on learning performance.

Breaking the curse of kernelization: budgeted stochastic gradient descent for large-scale SVM training

Comprehensive empirical results show that BSGD achieves higher accuracy than state-of-the-art budgeted online algorithms and accuracy comparable to non-budgeted algorithms, while being impressively efficient in both time and space during training and prediction.

Online learning with kernels

This paper considers online learning in a reproducing kernel Hilbert space, allowing the exploitation of the kernel trick in an online setting, and examines the value of large margins for classification in the online setting with a drifting target.

Gaussian Process Optimization in the Bandit Setting: No Regret and Experimental Design

This work analyzes GP-UCB, an intuitive upper-confidence-based algorithm, and bounds its cumulative regret in terms of maximal information gain, establishing a novel connection between GP optimization and experimental design and obtaining explicit sublinear regret bounds for many commonly used covariance functions.
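A toy illustration of the GP-UCB rule on a one-dimensional grid (exact GP posterior with a Gaussian kernel; the helper names and hyperparameters are illustrative, not from the paper): pick the candidate maximizing posterior mean plus a scaled posterior standard deviation, so high-mean and high-uncertainty points are both attractive.

```python
import numpy as np

def gp_posterior(X, y, Xstar, sigma=1.0, noise=0.1):
    """Posterior mean and variance of a Gaussian-kernel GP at query points."""
    def k(A, B):
        return np.exp(-((A[:, None] - B[None, :]) ** 2) / (2 * sigma ** 2))

    K = k(X, X) + noise ** 2 * np.eye(len(X))
    Ks = k(Xstar, X)
    mean = Ks @ np.linalg.solve(K, y)
    var = 1.0 - np.einsum('ij,ji->i', Ks, np.linalg.solve(K, Ks.T))
    return mean, np.maximum(var, 0.0)

def gp_ucb_pick(X, y, candidates, beta=2.0, sigma=1.0, noise=0.1):
    """GP-UCB acquisition: maximize posterior mean + sqrt(beta) * posterior std."""
    mean, var = gp_posterior(X, y, candidates, sigma=sigma, noise=noise)
    return candidates[np.argmax(mean + np.sqrt(beta * var))]
```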

Large Scale Online Kernel Learning

A new framework for large-scale online kernel learning is presented, making kernel methods efficient and scalable for large-scale online learning applications, along with two different online kernel machine learning algorithms that apply random Fourier features to approximate kernel functions.
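The random Fourier feature map used by such algorithms can be sketched as follows (the Rahimi-Recht construction for the Gaussian kernel; the function name is illustrative):

```python
import numpy as np

def random_fourier_features(X, num_features=200, sigma=1.0, seed=0):
    """Random Fourier feature map z(x) such that z(x) . z(y) approximates
    the Gaussian kernel exp(-||x - y||^2 / (2 sigma^2))."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], num_features)) / sigma  # spectral draws
    b = rng.uniform(0.0, 2.0 * np.pi, num_features)              # random phases
    return np.sqrt(2.0 / num_features) * np.cos(X @ W + b)
```

After the map, any linear online learner on z(x) approximates its kernelized counterpart at constant per-round cost in the number of features.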

Adaptive Subgradient Methods for Online Learning and Stochastic Optimization

This work describes and analyzes an apparatus for adaptively modifying the proximal function, which significantly simplifies setting a learning rate and results in regret guarantees that are provably as good as those of the best proximal function that can be chosen in hindsight.
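The best-known instance of this adaptive scheme is diagonal AdaGrad; a minimal sketch (the function name is mine):

```python
import numpy as np

def adagrad_update(theta, grad, accum, lr=0.5, eps=1e-8):
    """Diagonal AdaGrad: each coordinate's step is scaled by the inverse
    square root of its accumulated squared gradients, so coordinates with
    rare, informative gradients keep larger effective learning rates."""
    accum = accum + grad ** 2
    theta = theta - lr * grad / (np.sqrt(accum) + eps)
    return theta, accum
```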

Adam: A Method for Stochastic Optimization

This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.
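The Adam update itself is compact; a minimal single-step sketch of the moment estimates and bias correction described above (the function name is illustrative):

```python
import numpy as np

def adam_update(theta, grad, m, v, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam step: exponential moving averages of the gradient (first
    moment) and squared gradient (second moment), with bias correction."""
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad ** 2
    m_hat = m / (1 - b1 ** t)       # bias-corrected first moment
    v_hat = v / (1 - b2 ** t)       # bias-corrected second moment
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v
```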

Adaptive Online Gradient Descent

An algorithm, Adaptive Online Gradient Descent, is provided which interpolates between the results of Zinkevich for linear functions and of Hazan et al. for strongly convex functions, achieving intermediate rates between $\sqrt{T}$ and $\log T$, and strong optimality of the algorithm is shown.