Corpus ID: 11379717

Second-Order Kernel Online Convex Optimization with Adaptive Sketching

@inproceedings{calandriello2017second,
  title={Second-Order Kernel Online Convex Optimization with Adaptive Sketching},
  author={Daniele Calandriello and Alessandro Lazaric and Michal Valko},
  booktitle={International Conference on Machine Learning}
}
Kernel online convex optimization (KOCO) is a framework combining the expressiveness of non-parametric kernel models with the regret guarantees of online learning. First-order KOCO methods such as functional gradient descent require only $O(t)$ time and space per iteration, and, when the only information on the losses is their convexity, achieve a minimax optimal $O(\sqrt{T})$ regret. Nonetheless, many common losses in kernel problems, such as squared loss, logistic loss, and squared hinge loss… 
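To make the first-order KOCO setting concrete, here is a minimal sketch of functional gradient descent with an RBF kernel on a toy regression stream; the class name, step size, kernel bandwidth, and squared-loss choice are illustrative, not taken from the paper. Note how the support set grows by one point per round, which is the $O(t)$ per-iteration cost mentioned above.

```python
import numpy as np

def gaussian_kernel(x, y, gamma=1.0):
    """RBF kernel k(x, y) = exp(-gamma * ||x - y||^2)."""
    return np.exp(-gamma * np.sum((x - y) ** 2))

class FunctionalGD:
    """First-order KOCO sketch: the hypothesis f_t(x) = sum_i alpha_i k(x_i, x)
    grows by one support point per round, hence O(t) time/space per step."""

    def __init__(self, eta=0.5, gamma=1.0):
        self.eta, self.gamma = eta, gamma
        self.points, self.alphas = [], []

    def predict(self, x):
        return sum(a * gaussian_kernel(p, x, self.gamma)
                   for p, a in zip(self.points, self.alphas))

    def update(self, x, y):
        """Gradient step on the squared loss (f(x) - y)^2 / 2:
        the functional gradient is (f(x) - y) * k(x, .)."""
        residual = self.predict(x) - y
        self.points.append(x)
        self.alphas.append(-self.eta * residual)

# toy stream: learn f(x) = sin(x) on [0, 2*pi]
rng = np.random.default_rng(0)
learner = FunctionalGD(eta=0.5, gamma=2.0)
losses = []
for _ in range(200):
    x = rng.uniform(0, 2 * np.pi, size=1)
    y = np.sin(x[0])
    losses.append(0.5 * (learner.predict(x) - y) ** 2)
    learner.update(x, y)
avg_early = np.mean(losses[:50])
avg_late = np.mean(losses[-50:])
```

Because every observed point is stored, prediction time grows linearly in the number of rounds; that growth is exactly the bottleneck the sketching-based methods in this list address.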

Efficient Second-Order Online Kernel Learning with Adaptive Embedding

This paper proposes PROS-N-KONS, a method that combines Nyström sketching, which projects input points into a small, accurate embedded space, with efficient second-order updates in that space, achieving logarithmic regret.
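A minimal sketch of the Nyström embedding idea behind such methods; the fixed landmark choice and kernel here are illustrative (PROS-N-KONS selects its dictionary adaptively, which is not shown).

```python
import numpy as np

def rbf(X, Y, gamma=1.0):
    """Pairwise RBF kernel matrix between row sets X and Y."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def nystrom_embedding(X, landmarks, gamma=1.0, eps=1e-10):
    """Map each point x to phi(x) = K_mm^{-1/2} k_m(x), so that
    phi(x) . phi(y) approximates k(x, y)."""
    K_mm = rbf(landmarks, landmarks, gamma)
    # inverse square root of the landmark kernel matrix via eigendecomposition
    w, V = np.linalg.eigh(K_mm)
    w = np.maximum(w, eps)
    inv_sqrt = V @ np.diag(w ** -0.5) @ V.T
    return rbf(X, landmarks, gamma) @ inv_sqrt

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
landmarks = X[:20]               # illustrative fixed dictionary
Phi = nystrom_embedding(X, landmarks)
K_approx = Phi @ Phi.T
K_exact = rbf(X, X)
# the approximation is exact on the landmark block
err_on_landmarks = np.abs(K_approx[:20, :20] - K_exact[:20, :20]).max()
```

Second-order updates can then be run on the low-dimensional `Phi` rows instead of the full kernel matrix.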

Projection-Free Online Optimization with Stochastic Gradient: From Convexity to Submodularity

Meta-Frank-Wolfe is proposed, the first online projection-free algorithm that uses stochastic gradient estimates; a novel "lifting" framework for online discrete submodular maximization is also developed, and both outperform current state-of-the-art techniques in various experiments.

Efficient online learning with kernels for adversarial large scale problems

The resulting algorithm is based on approximations of the Gaussian kernel through Taylor expansion and achieves, for d-dimensional inputs, a (close to) optimal regret of order $O((\log n)^{d+1})$ with per-round time and space complexity that is polylogarithmic in n.

Dynamic Online Learning via Frank-Wolfe Algorithm

This work proposes to study Frank-Wolfe (FW), which updates along directions collinear with the gradient while remaining feasible by construction, and establishes performance in terms of dynamic regret, which quantifies cost accumulation relative to the optimum at each individual time slot.
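A minimal sketch of a projection-free Frank-Wolfe step over the $\ell_1$ ball, where feasibility comes from the linear minimization oracle rather than a projection; the feasible set, objective, and step-size schedule are illustrative.

```python
import numpy as np

def lmo_l1(grad, radius=1.0):
    """Linear minimization oracle for the l1 ball: the minimizer of
    <grad, s> over ||s||_1 <= radius is a signed vertex."""
    s = np.zeros_like(grad)
    i = np.argmax(np.abs(grad))
    s[i] = -radius * np.sign(grad[i])
    return s

def frank_wolfe_step(x, grad, t, radius=1.0):
    """One FW update: move toward the oracle's vertex; the convex
    combination keeps the iterate feasible for free."""
    s = lmo_l1(grad, radius)
    gamma = 2.0 / (t + 2)          # standard step-size schedule
    return (1 - gamma) * x + gamma * s

# minimize f(x) = ||x - b||^2 / 2 over the unit l1 ball, b outside the ball
b = np.array([2.0, -1.0, 0.5])
x = np.zeros(3)
for t in range(200):
    grad = x - b
    x = frank_wolfe_step(x, grad, t)
# the constrained minimizer is the vertex (1, 0, 0)
```

No projection is ever computed, which is the whole point when projections onto the feasible set are expensive.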

Projection Free Dynamic Online Learning

A projection-free scheme based on Frank-Wolfe is proposed in which, instead of exact online gradient steps, the algorithm's required information is relaxed to noisy gradient estimates, i.e., partial feedback, and dynamic regret bounds are derived.

Gaussian Process Optimization with Adaptive Sketching: Scalable and No Regret

BKB (budgeted kernelized bandit) is a new approximate GP algorithm for optimization under bandit feedback that achieves near-optimal regret (and hence a near-optimal convergence rate) with near-constant per-iteration complexity and, remarkably, no assumptions on the input space or the covariance of the GP.

Improved Kernel Alignment Regret Bound for Online Kernel Learning

An algorithm whose regret bound and computational complexity are better than previous results is proposed, and a $O(\frac{1}{T}\sqrt{\mathbb{E}[\mathcal{A}_T]})$ excess risk bound is obtained, which improves the previous $O(1/\sqrt{T})$ bound.

Efficient online learning with kernels for adversarial large scale problems

The algorithm is the first to achieve the optimal regret for a wide range of kernels with a per-round complexity of order $n^\alpha$ with $\alpha < 2$, improving the computational trade-off known for online kernel regression.

Projection Dual Averaging Based Second-order Online Learning

This paper develops a second-order projection dual averaging based online learning (SPDA) method to effectively handle high-throughput streaming data and demonstrates the efficacy of the proposed algorithms on large-scale online learning tasks, including online binary and multi-class classification and online anomaly detection.
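A minimal sketch of (first-order) projection dual averaging, which plays the projection of the scaled running gradient sum; the second-order variant developed in the paper would replace the scalar step size with a matrix built from past gradients. The feasible set, objective, and parameters here are illustrative.

```python
import numpy as np

def project_l2_ball(z, radius=1.0):
    """Euclidean projection onto the l2 ball of the given radius."""
    n = np.linalg.norm(z)
    return z if n <= radius else z * (radius / n)

def dual_averaging(grad_sum, t, eta=0.1, radius=1.0):
    """Dual averaging plays the projection of minus the scaled running
    gradient sum; only the sum needs to be stored between rounds."""
    return project_l2_ball(-eta * grad_sum / np.sqrt(t + 1), radius)

# minimize f(x) = ||x - b||^2 / 2 over the unit l2 ball
b = np.array([3.0, 4.0])          # constrained optimum is b/||b|| = (0.6, 0.8)
x = np.zeros(2)
g_sum = np.zeros(2)
for t in range(2000):
    g_sum += x - b                # gradient of the current loss at x
    x = dual_averaging(g_sum, t, eta=0.5)
```

Averaging gradients before projecting is what makes the scheme robust to sparse, high-throughput streams.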

Logarithmic regret algorithms for online convex optimization

Several algorithms achieving logarithmic regret are proposed, which besides being more general are also much more efficient to implement, and give rise to an efficient algorithm based on the Newton method for optimization, a new tool in the field.
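A minimal sketch of the step-size phenomenon behind logarithmic regret: online gradient descent with $\eta_t = 1/(\mu t)$ for $\mu$-strongly convex losses, versus the $\eta_t \sim 1/\sqrt{t}$ schedule that gives $O(\sqrt{T})$ regret for merely convex losses. The loss stream and parameters are illustrative.

```python
import numpy as np

def ogd_strongly_convex(grad_fn, x0, T, mu=1.0):
    """Online gradient descent with step eta_t = 1/(mu * t); for
    mu-strongly convex losses this schedule attains O(log T) regret."""
    x = x0.copy()
    iterates = []
    for t in range(1, T + 1):
        iterates.append(x.copy())
        x = x - (1.0 / (mu * t)) * grad_fn(x, t)
    return np.array(iterates)

# strongly convex stream: loss_t(x) = ||x - c_t||^2 / 2 with noisy targets
rng = np.random.default_rng(0)
centers = 1.0 + 0.1 * rng.normal(size=500)
iters = ogd_strongly_convex(
    lambda x, t: x - centers[t - 1], np.array([5.0]), T=500)
# with this schedule the iterate tracks the running mean of the targets
```

The paper's Newton-based method achieves the same logarithmic rate without requiring strong convexity, only exp-concavity.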

Fast Randomized Kernel Methods With Statistical Guarantees

A version of this approach that comes with running-time guarantees as well as improved guarantees on statistical performance is described, along with a new notion of the statistical leverage of a data point that captures, in a fine-grained way, the difficulty of the original statistical learning problem.

Breaking the curse of kernelization: budgeted stochastic gradient descent for large-scale SVM training

Comprehensive empirical results show that BSGD achieves higher accuracy than state-of-the-art budgeted online algorithms and accuracy comparable to that of non-budgeted algorithms, while achieving impressive computational efficiency in both time and space during training and prediction.

Online learning with kernels

This paper considers online learning in a reproducing kernel Hilbert space, and allows the exploitation of the kernel trick in an online setting, and examines the value of large margins for classification in the online setting with a drifting target.

Gaussian Process Optimization in the Bandit Setting: No Regret and Experimental Design

This work analyzes GP-UCB, an intuitive upper-confidence-based algorithm, and bounds its cumulative regret in terms of maximal information gain, establishing a novel connection between GP optimization and experimental design and obtaining explicit sublinear regret bounds for many commonly used covariance functions.
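A minimal sketch of the GP-UCB acquisition rule on a 1-D grid; the kernel, confidence parameter $\beta$, and test function are illustrative choices, not the paper's.

```python
import numpy as np

def rbf(X, Y, ell=0.3):
    """1-D RBF kernel with lengthscale ell."""
    d2 = (X[:, None] - Y[None, :]) ** 2
    return np.exp(-d2 / (2 * ell ** 2))

def gp_posterior(X_obs, y_obs, X_query, noise=1e-3):
    """Standard GP regression posterior mean and variance on a grid."""
    K = rbf(X_obs, X_obs) + noise * np.eye(len(X_obs))
    K_star = rbf(X_query, X_obs)
    K_inv = np.linalg.inv(K)
    mean = K_star @ K_inv @ y_obs
    var = 1.0 - np.einsum('ij,jk,ik->i', K_star, K_inv, K_star)
    return mean, np.maximum(var, 0.0)

def gp_ucb(f, grid, T=30, beta=2.0):
    """Each round, query the point maximizing mean + sqrt(beta * var)."""
    X_obs = np.array([grid[0]])
    y_obs = np.array([f(grid[0])])
    for _ in range(T):
        mean, var = gp_posterior(X_obs, y_obs, grid)
        x_next = grid[np.argmax(mean + np.sqrt(beta * var))]
        X_obs = np.append(X_obs, x_next)
        y_obs = np.append(y_obs, f(x_next))
    return X_obs, y_obs

grid = np.linspace(0.0, 1.0, 101)
f = lambda x: -(x - 0.7) ** 2          # unknown function, maximum at x = 0.7
X_obs, y_obs = gp_ucb(f, grid, T=30)
```

The upper-confidence term drives exploration where posterior variance is high, then the mean term exploits the located optimum.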

Large Scale Online Kernel Learning

A new framework for large-scale online kernel learning is proposed, making kernel methods efficient and scalable for large-scale online learning applications, and two online kernel machine learning algorithms are presented that apply random Fourier features to approximate kernel functions.
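A minimal sketch of the random Fourier feature approximation used by such algorithms; the feature dimension and kernel bandwidth are illustrative.

```python
import numpy as np

def rff_features(X, D=500, gamma=1.0, seed=0):
    """Random Fourier features for the RBF kernel exp(-gamma ||x - y||^2):
    z(x) = sqrt(2/D) cos(W x + b), with W ~ N(0, 2*gamma) and
    b ~ Uniform[0, 2*pi], so that z(x) . z(y) ~ k(x, y)."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = rng.normal(scale=np.sqrt(2 * gamma), size=(d, D))
    b = rng.uniform(0, 2 * np.pi, size=D)
    return np.sqrt(2.0 / D) * np.cos(X @ W + b)

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 4))
Z = rff_features(X, D=2000, gamma=0.5)
K_approx = Z @ Z.T
d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K_exact = np.exp(-0.5 * d2)
max_err = np.abs(K_approx - K_exact).max()
```

Once the data are mapped to `Z`, any linear online learner runs in the fixed feature dimension instead of paying kernel costs.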

Adaptive Subgradient Methods for Online Learning and Stochastic Optimization

This work describes and analyzes a framework for adaptively modifying the proximal function, which significantly simplifies setting a learning rate and yields regret guarantees provably as good as those of the best proximal function that could be chosen in hindsight.
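A minimal sketch of the diagonal AdaGrad update, the best-known instance of this adaptive proximal-function idea; the objective and learning rate are illustrative.

```python
import numpy as np

def adagrad_step(x, grad, accum, lr=1.0, eps=1e-8):
    """Diagonal AdaGrad: per-coordinate step lr / sqrt(accumulated
    squared gradients), so frequently large coordinates get small steps."""
    accum = accum + grad ** 2
    x = x - lr * grad / (np.sqrt(accum) + eps)
    return x, accum

# minimize f(x) = 0.5 * (10 * x0^2 + x1^2): badly scaled coordinates
x = np.array([1.0, 1.0])
accum = np.zeros(2)
for _ in range(500):
    grad = np.array([10.0 * x[0], x[1]])
    x, accum = adagrad_step(x, grad, accum, lr=0.5)
```

Both coordinates converge despite the 10x difference in curvature, without any per-coordinate tuning.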

Adam: A Method for Stochastic Optimization

This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.
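A minimal sketch of the Adam update with bias-corrected moment estimates; the decaying step size $\alpha_t = \alpha/\sqrt{t}$ follows the schedule used in the paper's regret analysis, while the objective and constants are illustrative.

```python
import numpy as np

def adam_step(x, grad, m, v, t, lr=0.1, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update with bias-corrected first/second moment estimates."""
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad ** 2
    m_hat = m / (1 - b1 ** t)          # correct initialization bias
    v_hat = v / (1 - b2 ** t)
    x = x - lr * m_hat / (np.sqrt(v_hat) + eps)
    return x, m, v

# minimize f(x) = ||x||^2 with the paper's decaying step size alpha/sqrt(t)
x = np.array([3.0, -2.0])
m = np.zeros(2)
v = np.zeros(2)
for t in range(1, 1001):
    grad = 2.0 * x
    x, m, v = adam_step(x, grad, m, v, t, lr=0.1 / np.sqrt(t))
```

The ratio of first to second moment makes the effective step roughly scale-free, which is why a single learning rate works across coordinates.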

Adaptive Online Gradient Descent

An algorithm, Adaptive Online Gradient Descent, is provided that interpolates between the results of Zinkevich for linear functions and of Hazan et al. for strongly convex functions, achieving intermediate rates between $\sqrt{T}$ and $\log T$, and strong optimality of the algorithm is shown.

Online Row Sampling

This work presents an extremely simple algorithm that approximates $A$ up to multiplicative error $\epsilon$ and additive error $\delta$ using $O(d \log d \log(\epsilon\|A\|_2/\delta)/\epsilon^2)$ online samples, with memory overhead proportional to the cost of storing the spectral approximation.
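A minimal sketch of online row sampling driven by ridge leverage scores computed against the sketch accumulated so far; the regularization, oversampling constant, and reweighted estimator are illustrative simplifications of this family of algorithms.

```python
import numpy as np

def online_row_sample(rows, lam=1.0, c=20.0, seed=0):
    """Keep each incoming row a with probability proportional to its
    online ridge leverage score a^T (sketch + lam I)^{-1} a, reweighting
    kept rows by 1/sqrt(p) so the sketch stays unbiased for A^T A."""
    rng = np.random.default_rng(seed)
    d = rows.shape[1]
    M = lam * np.eye(d)                      # running sketch of A^T A
    kept = []
    for a in rows:
        tau = float(a @ np.linalg.solve(M, a))   # approximate leverage score
        p = min(1.0, c * tau)
        if rng.random() < p:
            kept.append(a / np.sqrt(p))
            M = M + np.outer(a, a) / p
    return np.array(kept)

rng = np.random.default_rng(1)
A = rng.normal(size=(500, 5))
S = online_row_sample(A)
# the sketch's Gram matrix spectrally approximates A^T A with far fewer rows
rel_err = (np.linalg.norm(S.T @ S - A.T @ A, 2)
           / np.linalg.norm(A.T @ A, 2))
```

High-leverage rows are kept almost surely while redundant rows are dropped, so the number of stored rows grows far slower than the stream length.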