• Corpus ID: 11379717

# Second-Order Kernel Online Convex Optimization with Adaptive Sketching

@inproceedings{Calandriello2017SecondOrderKO,
  title     = {Second-Order Kernel Online Convex Optimization with Adaptive Sketching},
  author    = {Daniele Calandriello and Alessandro Lazaric and Michal Valko},
  booktitle = {International Conference on Machine Learning},
  year      = {2017}
}
• Published in International Conference on Machine Learning, 15 June 2017
Kernel online convex optimization (KOCO) is a framework combining the expressiveness of non-parametric kernel models with the regret guarantees of online learning. First-order KOCO methods such as functional gradient descent require only $O(t)$ time and space per iteration, and, when the only information on the losses is their convexity, achieve a minimax optimal $O(\sqrt{T})$ regret. Nonetheless, many common losses in kernel problems, such as squared loss, logistic loss, and squared hinge loss…
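The first-order baseline the abstract refers to can be sketched as functional (kernel) online gradient descent, where the predictor is a growing kernel expansion and each round costs $O(t)$. A minimal sketch — the Gaussian kernel, squared loss, and linear target are illustrative choices, not the paper's exact setup:

```python
import numpy as np

rng = np.random.default_rng(0)

def gauss_k(a, b, sigma=1.0):
    """Gaussian kernel between two scalars."""
    return np.exp(-(a - b) ** 2 / (2 * sigma ** 2))

# Functional online gradient descent: the predictor is the kernel expansion
# f_t(x) = sum_i alpha_i * k(x_i, x), so predicting at round t costs O(t).
support, alphas, losses = [], [], []
T = 200
for t in range(1, T + 1):
    x_t = rng.uniform(-1, 1)
    y_t = 0.7 * x_t                              # hypothetical target function
    pred = sum(a * gauss_k(s, x_t) for s, a in zip(support, alphas))
    losses.append(0.5 * (pred - y_t) ** 2)       # squared loss
    eta = 1.0 / np.sqrt(t)                       # step size behind O(sqrt(T)) regret
    # The RKHS gradient of the loss is (pred - y_t) * k(x_t, .), so the
    # update appends one support point per round.
    support.append(x_t)
    alphas.append(-eta * (pred - y_t))

avg_early, avg_late = np.mean(losses[:50]), np.mean(losses[-50:])
```

The ever-growing support set is exactly the $O(t)$ per-round cost that sketching methods aim to remove.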
## Citations

• *NIPS* 2017 — Proposes PROS-N-KONS, a method that combines Nyström sketching, which projects each input point into a small, accurate embedded space, with efficient second-order updates in that space, achieving logarithmic regret.
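The Nyström embedding underlying PROS-N-KONS-style methods maps each point through a small set of landmark points so that inner products in the embedded space approximate kernel evaluations; second-order updates can then run in the small space. A minimal sketch, where the landmark budget, kernel width, and data are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(1)

def gauss_K(A, B, sigma=1.0):
    """Gaussian kernel matrix between rows of A and rows of B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

X = rng.normal(size=(100, 2))
m = 15                                            # hypothetical landmark budget
landmarks = X[rng.choice(len(X), size=m, replace=False)]

# Nystrom embedding: phi(x) = K_mm^{-1/2} k(landmarks, x), so that
# phi(x) . phi(y) approximates k(x, y); second-order updates can then be
# run on m-dimensional phi vectors instead of in the full RKHS.
K_mm = gauss_K(landmarks, landmarks)
U, S, _ = np.linalg.svd(K_mm)
K_mm_inv_sqrt = U @ np.diag(1.0 / np.sqrt(np.maximum(S, 1e-12))) @ U.T

def embed(x):
    return K_mm_inv_sqrt @ gauss_K(landmarks, np.atleast_2d(x)).ravel()

# On the landmarks themselves the embedding reproduces the kernel exactly.
recon = embed(landmarks[0]) @ embed(landmarks[1])
```

Here uniform landmark sampling stands in for the adaptive (leverage-score-based) selection that the sketching literature actually uses.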
• *ICML* 2018 — Proposes Meta-Frank-Wolfe, the first online projection-free algorithm that uses stochastic gradient estimates, and develops a novel "lifting" framework for online discrete submodular maximization; both outperform current state-of-the-art techniques in various experiments.
• *NeurIPS* 2019 — The resulting algorithm, based on approximating the Gaussian kernel through Taylor expansion, achieves for $d$-dimensional inputs a (close to) optimal regret of order $O((\log n)^{d+1})$ with low per-round time and space complexity.
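The Taylor-expansion idea can be illustrated in one dimension: for the Gaussian kernel, $e^{-(x-y)^2/2} = e^{-x^2/2}\, e^{-y^2/2}\, e^{xy}$, and truncating the Taylor series of $e^{xy}$ gives an explicit finite-dimensional feature map. A small sketch — the truncation degree is an arbitrary choice, and this is not the cited paper's exact construction:

```python
import math
import numpy as np

def taylor_features(x, degree=10):
    # Feature map for the 1-D Gaussian kernel exp(-(x - y)^2 / 2):
    # since k(x, y) = exp(-x^2/2) exp(-y^2/2) exp(xy), truncating the
    # Taylor series of exp(xy) gives phi_j(x) = exp(-x^2/2) x^j / sqrt(j!).
    return np.array([math.exp(-x * x / 2) * x ** j / math.sqrt(math.factorial(j))
                     for j in range(degree + 1)])

x, y = 0.6, -0.3
approx = taylor_features(x) @ taylor_features(y)
exact = math.exp(-(x - y) ** 2 / 2)
```

For bounded inputs the truncation error decays factorially in the degree, which is what makes polylogarithmic feature counts possible.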
• *IEEE Transactions on Signal Processing* 2021 — Studies Frank-Wolfe (FW), which updates along directions collinear with the gradient while remaining feasible by construction, and establishes performance in terms of dynamic regret, which quantifies cost accumulation relative to the optimum at each individual time slot.
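Frank-Wolfe's projection-free property comes from stepping toward the feasible vertex best aligned with the negative gradient, so every iterate is a convex combination of feasible points. A minimal offline sketch on the probability simplex, where the target and step-size schedule are standard illustrative choices:

```python
import numpy as np

# Frank-Wolfe: minimize f(x) = ||x - c||^2 over the probability simplex.
# Each step solves a linear problem over the feasible set (here: pick the
# vertex with the smallest gradient coordinate) and moves toward it, so
# the iterate stays feasible without any projection.
c = np.array([0.2, 0.5, 0.3])            # hypothetical target inside the simplex
x = np.array([1.0, 0.0, 0.0])            # start at a vertex
for t in range(1, 201):
    grad = 2 * (x - c)
    s = np.zeros_like(x)
    s[np.argmin(grad)] = 1.0             # linear minimizer over the simplex
    gamma = 2.0 / (t + 2)                # standard Frank-Wolfe step size
    x = (1 - gamma) * x + gamma * s      # convex combination: still feasible
```

The linear-minimization oracle is what replaces the (often expensive) projection step in online and dynamic-regret variants.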
• *ICASSP* 2020 — Proposes a projection-free scheme based on Frank-Wolfe in which, instead of exact online gradient steps, the algorithm only requires noisy gradient estimates, i.e., partial feedback; dynamic regret bounds are derived.
• *COLT* 2019 — Introduces BKB (budgeted kernelized bandit), a new approximate GP algorithm for optimization under bandit feedback that achieves near-optimal regret (and hence a near-optimal convergence rate) with near-constant per-iteration complexity and, remarkably, no assumption on the input space or the covariance of the GP.
• *arXiv* 2022 — Proposes an algorithm whose regret bound and computational complexity improve on previous results, and obtains an $O\big(\tfrac{1}{T}\sqrt{\mathbb{E}[A_T]}\big)$ excess risk bound, improving the previous $O(1/\sqrt{T})$ bound.
• 2022 — Studies an algorithm that achieves the optimal regret for a wide range of kernels with a per-round complexity of order $n^\alpha$ with $\alpha < 2$, improving the computational trade-off known for online kernel regression.
• *ICDM* 2022 — Develops a second-order projection dual averaging based online learning (SPDA) method to efficiently handle high-throughput streaming data, and demonstrates the efficacy of the proposed algorithms on large-scale online learning tasks, including online binary and multi-class classification and online anomaly detection.
• 2019 — Presents the first algorithm to achieve the optimal regret for a wide range of kernels with a per-round complexity of order $n^\alpha$ with $\alpha < 2$, improving the computational trade-off known for online kernel regression.

## References

Showing 1–10 of 31 references

• *Machine Learning* 2007 — Proposes several algorithms achieving logarithmic regret that, besides being more general, are also much more efficient to implement, and gives rise to an efficient algorithm based on the Newton method for optimization, a new tool in the field.
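The "Newton method for optimization" referenced here is the Online Newton Step, which preconditions gradients with an accumulated outer-product matrix. A minimal unconstrained sketch on online squared loss — the projection step of the full algorithm is omitted, and all parameters are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)

# Online Newton Step (unconstrained sketch): precondition each gradient by
# A_t = eps*I + sum_s g_s g_s^T. For exp-concave losses this second-order
# update achieves O(log T) regret, versus O(sqrt(T)) for gradient descent.
d = 3
w_star = np.array([0.5, -0.2, 0.1])      # hypothetical target weights
w = np.zeros(d)
A = 0.1 * np.eye(d)
gamma = 1.0                              # illustrative ONS parameter
sq_errs = []
for t in range(500):
    x = rng.normal(size=d)
    y = w_star @ x
    err = w @ x - y
    sq_errs.append(err ** 2)
    g = err * x                          # gradient of 0.5 * (w.x - y)^2
    A += np.outer(g, g)
    w = w - (1.0 / gamma) * np.linalg.solve(A, g)
```

Maintaining and inverting $A_t$ is the $O(d^2)$-per-round cost that sketched second-order KOCO methods approximate cheaply.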
• *arXiv* 2014 — Describes a version of this approach that comes with running-time guarantees as well as improved guarantees on its statistical performance, and introduces a new notion of the statistical leverage of a data point that captures in a fine-grained way the difficulty of the original statistical learning problem.
• *J. Mach. Learn. Res.* 2012 — Comprehensive empirical results show that BSGD achieves higher accuracy than state-of-the-art budgeted online algorithms, and accuracy comparable to non-budgeted algorithms, while achieving impressive computational efficiency in both time and space during training and prediction.
• *IEEE Transactions on Signal Processing* 2004 — Considers online learning in a reproducing kernel Hilbert space, allowing the kernel trick to be exploited in an online setting, and examines the value of large margins for classification in the online setting with a drifting target.
• *ICML* 2010 — Analyzes GP-UCB, an intuitive upper-confidence-based algorithm, and bounds its cumulative regret in terms of maximal information gain, establishing a novel connection between GP optimization and experimental design and obtaining explicit sublinear regret bounds for many commonly used covariance functions.
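GP-UCB selects, at each round, the point maximizing the posterior mean plus a scaled posterior standard deviation. A small sketch on a 1-D grid — the kernel width, $\beta$, noise level, and sine reward are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)

def kern(a, b, ell=0.2):
    """Gaussian kernel matrix between 1-D point sets a and b."""
    return np.exp(-(a[:, None] - b[None, :]) ** 2 / (2 * ell ** 2))

grid = np.linspace(0, 1, 101)
f = np.sin(6 * grid)                     # hypothetical unknown reward function
noise, beta = 0.05, 4.0                  # illustrative noise level and UCB scale
picked, y = [], []
for t in range(30):
    if picked:
        Kxx = kern(grid[picked], grid[picked]) + noise * np.eye(len(picked))
        Kx = kern(grid, grid[picked])
        mu = Kx @ np.linalg.solve(Kxx, np.array(y))        # posterior mean
        var = 1.0 - np.einsum('ij,ji->i', Kx, np.linalg.solve(Kxx, Kx.T))
    else:
        mu, var = np.zeros_like(grid), np.ones_like(grid)
    ucb = mu + np.sqrt(beta * np.maximum(var, 0.0))        # GP-UCB acquisition
    i = int(np.argmax(ucb))
    picked.append(i)
    y.append(f[i] + noise * rng.normal())                  # noisy bandit feedback

best_found = max(f[i] for i in picked)
```

The exact posterior update above costs $O(t^3)$ per round, which is what budgeted variants such as BKB approximate.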
• *J. Mach. Learn. Res.* 2016 — A new framework for large-scale online kernel learning, making kernel methods efficient and scalable for large-scale online learning applications; presents two online kernel machine learning algorithms that apply random Fourier features to approximate kernel functions.
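Random Fourier features approximate a shift-invariant kernel by an explicit low-dimensional map whose inner products match the kernel in expectation, turning a kernel machine into a linear model. A minimal sketch for the Gaussian kernel, where the feature count $D$ is an illustrative choice:

```python
import numpy as np

rng = np.random.default_rng(4)

# Random Fourier features: draw W ~ N(0, I) and b ~ U[0, 2*pi); then
# z(x) = sqrt(2/D) * cos(W x + b) satisfies
# E[z(x) . z(y)] = exp(-||x - y||^2 / 2), the Gaussian kernel.
d, D = 5, 4000                           # illustrative input and feature dims
W = rng.normal(size=(D, d))
b = rng.uniform(0.0, 2 * np.pi, size=D)

def z(x):
    return np.sqrt(2.0 / D) * np.cos(W @ x + b)

x = 0.3 * rng.normal(size=d)
y = 0.3 * rng.normal(size=d)
approx = z(x) @ z(y)
exact = np.exp(-np.linalg.norm(x - y) ** 2 / 2)
```

The approximation error concentrates at rate $O(1/\sqrt{D})$, so the feature count trades accuracy for per-round cost.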
• *J. Mach. Learn. Res.* 2011 — Describes and analyzes an apparatus for adaptively modifying the proximal function, which significantly simplifies setting a learning rate and yields regret guarantees that are provably as good as those of the best proximal function chosen in hindsight.
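In its diagonal form, the adaptive proximal function described here is the familiar AdaGrad update: each coordinate's step is divided by the root of that coordinate's accumulated squared gradients. A minimal sketch on a toy quadratic, with illustrative learning rate and target:

```python
import numpy as np

# Diagonal AdaGrad: frequently large gradient coordinates are damped,
# while rarely active coordinates keep larger effective step sizes.
target = np.array([3.0, -1.0])           # hypothetical optimum
w = np.zeros(2)
G = np.zeros(2)                          # running sums of squared gradients
eta, eps = 0.5, 1e-8
for _ in range(2000):
    grad = 2 * (w - target)              # gradient of ||w - target||^2
    G += grad ** 2
    w -= eta * grad / (np.sqrt(G) + eps)
```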
• *ICLR* 2015 — Introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.
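Adam keeps exponential moving averages of the gradient and its square, applies a bias correction for early rounds, and scales each step by their ratio. A minimal sketch on a toy quadratic, using a $1/\sqrt{t}$ step-size decay as in the paper's regret analysis; all constants are illustrative:

```python
import numpy as np

# Adam: moving averages m (gradient) and v (squared gradient),
# bias-corrected, with the step scaled by m_hat / sqrt(v_hat).
target = np.array([1.0, -2.0])           # hypothetical optimum
w = np.zeros(2)
m, v = np.zeros(2), np.zeros(2)
alpha, b1, b2, eps = 0.1, 0.9, 0.999, 1e-8
for t in range(1, 3001):
    g = 2 * (w - target)                 # gradient of ||w - target||^2
    m = b1 * m + (1 - b1) * g
    v = b2 * v + (1 - b2) * g ** 2
    m_hat = m / (1 - b1 ** t)            # bias correction
    v_hat = v / (1 - b2 ** t)
    w -= (alpha / np.sqrt(t)) * m_hat / (np.sqrt(v_hat) + eps)
```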
• *NIPS* 2007 — Provides an algorithm, Adaptive Online Gradient Descent, which interpolates between the results of Zinkevich for linear functions and of Hazan et al. for strongly convex functions, achieving intermediate rates between $\sqrt{T}$ and $\log T$, and shows strong optimality of the algorithm.
• *APPROX-RANDOM* 2016 — Presents an extremely simple algorithm that approximates $A$ up to multiplicative error $\epsilon$ and additive error $\delta$ using $O(d \log d \log(\epsilon\|A\|_2/\delta)/\epsilon^2)$ online samples, with memory overhead proportional to the cost of storing the spectral approximation.