Second-Order Kernel Online Convex Optimization with Adaptive Sketching
@inproceedings{Calandriello2017SecondOrderKO, title={Second-Order Kernel Online Convex Optimization with Adaptive Sketching}, author={Daniele Calandriello and Alessandro Lazaric and Michal Valko}, booktitle={International Conference on Machine Learning}, year={2017} }
Kernel online convex optimization (KOCO) is a framework combining the expressiveness of non-parametric kernel models with the regret guarantees of online learning. First-order KOCO methods such as functional gradient descent require only $O(t)$ time and space per iteration, and, when the only information on the losses is their convexity, achieve a minimax optimal $O(\sqrt{T})$ regret. Nonetheless, many common losses in kernel problems, such as squared loss, logistic loss, and squared hinge loss…
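To make the first-order baseline concrete, below is a minimal sketch of kernel (functional) online gradient descent, the $O(t)$-per-step method the abstract contrasts against, not the paper's sketched second-order algorithm. The Gaussian kernel, squared loss, and $1/\sqrt{t}$ step size are illustrative assumptions, not choices taken from the paper.

```python
import numpy as np

def gaussian_kernel(x, y, sigma=1.0):
    """RBF kernel k(x, y) = exp(-||x - y||^2 / (2 sigma^2))."""
    return np.exp(-np.linalg.norm(x - y) ** 2 / (2 * sigma ** 2))

def kernel_ogd(stream, eta=0.1, sigma=1.0):
    """First-order KOCO sketch: functional gradient descent with squared loss.

    The predictor is f_t(x) = sum_s alpha_s k(x_s, x); keeping one coefficient
    per past point is what gives the O(t) per-iteration time and space cost.
    """
    xs, alphas = [], []  # support points and their coefficients
    for t, (x_t, y_t) in enumerate(stream, start=1):
        # Predict with the current function f_{t-1}
        y_hat = sum(a * gaussian_kernel(x_s, x_t, sigma)
                    for x_s, a in zip(xs, alphas))
        # Gradient of the squared loss w.r.t. the prediction
        g_t = y_hat - y_t
        # Functional gradient step: append the term -eta_t * g_t * k(x_t, .)
        xs.append(x_t)
        alphas.append(-eta / np.sqrt(t) * g_t)  # decaying step size, as in O(sqrt(T))-regret analyses
        yield y_hat

# Toy usage on a synthetic stream
rng = np.random.default_rng(0)
data = [(rng.normal(size=2), float(rng.normal())) for _ in range(50)]
predictions = list(kernel_ogd(data))
```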
35 Citations
Efficient Second-Order Online Kernel Learning with Adaptive Embedding
- Computer Science, NIPS
- 2017
This paper proposes PROS-N-KONS, a method that combines Nyström sketching, which projects the input points into a small and accurate embedded space, with efficient second-order updates in that space, and achieves logarithmic regret.
Projection-Free Online Optimization with Stochastic Gradient: From Convexity to Submodularity
- Computer Science, ICML
- 2018
Meta-Frank-Wolfe, the first online projection-free algorithm that uses stochastic gradient estimates, is proposed, and a novel "lifting" framework for online discrete submodular maximization is developed; both outperform current state-of-the-art techniques in various experiments.
Efficient online learning with kernels for adversarial large scale problems
- Computer Science, NeurIPS
- 2019
The resulting algorithm is based on approximating the Gaussian kernel through a Taylor expansion and achieves, for d-dimensional inputs, a (close to) optimal regret of order $O((\log n)^{d+1})$ with per-round time and space complexity polylogarithmic in n.
Dynamic Online Learning via Frank-Wolfe Algorithm
- Computer Science, IEEE Transactions on Signal Processing
- 2021
This work proposes to study Frank-Wolfe (FW), which updates along directions collinear with the gradient while remaining feasible, and establishes performance in terms of dynamic regret, which quantifies cost accumulation compared with the optimum at each individual time slot.
Projection Free Dynamic Online Learning
- Computer Science, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2020
A projection-free scheme based on Frank-Wolfe is proposed in which, instead of exact online gradient steps, the algorithm's required information is relaxed to noisy gradient estimates, i.e., partial feedback, and dynamic regret bounds are derived.
Gaussian Process Optimization with Adaptive Sketching: Scalable and No Regret
- Computer Science, COLT
- 2019
BKB (budgeted kernelized bandit) is introduced, a new approximate GP algorithm for optimization under bandit feedback that achieves near-optimal regret (and hence a near-optimal convergence rate) with near-constant per-iteration complexity and, remarkably, no assumption on the input space or the covariance of the GP.
Improved Kernel Alignment Regret Bound for Online Kernel Learning
- Computer Science, arXiv
- 2022
An algorithm whose regret bound and computational complexity are better than previous results is proposed, and an $O\big(\tfrac{1}{T}\sqrt{\mathbb{E}[A_T]}\big)$ excess risk bound is obtained, improving on the previous $O(1/\sqrt{T})$ bound.
Efficient online learning with kernels for adversarial large scale problems
- Computer Science
- 2022
The algorithm is shown to achieve the optimal regret for a wide range of kernels with a per-round complexity of order $n^\alpha$ with $\alpha < 2$, improving the computational trade-off known for online kernel regression.
Projection Dual Averaging Based Second-order Online Learning
- Computer Science, 2022 IEEE International Conference on Data Mining (ICDM)
- 2022
This paper develops a second-order projection dual averaging based online learning (SPDA) method to effectively handle high-throughput streaming data and demonstrates the efficacy of the proposed algorithms on large-scale online learning tasks, including online binary and multi-class classification and online anomaly detection.
Learning with Kernels for Adversarial Large Scale Problems
- Computer Science
- 2019
The algorithm studied is the first to achieve the optimal regret for a wide range of kernels with a per-round complexity of order $n^\alpha$ with $\alpha < 2$, and it improves the computational trade-off known for online kernel regression.
References
SHOWING 1-10 OF 31 REFERENCES
Logarithmic regret algorithms for online convex optimization
- Computer Science, Machine Learning
- 2007
Several algorithms achieving logarithmic regret are proposed, which besides being more general are also much more efficient to implement, and give rise to an efficient algorithm based on the Newton method for optimization, a new tool in the field.
Fast Randomized Kernel Methods With Statistical Guarantees
- Computer Science, arXiv
- 2014
A version of this approach is described that comes with running-time guarantees as well as improved guarantees on its statistical performance, and a new notion of the statistical leverage of a data point is introduced that captures, in a fine-grained way, the difficulty of the original statistical learning problem.
Breaking the curse of kernelization: budgeted stochastic gradient descent for large-scale SVM training
- Computer Science, J. Mach. Learn. Res.
- 2012
Comprehensive empirical results show that BSGD achieves higher accuracy than state-of-the-art budgeted online algorithms and accuracy comparable to non-budgeted algorithms, while achieving impressive computational efficiency in both time and space during training and prediction.
Online learning with kernels
- Computer Science, IEEE Transactions on Signal Processing
- 2004
This paper considers online learning in a reproducing kernel Hilbert space, allowing the kernel trick to be exploited in an online setting, and examines the value of large margins for classification in the online setting with a drifting target.
Gaussian Process Optimization in the Bandit Setting: No Regret and Experimental Design
- Computer Science, ICML
- 2010
This work analyzes GP-UCB, an intuitive upper-confidence-based algorithm, and bounds its cumulative regret in terms of maximal information gain, establishing a novel connection between GP optimization and experimental design and obtaining explicit sublinear regret bounds for many commonly used covariance functions.
Large Scale Online Kernel Learning
- Computer Science, J. Mach. Learn. Res.
- 2016
A new framework for large-scale online kernel learning is presented, making kernel methods efficient and scalable for large-scale online learning applications, together with two online kernel learning algorithms that apply random Fourier features to approximate kernel functions.
Adaptive Subgradient Methods for Online Learning and Stochastic Optimization
- Computer Science, J. Mach. Learn. Res.
- 2011
This work describes and analyzes an apparatus for adaptively modifying the proximal function, which significantly simplifies setting a learning rate and results in regret guarantees that are provably as good as those of the best proximal function that can be chosen in hindsight.
Adam: A Method for Stochastic Optimization
- Computer Science, ICLR
- 2015
This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions based on adaptive estimates of lower-order moments, and provides a regret bound comparable to the best known results under the online convex optimization framework.
Adaptive Online Gradient Descent
- Computer Science, NIPS
- 2007
An algorithm, Adaptive Online Gradient Descent, is provided that interpolates between the results of Zinkevich for linear functions and of Hazan et al. for strongly convex functions, achieving intermediate rates between $\sqrt{T}$ and $\log T$, and a strong optimality result for the algorithm is shown.
Online Row Sampling
- Computer Science, APPROX-RANDOM
- 2016
This work presents an extremely simple algorithm that approximates A up to multiplicative error $\epsilon$ and additive error $\delta$ using $O(d \log d \log(\epsilon \|A\|_2/\delta)/\epsilon^2)$ online samples, with memory overhead proportional to the cost of storing the spectral approximation.