Parsimonious Online Learning with Kernels via sparse projections in function space

@article{Koppel2017ParsimoniousOL,
  title={Parsimonious Online Learning with Kernels via sparse projections in function space},
  author={Alec Koppel and Garrett Warnell and Ethan Stump and Alejandro Ribeiro},
  journal={2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  year={2017},
  pages={4671-4675}
}
We consider stochastic nonparametric regression problems in a reproducing kernel Hilbert space (RKHS), an extension of expected risk minimization to nonlinear function estimation. Popular perception is that kernel methods are inapplicable to online settings, since the generalization of stochastic methods to kernelized function spaces requires memory storage that is cubic in the iteration index (“the curse of kernelization”). We alleviate this intractability in two ways: (1) we consider the use… 
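The abstract is truncated above, but the title and the “curse of kernelization” discussion point at the core recipe: run functional stochastic gradient descent on a kernel expansion and, after each step, project onto a sparse dictionary so memory does not grow with the iteration index. The sketch below is a minimal Python illustration of that pattern; the kernel choice, hyperparameters, and the coefficient-thresholding rule used as the "projection" are assumptions for illustration, not the paper's actual projection step.

```python
import numpy as np

def gaussian_kernel(X, Y, bandwidth=1.0):
    """Gaussian (RBF) kernel matrix between the rows of X and the rows of Y."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * bandwidth ** 2))

class SparseKernelSGD:
    """Online kernel regression by functional SGD with a crude sparsifying step:
    after each gradient update, dictionary atoms whose coefficients fall below
    `tol` are dropped (a simplified stand-in for the paper's projection)."""

    def __init__(self, step=0.1, reg=1e-3, tol=1e-3, bandwidth=1.0):
        self.step, self.reg, self.tol, self.bw = step, reg, tol, bandwidth
        self.centers = None          # dictionary of kernel centers
        self.coef = np.empty(0)      # expansion coefficients

    def predict(self, x):
        if self.coef.size == 0:
            return 0.0
        k = gaussian_kernel(x[None, :], self.centers, self.bw).ravel()
        return float(k @ self.coef)

    def update(self, x, y):
        # Stochastic functional gradient of the regularized squared loss:
        # shrink existing coefficients, then append a new atom at the sample.
        err = self.predict(x) - y
        if self.coef.size == 0:
            self.centers = x[None, :]
            self.coef = np.array([-self.step * err])
        else:
            self.coef = (1.0 - self.step * self.reg) * self.coef
            self.centers = np.vstack([self.centers, x])
            self.coef = np.append(self.coef, -self.step * err)
        # Crude "projection": keep only atoms with non-negligible coefficients.
        keep = np.abs(self.coef) > self.tol
        self.centers, self.coef = self.centers[keep], self.coef[keep]
```

Each update appends one new kernel center at the current sample and then prunes centers whose coefficients have become negligible, so the dictionary size tracks the complexity of the learned function rather than the iteration index.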

Nonparametric Compositional Stochastic Optimization for Risk-Sensitive Kernel Learning

TLDR
This work develops the first memory-efficient stochastic algorithm for this setting, and provides, for the first time, a non-asymptotic tradeoff between the complexity of a function parameterization and its required convergence accuracy for both strongly convex and non-convex objectives under constant step-sizes.

Sparse Representations of Positive Functions via Projected Pseudo-Mirror Descent

TLDR
This work addresses expected risk minimization when the population loss is strongly convex and the target domain of the decision variable is required to be nonnegative; experiments demonstrate state-of-the-art accuracy and complexity tradeoffs for inhomogeneous Poisson process intensity estimation and multi-class kernel logistic regression.

An Online Projection Estimator for Nonparametric Regression in Reproducing Kernel Hilbert Spaces

TLDR
The theoretical analysis shows that this estimator obtains a rate-optimal generalization error when the regression function is known to live in a reproducing kernel Hilbert space, and it is shown that the computational cost of the estimator is much lower than that of other rate-optimal estimators proposed for this online setting.

Nonparametric Compositional Stochastic Optimization

TLDR
This work develops the first memory-efficient stochastic algorithm for this setting, which it calls Compositional Online Learning with Kernels (COLK), and demonstrates that COLK reliably converges, attains consistent performance across training runs, and thus overcomes overfitting.

A Sieve Stochastic Gradient Descent Estimator for Online Nonparametric Regression in Sobolev ellipsoids

TLDR
A sieve stochastic gradient descent estimator (Sieve-SGD) is proposed for the case where the hypothesis space is a Sobolev ellipsoid, and it is shown that Sieve-SGD has rate-optimal mean squared error (MSE) under a set of simple and direct conditions.
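As a rough illustration of the sieve idea, the sketch below fits an orthonormal cosine-basis expansion on [0, 1] by SGD while letting the truncation level grow slowly with the sample size. The basis, growth exponent, and step-size schedule here are placeholder assumptions, not the weighting scheme analyzed in the paper.

```python
import numpy as np

def cosine_basis(x, J):
    """First J functions of the cosine basis on [0, 1] (orthonormal in L2)."""
    j = np.arange(J)
    phi = np.sqrt(2.0) * np.cos(np.pi * j * x)
    phi[0] = 1.0                      # constant function has unit norm
    return phi

class SieveSGD:
    """Illustrative online sieve estimator: an orthonormal basis expansion whose
    truncation level grows slowly with the sample size, fit by SGD."""

    def __init__(self, growth=1 / 3, step0=0.5):
        self.growth, self.step0 = growth, step0
        self.t = 0
        self.theta = np.zeros(1)

    def _J(self):
        return max(1, int(np.ceil((self.t + 1) ** self.growth)))

    def predict(self, x):
        return float(self.theta @ cosine_basis(x, self.theta.size))

    def update(self, x, y):
        self.t += 1
        J = self._J()
        if J > self.theta.size:                       # let the sieve grow
            self.theta = np.pad(self.theta, (0, J - self.theta.size))
        phi = cosine_basis(x, J)
        step = self.step0 / np.sqrt(self.t)
        self.theta += step * (y - self.theta @ phi) * phi
```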

Projected Pseudo-Mirror Descent in Reproducing Kernel Hilbert Space

TLDR
A variant of stochastic mirror descent is developed that employs pseudo-gradients and projections to solve expected risk minimization for strongly convex costs in the case that the decision variable belongs to a Reproducing Kernel Hilbert Space and its target domain must be non-negative.

Locally Adaptive Kernel Estimation Using Sparse Functional Programming

TLDR
This work proposes to locally adapt the RKHS (more specifically, its smoothness parameter) over which function estimation is performed by means of a sparse functional program, which requires solving an infinite-dimensional, non-convex optimization problem.

Decentralized efficient nonparametric stochastic optimization

TLDR
This work considers stochastic optimization problems defined over reproducing kernel Hilbert spaces, where a multi-agent network aims to learn decision functions that are optimal in terms of a global convex functional that aggregates data across the network, while only having access to locally observed sequentially available training examples.

Decentralized Online Learning With Kernels

TLDR
This work proposes an algorithm that allows each individual agent to learn a regression function that is close to the globally optimal regression function, and establishes that with constant step-size selections agents’ functions converge to a neighborhood of the globally optimal one while satisfying the consensus constraints as the penalty parameter is increased.

Nonstationary Nonparametric Online Learning: Balancing Dynamic Regret and Model Parsimony

TLDR
This work proposes a functional variant of online gradient descent operating in tandem with greedy subspace projections and establishes sublinear dynamic regret growth in terms of both loss variation and functional path length, and that the memory of the function sequence remains moderate.
...

References

Showing 1-10 of 76 references

Scalable Kernel Methods via Doubly Stochastic Gradients

TLDR
An approach that scales up kernel methods using a novel concept called "doubly stochastic functional gradients," based on the fact that many kernel methods can be expressed as convex optimization problems; this readily scales kernel methods up to regimes that have been dominated by neural nets.
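The "doubly stochastic" idea is that each iteration samples both a random data point and a random (Fourier) feature, and the functional gradient step attaches a coefficient to that freshly sampled feature. The sketch below illustrates this for the Gaussian kernel with the squared loss; storing the sampled frequencies explicitly (rather than regenerating them from pseudo-random seeds, as the paper does) and the step-size schedule are simplifications.

```python
import numpy as np

rng = np.random.default_rng(0)

def rff(x, omega, b):
    """Random Fourier feature for the Gaussian kernel: phi_w(x) = sqrt(2) cos(w.x + b)."""
    return np.sqrt(2.0) * np.cos(omega @ x + b)

class DoublyStochasticKernelSGD:
    """Each step samples BOTH a data point and one random feature, and attaches a
    new coefficient to that feature; existing coefficients are shrunk by the
    regularizer (a simplified sketch of the doubly stochastic gradient idea)."""

    def __init__(self, dim, bandwidth=1.0, step=0.5, reg=1e-4):
        self.dim, self.bw, self.step, self.reg = dim, bandwidth, step, reg
        self.omegas, self.bs, self.alphas = [], [], []

    def predict(self, x):
        return sum(a * rff(x, w, b)
                   for a, w, b in zip(self.alphas, self.omegas, self.bs))

    def update(self, x, y, t):
        # Sample a random feature for the Gaussian kernel: omega ~ N(0, I / bw^2).
        omega = rng.normal(scale=1.0 / self.bw, size=self.dim)
        b = rng.uniform(0.0, 2.0 * np.pi)
        gamma = self.step / np.sqrt(t)
        err = self.predict(x) - y                    # squared-loss gradient
        self.alphas = [(1.0 - gamma * self.reg) * a for a in self.alphas]
        self.omegas.append(omega)
        self.bs.append(b)
        self.alphas.append(-gamma * err * rff(x, omega, b))
```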

Breaking the curse of kernelization: budgeted stochastic gradient descent for large-scale SVM training

TLDR
Comprehensive empirical results show that BSGD achieves higher accuracy than state-of-the-art budgeted online algorithms and accuracy comparable to non-budgeted algorithms, while achieving impressive computational efficiency in both time and space during training and prediction.
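BSGD keeps the support-vector set within a fixed budget by removal, projection, or merging. The sketch below shows the simplest of these: a Pegasos-style kernel SGD that removes the support vector with the smallest coefficient whenever the budget is exceeded. The loss, step size, and removal rule are illustrative choices, not the paper's exact procedure.

```python
import numpy as np

def rbf(x, z, bandwidth=1.0):
    return np.exp(-np.sum((x - z) ** 2) / (2.0 * bandwidth ** 2))

class BudgetedKernelSVMSGD:
    """Kernelized hinge-loss SGD with a hard budget on the number of support
    vectors; when the budget is exceeded, the support vector with the smallest
    coefficient magnitude is removed (one simple budget-maintenance strategy)."""

    def __init__(self, budget=50, reg=1e-2, bandwidth=1.0):
        self.budget, self.reg, self.bw = budget, reg, bandwidth
        self.sv, self.alpha = [], []     # support vectors and their coefficients

    def decision(self, x):
        return sum(a * rbf(x, z, self.bw) for a, z in zip(self.alpha, self.sv))

    def update(self, x, y, t):
        eta = 1.0 / (self.reg * t)                    # Pegasos-style step size
        self.alpha = [(1.0 - eta * self.reg) * a for a in self.alpha]
        if y * self.decision(x) < 1.0:                # hinge-loss margin violation
            self.sv.append(x)
            self.alpha.append(eta * y)
        if len(self.sv) > self.budget:                # budget maintenance: removal
            i = int(np.argmin(np.abs(self.alpha)))
            del self.sv[i], self.alpha[i]
```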

Online learning with kernels

TLDR
This paper considers online learning in a reproducing kernel Hilbert space, allows the exploitation of the kernel trick in an online setting, and examines the value of large margins for classification in the online setting with a drifting target.

Online Kernel Learning with a Near Optimal Sparsity Bound

TLDR
This work focuses on Online Sparse Kernel Learning, which aims to learn, in an online fashion, a kernel classifier with a bounded number of support vectors, and shows promising performance of the proposed algorithm compared to state-of-the-art algorithms for online sparse kernel learning.

Dual Space Gradient Descent for Online Learning

TLDR
The Dual Space Gradient Descent (DualSGD) is presented, a novel framework that utilizes random features as an auxiliary space to maintain information from data points removed during budget maintenance while simultaneously mitigating the impact of the dimensionality issue on learning performance.

Nonparametric Budgeted Stochastic Gradient Descent

TLDR
This paper proposes the Nonparametric Budgeted Stochastic Gradient Descent that allows the model size to automatically grow with data in a principled way and provides theoretical analysis to show that this framework is guaranteed to converge for a large collection of loss functions.

Non-parametric Stochastic Approximation with Large Step sizes

TLDR
In a stochastic approximation framework, it is shown that the averaged unregularized least-mean-square algorithm, given a sufficiently large step-size, attains optimal rates of convergence for a variety of regimes for the smoothnesses of the optimal prediction function and the functions in $\mathcal{H}$.

Large Scale Online Kernel Learning

TLDR
A new framework for large-scale online kernel learning is presented, making kernel methods efficient and scalable for large-scale online learning applications, along with two different online kernel machine learning algorithms that apply random Fourier features for approximating kernel functions.
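The random-Fourier-feature route sidesteps the growing support set entirely: draw a fixed set of random frequencies once, map every incoming example into that approximate feature space, and run ordinary online gradient descent on a linear model there. The sketch below is in the spirit of that approach, using the squared loss for regression; the feature count, bandwidth, and step size are illustrative.

```python
import numpy as np

class FourierOnlineGD:
    """Online learning with a fixed random Fourier feature map: D random
    frequencies are drawn once, each incoming example is mapped into the
    approximate kernel feature space, and plain online gradient descent is run
    on a linear model there."""

    def __init__(self, dim, n_features=200, bandwidth=1.0, step=0.1, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(scale=1.0 / bandwidth, size=(n_features, dim))
        self.b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
        self.w = np.zeros(n_features)
        self.step = step

    def features(self, x):
        # z(x) . z(x') approximates the Gaussian kernel k(x, x').
        return np.sqrt(2.0 / self.W.shape[0]) * np.cos(self.W @ x + self.b)

    def predict(self, x):
        return float(self.w @ self.features(x))

    def update(self, x, y):
        z = self.features(x)
        err = self.w @ z - y                 # squared-loss gradient
        self.w -= self.step * err * z
```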

Online Prediction of Time Series Data With Kernels

TLDR
This paper investigates a new model reduction criterion that makes computationally demanding sparsification procedures unnecessary and incorporates the coherence criterion into a new kernel-based affine projection algorithm for time series prediction.
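The coherence criterion admits a new sample as a kernel center only if its kernel similarity to every existing center stays below a threshold, which keeps the dictionary small without an expensive sparsification pass. The snippet below shows just this admission test; the threshold value and RBF kernel are illustrative, and the paper's full affine-projection update is not reproduced.

```python
import numpy as np

def rbf(x, z, bandwidth=1.0):
    return np.exp(-np.sum((x - z) ** 2) / (2.0 * bandwidth ** 2))

def coherence_admit(x_new, dictionary, mu0=0.5, bandwidth=1.0):
    """Coherence criterion: admit a new kernel center only if it is not too
    similar (coherent) to any existing dictionary element."""
    return all(abs(rbf(x_new, z, bandwidth)) <= mu0 for z in dictionary)

# Minimal usage: grow the dictionary of an online kernel predictor only when
# the coherence test passes; otherwise reuse the existing centers.
dictionary = []
for x_t in np.random.default_rng(0).normal(size=(100, 2)):
    if coherence_admit(x_t, dictionary):
        dictionary.append(x_t)
print(f"kept {len(dictionary)} of 100 samples as kernel centers")
```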

Error analysis for online gradient descent algorithms in reproducing kernel Hilbert spaces

TLDR
This work considers online gradient descent algorithms with general convex loss functions in reproducing kernel Hilbert spaces (RKHS) and provides general conditions ensuring convergence of the algorithm in the RKHS norm.
...