Faster Kernel Ridge Regression Using Sketching and Preconditioning

@article{Avron2017FasterKR,
  title={Faster Kernel Ridge Regression Using Sketching and Preconditioning},
  author={Haim Avron and Kenneth L. Clarkson and David P. Woodruff},
  journal={SIAM J. Matrix Anal. Appl.},
  year={2017},
  volume={38},
  pages={1116--1138}
}
Kernel ridge regression is a simple yet powerful technique for nonparametric regression whose computation amounts to solving a linear system. This system is usually dense and highly ill-conditioned. In addition, the dimensions of the matrix are the same as the number of data points, so direct methods are unrealistic for large-scale datasets. In this paper, we propose a preconditioning technique for accelerating the solution of the aforementioned linear system. The preconditioner is based on …
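
For concreteness, the system in question has the form (K + lam*I) alpha = y for an n x n kernel matrix K (conventions for scaling the ridge parameter vary). The snippet below is a minimal, hedged sketch of the general recipe the abstract describes: conjugate gradient on that system, preconditioned by a random Fourier feature approximation of K. The kernel, sizes, and exact preconditioner construction here are illustrative assumptions, not the paper's precise algorithm.

```python
# Minimal sketch (not the paper's exact construction): Gaussian-kernel ridge
# regression solved with conjugate gradient, preconditioned by a random
# Fourier feature approximation of the kernel matrix.
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

rng = np.random.default_rng(0)
n, d, lam, sigma = 2000, 10, 1e-3, 1.0

X = rng.standard_normal((n, d))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(n)

# Dense kernel matrix K_ij = exp(-||x_i - x_j||^2 / (2 sigma^2)).
sq = (X ** 2).sum(1)[:, None] + (X ** 2).sum(1)[None, :] - 2 * X @ X.T
K = np.exp(-sq / (2 * sigma ** 2))
A = K + n * lam * np.eye(n)     # the dense, ill-conditioned KRR system matrix

# Random Fourier features Z (n x s) with K ~ Z Z^T, so Z Z^T + n*lam*I is a
# cheap surrogate for A that we use as a preconditioner.
s = 200
W = rng.standard_normal((d, s)) / sigma
b = rng.uniform(0.0, 2 * np.pi, s)
Z = np.sqrt(2.0 / s) * np.cos(X @ W + b)

# Apply (Z Z^T + n*lam*I)^{-1} in roughly O(n s) time via the Woodbury identity.
L = np.linalg.cholesky(Z.T @ Z + n * lam * np.eye(s))

def apply_preconditioner(v):
    t = np.linalg.solve(L.T, np.linalg.solve(L, Z.T @ v))
    return (v - Z @ t) / (n * lam)

M = LinearOperator((n, n), matvec=apply_preconditioner)
alpha, info = cg(A, y, M=M, maxiter=500)
print("CG converged:", info == 0)
```

The point of the construction is that each preconditioner application costs O(ns) after a one-time s x s factorization, while the preconditioned system is far better conditioned than K + n*lam*I itself.
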
Citations

Fast and Accurate Gaussian Kernel Ridge Regression Using Matrix Decompositions for Preconditioning
TLDR: The suggested approach is based on randomized matrix decomposition methods, combined with the fast multipole method, to obtain an algorithm that can process large datasets with complexity linear in the number of data points.
Sharper Bounds for Regression and Low-Rank Approximation with Regularization
TLDR: Sketching methods for regularized variants of linear regression, low-rank approximation, and canonical correlation analysis are studied, both in a fairly broad setting and in the specific context of the popular and widely used technique of ridge regularization.
An Iterative, Sketching-based Framework for Ridge Regression
TLDR: It is proved that accurate approximations can be achieved by a sample whose size depends on the degrees of freedom of the ridge-regression problem rather than on the dimensions of the design matrix, which is a fundamental and well-understood primitive of randomized linear algebra.
Random Fourier Features for Kernel Ridge Regression: Approximation Bounds and Statistical Guarantees
TLDR: The results are twofold: on the one hand, random Fourier feature approximation can provably speed up kernel ridge regression under reasonable assumptions; on the other hand, the method is suboptimal, and sampling from a modified distribution in Fourier space, given by the leverage function of the kernel, yields provably better performance.
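
As a point of reference for this entry, here is a minimal sketch of plain random Fourier features for Gaussian-kernel ridge regression, which replaces the n x n kernel solve with an s x s ridge solve in feature space. The leverage-weighted sampling of frequencies that the summary says improves on this is not shown, and all sizes and names are illustrative.

```python
# Minimal sketch: approximate Gaussian-kernel KRR with s random Fourier
# features, reducing the n x n kernel solve to an s x s ridge regression.
import numpy as np

rng = np.random.default_rng(0)
n, d, s, lam, sigma = 5000, 10, 300, 1e-3, 1.0

X = rng.standard_normal((n, d))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(n)

# z(x) = sqrt(2/s) * cos(W^T x + b) with W ~ N(0, I/sigma^2),
# so that E[z(x)^T z(x')] equals the Gaussian kernel k(x, x').
W = rng.standard_normal((d, s)) / sigma
b = rng.uniform(0.0, 2 * np.pi, s)
Z = np.sqrt(2.0 / s) * np.cos(X @ W + b)

# Ridge regression in feature space: (Z^T Z + n*lam*I) w = Z^T y.
w = np.linalg.solve(Z.T @ Z + n * lam * np.eye(s), Z.T @ y)

def predict(X_new):
    return np.sqrt(2.0 / s) * np.cos(X_new @ W + b) @ w

print(predict(X[:5]), y[:5])
```
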
Scaling up Kernel Ridge Regression via Locality Sensitive Hashing
TLDR: A simple weighted version of random binning features is introduced, and it is shown that the corresponding kernel function generates Gaussian processes of any desired smoothness, leading to efficient algorithms for kernel ridge regression.
Training very large scale nonlinear SVMs using Alternating Direction Method of Multipliers coupled with the Hierarchically Semi-Separable kernel approximations
TLDR: A detailed analysis of the interaction among the algorithmic components unveils a particularly efficient framework; indeed, the presented experimental results demonstrate a significant speed-up over state-of-the-art nonlinear SVM libraries without significantly affecting classification accuracy.
Scalable and Memory-Efficient Kernel Ridge Regression
TLDR: A scalable and memory-efficient framework for kernel ridge regression that relies on a hierarchy of low-rank factorizations of tunable accuracy and provides sufficient accuracy in comparison with state-of-the-art methods and with the exact (i.e., non-approximated) kernel ridge regression method.
Regularized Momentum Iterative Hessian Sketch for Large Scale Linear System of Equations
In this article, Momentum Iterative Hessian Sketch (M-IHS) techniques, a group of solvers for large-scale linear least squares (LS) problems, are proposed and analyzed in detail. The proposed …
Sharper Bounds for Regularized Data Fitting
TLDR: This work studies matrix sketching methods for regularized variants of linear regression, low-rank approximation, and canonical correlation analysis, obtaining sketching-based algorithms for the low-rank approximation problem.
FALKON: An Optimal Large Scale Kernel Method
TLDR: This paper proposes FALKON, a novel algorithm that can efficiently process millions of points, derived by combining several algorithmic principles, namely stochastic subsampling, iterative solvers, and preconditioning.
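
The sketch below illustrates only the Nystrom-subsampling ingredient behind FALKON-style methods: the KRR solution is restricted to m randomly chosen centers, reducing the solve to an m x m system. FALKON itself additionally applies a specific preconditioner and conjugate gradient to this system; the direct solve, sizes, and kernel used here are illustrative assumptions.

```python
# Minimal sketch of Nystrom-subsampled KRR: restrict the solution to m random
# centers. (FALKON also adds a preconditioned CG solver on top of this.)
import numpy as np

rng = np.random.default_rng(0)
n, d, m, lam, sigma = 5000, 10, 200, 1e-3, 1.0

X = rng.standard_normal((n, d))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(n)

def gauss_kernel(A, B):
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2 * A @ B.T
    return np.exp(-sq / (2 * sigma ** 2))

centers = X[rng.choice(n, m, replace=False)]
K_nm = gauss_kernel(X, centers)          # n x m
K_mm = gauss_kernel(centers, centers)    # m x m

# Solve min_beta ||K_nm beta - y||^2 + n*lam * beta^T K_mm beta (m x m system).
beta = np.linalg.solve(K_nm.T @ K_nm + n * lam * K_mm + 1e-10 * np.eye(m),
                       K_nm.T @ y)

def predict(X_new):
    return gauss_kernel(X_new, centers) @ beta

print(predict(X[:5]), y[:5])
```
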

References

Showing 1-10 of 56 references
Fast Randomized Kernel Ridge Regression with Statistical Guarantees
TLDR: A version of this approach that comes with running-time guarantees as well as improved guarantees on its statistical performance is described, and a fast algorithm is presented to quickly compute coarse approximations to the ridge leverage scores in time linear in the number of samples.
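
For reference, the sketch below computes these ridge leverage scores exactly and samples landmark columns proportionally to them; the cited work's contribution is a fast approximation of the scores, which is not shown here, and the sizes and kernel are illustrative.

```python
# Minimal sketch: exact ridge leverage scores of a kernel matrix, used to
# sample landmark columns. (The cited work approximates these scores quickly.)
import numpy as np

rng = np.random.default_rng(0)
n, lam = 500, 1e-2
X = rng.standard_normal((n, 5))
sq = (X ** 2).sum(1)[:, None] + (X ** 2).sum(1)[None, :] - 2 * X @ X.T
K = np.exp(-sq / 2.0)

# tau_i = (K (K + n*lam*I)^{-1})_{ii}; the scores sum to the effective dimension.
tau = np.diag(np.linalg.solve(K + n * lam * np.eye(n), K))

m = 50
probs = tau / tau.sum()
landmarks = rng.choice(n, size=m, replace=False, p=probs)
print("effective dimension ~", tau.sum(), "first landmarks:", landmarks[:5])
```
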
Sharp analysis of low-rank kernel matrix approximations
  • F. Bach
  • COLT, 2013
TLDR: This paper shows that, in the context of kernel ridge regression, for approximations based on a random subset of columns of the original kernel matrix, the rank p may be chosen to be linear in the degrees of freedom associated with the problem, a quantity which is classically used in the statistical analysis of such methods.
Sharper Bounds for Regression and Low-Rank Approximation with Regularization
TLDR: Sketching methods for regularized variants of linear regression, low-rank approximation, and canonical correlation analysis are studied, both in a fairly broad setting and in the specific context of the popular and widely used technique of ridge regularization.
Scalable Kernel Methods via Doubly Stochastic Gradients
TLDR: An approach that scales up kernel methods using a novel concept called "doubly stochastic functional gradients", based on the fact that many kernel methods can be expressed as convex optimization problems; this approach can readily scale kernel methods up to regimes dominated by neural nets.
Subspace Embeddings for the Polynomial Kernel
TLDR: This work proposes the first fast oblivious subspace embeddings that are able to embed a space induced by a non-linear kernel without explicitly mapping the data to the high-dimensional space.
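
One concrete primitive in this line of work is the TensorSketch construction for the degree-2 polynomial kernel k(x, y) = (x . y)^2, which sketches x tensor x without ever forming it. The minimal version below is illustrative of that primitive only; the cited work builds oblivious subspace embeddings on top of it, and the dimensions here are arbitrary.

```python
# Minimal sketch of a TensorSketch-style embedding for the degree-2 polynomial
# kernel: CountSketch each copy of x, then multiply the sketches in the
# Fourier domain (circular convolution of the two CountSketches).
import numpy as np

rng = np.random.default_rng(0)
d, m = 50, 256   # input dimension, sketch dimension

# Two independent CountSketch hash/sign functions.
h = [rng.integers(0, m, d) for _ in range(2)]
s = [rng.choice([-1.0, 1.0], d) for _ in range(2)]

def count_sketch(x, h_k, s_k):
    out = np.zeros(m)
    np.add.at(out, h_k, s_k * x)
    return out

def tensor_sketch(x):
    f = np.fft.fft(count_sketch(x, h[0], s[0])) * np.fft.fft(count_sketch(x, h[1], s[1]))
    return np.real(np.fft.ifft(f))

x, y_vec = rng.standard_normal(d), rng.standard_normal(d)
print("exact (x.y)^2:", (x @ y_vec) ** 2,
      "sketched estimate:", tensor_sketch(x) @ tensor_sketch(y_vec))
```
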
Randomized sketches for kernels: Fast and optimal non-parametric regression
TLDR: It is proved that it suffices to choose the sketch dimension m proportional to the statistical dimension of the kernel matrix (modulo logarithmic factors), and fast, minimax-optimal approximations to the KRR estimate for non-parametric regression are obtained.
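
A minimal sketch of the sketched-KRR idea this entry analyzes is shown below, assuming a Gaussian sketch for simplicity: the coefficient vector is restricted to alpha = S^T beta, and the sketch size m is tied to the statistical dimension of the kernel matrix. The constant in front of the statistical dimension and all sizes are illustrative.

```python
# Minimal sketch of sketched kernel ridge regression: restrict alpha = S^T beta
# for an m x n Gaussian sketch S, with m on the order of the statistical
# dimension, and solve an m x m system instead of the n x n one.
import numpy as np

rng = np.random.default_rng(0)
n, lam = 2000, 1e-2
X = rng.standard_normal((n, 5))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(n)

sq = (X ** 2).sum(1)[:, None] + (X ** 2).sum(1)[None, :] - 2 * X @ X.T
K = np.exp(-sq / 2.0)

# Statistical dimension of K at level n*lam guides the sketch size m.
eigs = np.linalg.eigvalsh(K)
stat_dim = np.sum(eigs / (eigs + n * lam))
m = min(int(4 * stat_dim) + 1, n)
S = rng.standard_normal((m, n)) / np.sqrt(m)

# Sketched problem: min_beta ||y - K S^T beta||^2 + n*lam * beta^T S K S^T beta.
KS = K @ S.T                                           # n x m
beta = np.linalg.solve(KS.T @ KS + n * lam * (S @ KS), KS.T @ y)
alpha = S.T @ beta                                     # approximate KRR coefficients
print("sketch size m =", m, "of n =", n)
```
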
Preconditioned Krylov solvers for kernel regression
TLDR: A novel flexible preconditioner is introduced that not only improves convergence but also allows the use of fast kernel matrix-vector products.
Preconditioning Kernel Matrices
TLDR: A scalable approach to both solving kernel machines and learning their hyperparameters is described; this approach is shown to be exact in the limit of iterations and to outperform state-of-the-art approximations for a given computational budget.
Optimal learning rates for Kernel Conjugate Gradient regression
We prove rates of convergence in the statistical sense for kernel-based least squares regression using a conjugate gradient algorithm, where regularization against overfitting is obtained by early stopping. …
Sharper Bounds for Regularized Data Fitting
TLDR: This work studies matrix sketching methods for regularized variants of linear regression, low-rank approximation, and canonical correlation analysis, obtaining sketching-based algorithms for the low-rank approximation problem.