# Faster Kernel Ridge Regression Using Sketching and Preconditioning

@article{Avron2017FasterKR, title={Faster Kernel Ridge Regression Using Sketching and Preconditioning}, author={Haim Avron and Kenneth L. Clarkson and David P. Woodruff}, journal={SIAM J. Matrix Anal. Appl.}, year={2017}, volume={38}, pages={1116-1138} }

Kernel ridge regression is a simple yet powerful technique for nonparametric regression whose computation amounts to solving a linear system. This system is usually dense and highly ill-conditioned. Moreover, the matrix dimensions equal the number of data points, so direct methods are unrealistic for large-scale datasets. In this paper, we propose a preconditioning technique for accelerating the solution of the aforementioned linear system. The preconditioner is based on…
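The setup the abstract describes can be sketched in code: solve the KRR system (K + λnI)α = y with conjugate gradients, using a preconditioner built from a low-dimensional random-feature approximation of K. This is a hedged illustration of the general idea, not the paper's exact algorithm; the random Fourier features, the Woodbury-based preconditioner application, and all parameter choices below are my own.

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve
from scipy.sparse.linalg import cg, LinearOperator

rng = np.random.default_rng(0)
n, d, s = 500, 5, 100        # data points, input dim, random-feature count
lam, sigma = 1e-2, 1.0       # ridge parameter, Gaussian kernel bandwidth

X = rng.standard_normal((n, d))
y = rng.standard_normal(n)

# Dense Gaussian kernel matrix K_ij = exp(-||x_i - x_j||^2 / (2 sigma^2))
sq = np.sum(X**2, axis=1)
K = np.exp(-(sq[:, None] + sq[None, :] - 2 * X @ X.T) / (2 * sigma**2))

# Random Fourier features Z (n x s) with Z Z^T ~ K (Rahimi-Recht construction)
W = rng.standard_normal((d, s)) / sigma
b = rng.uniform(0, 2 * np.pi, s)
Z = np.sqrt(2.0 / s) * np.cos(X @ W + b)

# Preconditioner: apply (Z Z^T + mu I)^{-1} cheaply via the Woodbury identity,
# factoring only the small s x s matrix Z^T Z + mu I once.
mu = lam * n
chol = cho_factor(Z.T @ Z + mu * np.eye(s))
def apply_minv(v):
    return (v - Z @ cho_solve(chol, Z.T @ v)) / mu

A = K + mu * np.eye(n)
M = LinearOperator((n, n), matvec=apply_minv)
alpha, info = cg(A, y, M=M, atol=1e-10, maxiter=n)
```

With a good random-feature preconditioner, CG typically converges in far fewer iterations than on the raw ill-conditioned system, while each iteration still costs one kernel matrix-vector product.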

## 82 Citations

Fast and Accurate Gaussian Kernel Ridge Regression Using Matrix Decompositions for Preconditioning

- Computer Science · SIAM J. Matrix Anal. Appl.
- 2021

The suggested approach is based on randomized matrix decomposition methods, combined with the fast multipole method to achieve an algorithm that can process large datasets in complexity linear to the number of data points.

Sharper Bounds for Regression and Low-Rank Approximation with Regularization

- Computer Science, Mathematics · ArXiv
- 2016

Sketching methods for regularized variants of linear regression, low rank approximations, and canonical correlation analysis are studied, both in a fairly broad setting, and in the specific context of the popular and widely used technique of ridge regularization.

An Iterative, Sketching-based Framework for Ridge Regression

- Computer Science, Mathematics · ICML
- 2018

It is proved that accurate approximations can be achieved with a sample whose size depends on the degrees of freedom of the ridge-regression problem rather than on the dimensions of the design matrix; the underlying sampling step is a fundamental and well-understood primitive of randomized linear algebra.

Near Input Sparsity Time Kernel Embeddings via Adaptive Sampling

- Computer Science · ICML
- 2020

A near-input-sparsity-time algorithm is given for sampling the high-dimensional feature space implicitly defined by a kernel transformation, and it is shown how the resulting subspace embedding bounds imply new statistical guarantees for kernel ridge regression.

Learning in High-Dimensional Feature Spaces Using ANOVA-Based Fast Matrix-Vector Multiplication

- Computer Science · Foundations of Data Science
- 2022

This work proposes the use of an ANOVA kernel, where several kernels are constructed based on lower-dimensional feature spaces and evaluated with the non-equispaced fast Fourier transform (NFFT), which has linear complexity for fixed accuracy.

Scaling up Kernel Ridge Regression via Locality Sensitive Hashing

- Computer Science · AISTATS
- 2020

A simple weighted version of random binning features is introduced and it is shown that the corresponding kernel function generates Gaussian processes of any desired smoothness, leading to efficient algorithms for kernel ridge regression.

Training very large scale nonlinear SVMs using Alternating Direction Method of Multipliers coupled with the Hierarchically Semi-Separable kernel approximations

- Computer Science · ArXiv
- 2021

A detailed analysis of the interaction among the algorithmic components reveals a particularly efficient framework; the presented experimental results demonstrate a significant speed-up over state-of-the-art nonlinear SVM libraries without significantly affecting classification accuracy.

Scalable and Memory-Efficient Kernel Ridge Regression

- Computer Science · 2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS)
- 2020

A scalable and memory-efficient framework for kernel ridge regression that relies on a hierarchy of low-rank factorizations of tunable accuracy; it provides sufficient accuracy in comparison with state-of-the-art methods and with the exact (i.e., non-approximated) kernel ridge regression method.

Oblivious Sketching of High-Degree Polynomial Kernels

- Computer Science, Mathematics · SODA
- 2020

This work gives a general method for applying sketching solutions developed in numerical linear algebra over the past decade to a tensoring of data points without forming the tensoring explicitly, and leads to the first oblivious sketch for the polynomial kernel with a target dimension that is only polynomially dependent on the degree of the kernel function.

Kernel methods through the roof: handling billions of points efficiently

- Computer Science · NeurIPS
- 2020

This work designed a preconditioned gradient solver for kernel methods exploiting both GPU acceleration and parallelization with multiple GPUs, implementing out-of-core variants of common linear algebra operations to guarantee optimal hardware utilization.

## References

Showing 1–10 of 56 references

Fast Randomized Kernel Ridge Regression with Statistical Guarantees

- Computer Science · NIPS
- 2015

A version of this approach that comes with running time guarantees as well as improved guarantees on its statistical performance is described, and a fast algorithm is presented to quickly compute coarse approximations to the required leverage scores in time linear in the number of samples.

Sharp analysis of low-rank kernel matrix approximations

- Computer Science · COLT
- 2013

This paper shows that in the context of kernel ridge regression, for approximations based on a random subset of columns of the original kernel matrix, the rank p may be chosen to be linear in the degrees of freedom associated with the problem, a quantity which is classically used in the statistical analysis of such methods.

Sharper Bounds for Regression and Low-Rank Approximation with Regularization

- Computer Science, Mathematics · ArXiv
- 2016

Sketching methods for regularized variants of linear regression, low rank approximations, and canonical correlation analysis are studied, both in a fairly broad setting, and in the specific context of the popular and widely used technique of ridge regularization.

Scalable Kernel Methods via Doubly Stochastic Gradients

- Computer Science · NIPS
- 2014

An approach that scales up kernel methods using a novel concept called "doubly stochastic functional gradients," based on the fact that many kernel methods can be expressed as convex optimization problems; the approach readily scales kernel methods up to regimes dominated by neural nets.

Subspace Embeddings for the Polynomial Kernel

- Computer Science · NIPS
- 2014

This work proposes the first fast oblivious subspace embeddings that are able to embed a space induced by a non-linear kernel without explicitly mapping the data to the high-dimensional space.

Randomized sketches for kernels: Fast and optimal non-parametric regression

- Computer Science, Mathematics · ArXiv
- 2015

It is proved that it suffices to choose the sketch dimension $m$ proportional to the statistical dimension (modulo logarithmic factors) of the kernel matrix, and fast and minimax optimal approximations to the KRR estimate for non-parametric regression are obtained.
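The statistical dimension invoked in this bound can be computed directly from the kernel eigenvalues. The sketch below uses one common definition, d_λ = Σᵢ μᵢ/(μᵢ + λn) for eigenvalues μᵢ of K; conventions differ (λ vs. λn scaling), and the kernel and parameters here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, lam = 300, 4, 1e-2

X = rng.standard_normal((n, d))
sq = np.sum(X**2, axis=1)
K = np.exp(-(sq[:, None] + sq[None, :] - 2 * X @ X.T) / 2)  # Gaussian kernel, sigma = 1

# Statistical (effective) dimension: d_lam = sum_i mu_i / (mu_i + lam * n)
mu = np.linalg.eigvalsh(K)
d_lam = np.sum(mu / (mu + lam * n))
```

For smooth kernels the eigenvalues decay quickly, so d_lam is typically far smaller than n, which is exactly why a sketch dimension proportional to it (rather than to n) suffices.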

Preconditioned Krylov solvers for kernel regression

- Computer Science · ArXiv
- 2014

A novel flexible preconditioner that not only improves convergence but also allows utilization of fast kernel matrix-vector products is introduced.

Fast large scale Gaussian process regression using approximate matrix-vector products

- Computer Science
- 2006

This work considers the use of ε-exact matrix-vector product algorithms to reduce the computational complexity of Gaussian processes to O(N), and shows how to choose ε to guarantee the convergence of the iterative methods.

Preconditioning Kernel Matrices

- Computer Science · ICML
- 2016

A scalable approach to both solving kernel machines and learning their hyperparameters is described, and it is shown this approach is exact in the limit of iterations and outperforms state-of-the-art approximations for a given computational budget.

Optimal learning rates for Kernel Conjugate Gradient regression

- Computer Science · NIPS
- 2010

We prove rates of convergence in the statistical sense for kernel-based least squares regression using a conjugate gradient algorithm, where regularization against overfitting is obtained by early…