# Fast Randomized Kernel Ridge Regression with Statistical Guarantees

@inproceedings{Alaoui2015FastRK, title={Fast Randomized Kernel Ridge Regression with Statistical Guarantees}, author={Ahmed El Alaoui and Michael W. Mahoney}, booktitle={NIPS}, year={2015} }

One approach to improving the running time of kernel-based methods is to build a small sketch of the kernel matrix and use it in lieu of the full matrix in the machine learning task of interest. Here, we describe a version of this approach that comes with running time guarantees as well as improved guarantees on its statistical performance. By extending the notion of statistical leverage scores to the setting of kernel ridge regression, we are able to identify a sampling distribution that…
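As an illustration only, the sketching strategy described above can be prototyped in a few lines: the snippet below computes exact ridge leverage scores via a full matrix inverse (the expensive step that a fast approximation scheme would replace) and uses them to sample columns for a Nyström sketch of the kernel matrix. All function names and parameter values here are illustrative, not from the paper.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    """Gaussian (RBF) kernel matrix between the rows of X and Y."""
    sq = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2 * X @ Y.T
    return np.exp(-gamma * sq)

def ridge_leverage_scores(K, lam):
    """Exact ridge leverage scores l_i = [K (K + n*lam*I)^{-1}]_{ii}.
    This O(n^3) computation is what fast approximation schemes avoid."""
    n = K.shape[0]
    return np.diag(K @ np.linalg.inv(K + n * lam * np.eye(n)))

def nystrom_krr(X, y, lam=1e-3, m=50, gamma=1.0, seed=None):
    """Kernel ridge regression on a Nystrom sketch whose columns are
    sampled proportionally to the ridge leverage scores."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    K = rbf_kernel(X, X, gamma)
    p = np.clip(ridge_leverage_scores(K, lam), 0.0, None)
    p = p / p.sum()
    idx = rng.choice(n, size=m, replace=False, p=p)   # leverage-score sampling
    C = K[:, idx]                                      # n x m sketch of K
    W = K[np.ix_(idx, idx)]                            # m x m landmark block
    # Sketched normal equations: (C^T C + n*lam*W) b = C^T y
    b = np.linalg.solve(C.T @ C + n * lam * W + 1e-10 * np.eye(m), C.T @ y)
    return idx, b   # predict new points with rbf_kernel(X_new, X[idx]) @ b
```

In practice the scores would themselves be approximated; the point of the snippet is only to show where the sampling distribution enters the sketch.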

## 230 Citations

### Fast Statistical Leverage Score Approximation in Kernel Ridge Regression

- Computer Science, AISTATS
- 2021

A linear-time (modulo polylog terms) algorithm is proposed to accurately approximate the statistical leverage scores in stationary-kernel-based KRR with theoretical guarantees; it is orders of magnitude more efficient than existing methods at selecting representative sub-samples for the Nyström approximation.

### Faster Kernel Ridge Regression Using Sketching and Preconditioning

- Computer Science, SIAM J. Matrix Anal. Appl.
- 2017

This paper proposes a preconditioning technique based on random feature maps, such as random Fourier features, which have recently emerged as a powerful technique for speeding up and scaling the training of kernel-based methods by resorting to approximations.
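For readers unfamiliar with random Fourier features, a minimal sketch of the feature map for the Gaussian kernel is below (this illustrates the features themselves, not the preconditioner construction of the cited paper; the function name and defaults are illustrative):

```python
import numpy as np

def random_fourier_features(X, D=200, gamma=1.0, seed=None):
    """Map rows of X to D random Fourier features z(x) such that
    z(x) . z(y) approximates the Gaussian kernel exp(-gamma*||x-y||^2)."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    # Frequencies drawn from the kernel's spectral density: N(0, 2*gamma*I).
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(d, D))
    b = rng.uniform(0.0, 2.0 * np.pi, size=D)   # random phases
    return np.sqrt(2.0 / D) * np.cos(X @ W + b)
```

Linear ridge regression on these D-dimensional features runs in O(nD^2 + D^3) time rather than the O(n^3) of exact KRR, which is the kind of approximation the preconditioning approach builds on.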

### Spectrally-truncated kernel ridge regression and its free lunch

- Computer Science, Mathematics, Electronic Journal of Statistics
- 2021

It is shown that, as long as the RKHS is infinite-dimensional, there is a threshold on r above which the spectrally-truncated KRR, surprisingly, outperforms the full KRR in terms of the minimax risk, where the minimum is taken over the regularization parameter.

### Learning Theory for Distribution Regression

- Computer Science, Mathematics, J. Mach. Learn. Res.
- 2016

This paper studies a simple, analytically computable, ridge-regression-based alternative to distribution regression, in which the distributions are embedded into a reproducing kernel Hilbert space and the regressor is learned from the embeddings to the outputs; this establishes the consistency of the classical set kernel.

### Risk Convergence of Centered Kernel Ridge Regression With Large Dimensional Data

- Computer Science, IEEE Transactions on Signal Processing
- 2020

A key insight of the proposed analysis is that, asymptotically, a large class of kernels achieves the same minimum prediction risk, which makes it possible to tune the design parameters optimally.

### Random Fourier Features for Kernel Ridge Regression: Approximation Bounds and Statistical Guarantees

- Computer Science, ICML
- 2017

The results are twofold: on the one hand, it is shown that random Fourier feature approximation can provably speed up kernel ridge regression under reasonable assumptions, and on the other hand, the method is suboptimal, and sampling from a modified distribution in Fourier space, given by the leverage function of the kernel, yields provably better performance.

### Towards a Unified Analysis of Random Fourier Features

- Computer Science
- 2019

This work provides the first unified risk analysis of learning with random Fourier features using the squared error and Lipschitz continuous loss functions, and devises a simple approximation scheme which provably reduces the computational cost without loss of statistical efficiency.

### Provably Useful Kernel Matrix Approximation in Linear Time

- Computer Science, ArXiv
- 2016

We give the first algorithms for kernel matrix approximation that run in time linear in the number of data points and output an approximation which gives provable guarantees when used in many…

### Diversity sampling is an implicit regularization for kernel methods

- Computer Science, SIAM J. Math. Data Sci.
- 2021

If the dataset has a dense bulk and a sparser tail, it is shown that Nyström kernel regression with diverse landmarks increases the accuracy of the regression in sparser regions of the dataset compared to uniform landmark sampling.

### Risk Convergence of Centered Kernel Ridge Regression with Large Dimensional Data

- Computer Science, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2020

A key insight of the proposed analysis is that, asymptotically, a large class of kernels achieves the same minimum prediction risk, and this insight is validated with synthetic data.

## References

Showing 1-10 of 24 references.

### Fast Randomized Kernel Methods With Statistical Guarantees

- Computer Science, ArXiv
- 2014

A version of this approach is described that comes with running time guarantees as well as improved guarantees on its statistical performance; a new notion of the statistical leverage of a data point captures, in a fine-grained way, the difficulty of the original statistical learning problem.

### Sharp analysis of low-rank kernel matrix approximations

- Computer Science, COLT
- 2013

This paper shows that in the context of kernel ridge regression, for approximations based on a random subset of columns of the original kernel matrix, the rank p may be chosen to be linear in the degrees of freedom associated with the problem, a quantity which is classically used in the statistical analysis of such methods.
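The degrees-of-freedom quantity referenced here is the standard effective dimension of kernel ridge regression:

$$
d_{\mathrm{eff}}(\lambda) \;=\; \operatorname{tr}\!\left(K\,(K + n\lambda I)^{-1}\right) \;=\; \sum_{i=1}^{n} \frac{\sigma_i}{\sigma_i + n\lambda},
$$

where the \(\sigma_i\) are the eigenvalues of the kernel matrix \(K\). It equals the sum of the ridge leverage scores, which is what ties this rank bound to the leverage-score sampling distribution of the main paper.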

### Divide and Conquer Kernel Ridge Regression

- Computer Science, Mathematics, COLT
- 2013

The main theorem establishes that despite the computational speed-up, statistical optimality is retained: if m is not too large, the partition-based estimate achieves optimal rates of convergence for the full sample size N.
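A minimal sketch of the partition-based estimator (assuming a Gaussian kernel; function names and parameter values are illustrative, not the paper's notation):

```python
import numpy as np

def rbf(X, Y, gamma=1.0):
    """Gaussian kernel matrix between the rows of X and Y."""
    sq = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2 * X @ Y.T
    return np.exp(-gamma * sq)

def dc_krr_predict(X, y, X_test, lam=1e-3, gamma=1.0, m=4, seed=0):
    """Divide-and-conquer KRR: randomly split the data into m parts,
    fit an ordinary KRR estimator on each part, average the predictions."""
    rng = np.random.default_rng(seed)
    parts = np.array_split(rng.permutation(len(X)), m)
    preds = []
    for idx in parts:
        Xi, yi = X[idx], y[idx]
        # Standard KRR solve on this partition only: O((n/m)^3) per part.
        alpha = np.linalg.solve(
            rbf(Xi, Xi, gamma) + len(idx) * lam * np.eye(len(idx)), yi)
        preds.append(rbf(X_test, Xi, gamma) @ alpha)
    return np.mean(preds, axis=0)
```

The averaging step is what recovers (under the paper's conditions on m) the statistical rate of the full-sample estimator despite each solve touching only n/m points.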

### Fast approximation of matrix coherence and statistical leverage

- Computer Science, ICML
- 2012

A randomized algorithm is proposed that takes as input an arbitrary n × d matrix A, with n ≫ d, and returns, as output, relative-error approximations to all n of the statistical leverage scores.

### Revisiting the Nyström Method for Improved Large-scale Machine Learning

- Computer Science, J. Mach. Learn. Res.
- 2016

An empirical evaluation of the performance quality and running time of sampling and projection methods on a diverse suite of SPSD matrices is complemented by a suite of worst-case theoretical bounds for both random sampling and random projection methods.

### Efficient SVM Training Using Low-Rank Kernel Representations

- Computer Science, J. Mach. Learn. Res.
- 2001

This work shows that for a low rank kernel matrix it is possible to design a better interior point method (IPM) in terms of storage requirements as well as computational complexity and derives an upper bound on the change in the objective function value based on the approximation error and the number of active constraints (support vectors).

### Randomized Algorithms for Matrices and Data

- Computer Science, Found. Trends Mach. Learn.
- 2011

This monograph will provide a detailed overview of recent work on the theory of randomized matrix algorithms as well as the application of those ideas to the solution of practical problems in large-scale data analysis.

### Fast Monte-Carlo algorithms for finding low-rank approximations

- Computer Science, Proceedings 39th Annual Symposium on Foundations of Computer Science
- 1998

This paper develops an algorithm that is qualitatively faster, provided the entries of the matrix are sampled according to a natural probability distribution; the algorithm takes time polynomial in k, 1/ε, and log(1/δ) only, independent of m and n.

### Sampling Techniques for the Nyström Method

- Computer Science, AISTATS
- 2009

This work presents novel experiments with several real world datasets, and suggests that uniform sampling without replacement, in addition to being more efficient both in time and space, produces more effective approximations.

### Relative-Error CUR Matrix Decompositions

- Computer Science, Mathematics, SIAM J. Matrix Anal. Appl.
- 2008

These two algorithms are the first polynomial time algorithms for such low-rank matrix approximations that come with relative-error guarantees; previously, in some cases, it was not even known whether such matrix decompositions exist.