• Corpus ID: 8480143

Fast Randomized Kernel Ridge Regression with Statistical Guarantees

  • A. El Kacimi Alaoui, Michael W. Mahoney
One approach to improving the running time of kernel-based methods is to build a small sketch of the kernel matrix and use it in lieu of the full matrix in the machine learning task of interest. Here, we describe a version of this approach that comes with running time guarantees as well as improved guarantees on its statistical performance. By extending the notion of statistical leverage scores to the setting of kernel ridge regression, we are able to identify a sampling distribution that… 
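
The sketching idea in this abstract can be illustrated with a small NumPy example. It assumes the ridge leverage scores are the diagonal entries of K(K + nλI)^{-1}, a common definition that the truncated abstract does not spell out, and it samples Nyström columns without replacement for simplicity. The Gaussian kernel, the helper names, and all parameter values are illustrative assumptions. The scores are computed exactly here, which takes cubic time and only serves to show what the sampling distribution looks like.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    """Gaussian kernel matrix k(x, y) = exp(-gamma * ||x - y||^2)."""
    sq = (X**2).sum(axis=1)[:, None] + (Y**2).sum(axis=1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def ridge_leverage_scores(K, lam):
    """Diagonal of K (K + n*lam*I)^{-1} (assumed definition); the sum is the effective dimension."""
    n = K.shape[0]
    # (K + n*lam*I)^{-1} K has the same diagonal because both factors are symmetric and commute.
    return np.diag(np.linalg.solve(K + n * lam * np.eye(n), K)).copy()

def sample_columns(scores, p, rng):
    """Draw p distinct column indices with probability proportional to the scores."""
    probs = np.clip(scores, 0.0, None)
    probs = probs / probs.sum()
    return rng.choice(len(scores), size=p, replace=False, p=probs)

def nystrom_from_columns(K, idx, rcond=1e-10):
    """Nystrom sketch K[:, idx] pinv(K[idx, idx]) K[idx, :] of a PSD kernel matrix."""
    C = K[:, idx]
    W = K[np.ix_(idx, idx)]
    return C @ np.linalg.pinv(W, rcond=rcond) @ C.T

# Hypothetical toy data: 200 points in the plane.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
K = rbf_kernel(X, X, gamma=0.5)
scores = ridge_leverage_scores(K, lam=1e-3)
idx = sample_columns(scores, p=40, rng=rng)
L = nystrom_from_columns(K, idx)
print("effective dimension:", scores.sum())
print("relative sketch error:", np.linalg.norm(K - L) / np.linalg.norm(K))
```

The sketch `L` would then stand in for `K` in the downstream regression solve; how the number of sampled columns should scale is the subject of the paper's guarantees and is not reproduced here.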


Fast Statistical Leverage Score Approximation in Kernel Ridge Regression

A linear time (modulo polylog terms) algorithm is proposed to accurately approximate the statistical leverage scores in the stationary-kernel-based KRR with theoretical guarantees and is orders of magnitude more efficient than existing methods in selecting the representative sub-samples in the Nyström approximation.

Faster Kernel Ridge Regression Using Sketching and Preconditioning

This paper proposes a preconditioning technique based on random feature maps, such as random Fourier features, which have recently emerged as a powerful technique for speeding up and scaling the training of kernel-based methods by resorting to approximations.

Spectrally-truncated kernel ridge regression and its free lunch

  • A. Amini
  • Computer Science, Mathematics
    Electronic Journal of Statistics
  • 2021
It is shown that, as long as the RKHS is infinite-dimensional, there is a threshold on r above which the spectrally-truncated KRR, surprisingly, outperforms the full KRR in terms of the minimax risk, where the minimum is taken over the regularization parameter.

Learning Theory for Distribution Regression

This paper studies a simple, analytically computable, ridge regression-based alternative to distribution regression, where the distributions are embedded to a reproducing kernel Hilbert space, and the regressor is learned from the embeddings to the outputs, establishing the consistency of the classical set kernel.

Risk Convergence of Centered Kernel Ridge Regression With Large Dimensional Data

A key insight of the proposed analysis is that, asymptotically, a large class of kernels achieves the same minimum prediction risk, which allows the design parameters to be tuned optimally.

Random Fourier Features for Kernel Ridge Regression: Approximation Bounds and Statistical Guarantees

The results are twofold: on the one hand, it is shown that random Fourier feature approximation can provably speed up kernel ridge regression under reasonable assumptions, and on the other hand, the method is suboptimal, and sampling from a modified distribution in Fourier space, given by the leverage function of the kernel, yields provably better performance.
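
As a point of reference for this summary, the following is a minimal sketch of standard random Fourier features for the Gaussian kernel followed by ridge regression in feature space. It shows only the plain sampling scheme, not the modified leverage-function distribution the paper proposes. The kernel parameterization exp(-gamma * ||x - y||^2), the function names, and the toy data are assumptions made for illustration.

```python
import numpy as np

def rff_features(X, W, b):
    """Map X (n x d) to D random Fourier features: sqrt(2/D) * cos(X W^T + b)."""
    D = W.shape[0]
    return np.sqrt(2.0 / D) * np.cos(X @ W.T + b)

def fit_rff_ridge(Z, y, lam):
    """Ridge regression in feature space: w = (Z^T Z + n*lam*I)^{-1} Z^T y."""
    n, D = Z.shape
    return np.linalg.solve(Z.T @ Z + n * lam * np.eye(D), Z.T @ y)

# Hypothetical toy regression problem.
rng = np.random.default_rng(0)
n, d, D, gamma, lam = 300, 2, 500, 0.5, 1e-3
X = rng.uniform(-2.0, 2.0, size=(n, d))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=n)

# For k(x, y) = exp(-gamma * ||x - y||^2) the frequencies are N(0, 2*gamma*I).
W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(D, d))
b = rng.uniform(0.0, 2.0 * np.pi, size=D)

Z = rff_features(X, W, b)
w = fit_rff_ridge(Z, y, lam)
print("training RMSE:", np.sqrt(np.mean((Z @ w - y) ** 2)))
```

The solve involves a D x D system instead of an n x n one, which is where the speed-up comes from when D is much smaller than n.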

Towards a Unified Analysis of Random Fourier Features

  • Zhu Li
  • Computer Science
  • 2019
This work provides the first unified risk analysis of learning with random Fourier features using the squared error and Lipschitz continuous loss functions, and devises a simple approximation scheme which provably reduces the computational cost without loss of statistical efficiency.

Provably Useful Kernel Matrix Approximation in Linear Time

We give the first algorithms for kernel matrix approximation that run in time linear in the number of data points and output an approximation which gives provable guarantees when used in many

Diversity sampling is an implicit regularization for kernel methods

If the dataset has a dense bulk and a sparser tail, it is shown that Nystrom kernel regression with diverse landmarks increases the accuracy of the regression in sparser regions of the dataset, with respect to a uniform landmark sampling.

Risk Convergence of Centered Kernel Ridge Regression with Large Dimensional Data

A key insight of the proposed analysis is the fact that asymptotically a large class of kernels achieve the same minimum prediction risk, and this insight is validated with synthetic data.



Fast Randomized Kernel Methods With Statistical Guarantees

A version of this approach that comes with running time guarantees as well as improved guarantees on its statistical performance is described, and a new notion of the statistical leverage of a data point captures in a fine way the difficulty of the original statistical learning problem.

Sharp analysis of low-rank kernel matrix approximations

This paper shows that in the context of kernel ridge regression, for approximations based on a random subset of columns of the original kernel matrix, the rank p may be chosen to be linear in the degrees of freedom associated with the problem, a quantity which is classically used in the statistical analysis of such methods.

Divide and Conquer Kernel Ridge Regression

The main theorem establishes that despite the computational speed-up, statistical optimality is retained: if m is not too large, the partition-based estimate achieves optimal rates of convergence for the full sample size N.
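
The partition-based estimate described here can be sketched roughly as follows, assuming the usual construction: split the N samples at random into m subsets, fit an independent kernel ridge regression on each, and average the m predictions. The Gaussian kernel, the function names, and the per-subset regularization scaling are illustrative assumptions, not details taken from the summary.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    """Gaussian kernel matrix k(x, y) = exp(-gamma * ||x - y||^2)."""
    sq = (X**2).sum(axis=1)[:, None] + (Y**2).sum(axis=1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def fit_krr(X, y, lam, gamma=1.0):
    """Dual coefficients alpha = (K + n*lam*I)^{-1} y for the n local samples."""
    n = X.shape[0]
    return np.linalg.solve(rbf_kernel(X, X, gamma) + n * lam * np.eye(n), y)

def predict_krr(X_train, alpha, X_new, gamma=1.0):
    """Evaluate the fitted function at the rows of X_new."""
    return rbf_kernel(X_new, X_train, gamma) @ alpha

def divide_and_conquer_krr(X, y, X_new, m, lam, gamma=1.0, seed=0):
    """Average the predictions of m KRR fits on a random partition of the data."""
    rng = np.random.default_rng(seed)
    parts = np.array_split(rng.permutation(X.shape[0]), m)
    preds = [predict_krr(X[p], fit_krr(X[p], y[p], lam, gamma), X_new, gamma)
             for p in parts]
    return np.mean(preds, axis=0)

# Hypothetical toy data.
rng = np.random.default_rng(1)
X = rng.uniform(-2.0, 2.0, size=(400, 1))
y = np.sin(2.0 * X[:, 0]) + 0.1 * rng.normal(size=400)
X_new = np.linspace(-2.0, 2.0, 50)[:, None]

full = predict_krr(X, fit_krr(X, y, lam=1e-3), X_new)
split = divide_and_conquer_krr(X, y, X_new, m=4, lam=1e-3)
print("max gap between full and partitioned fit:", np.max(np.abs(full - split)))
```

Each local solve involves an (N/m) x (N/m) system, so the cubic cost of the full solve drops by roughly a factor of m squared in total.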

Fast approximation of matrix coherence and statistical leverage

A randomized algorithm is proposed that takes as input an arbitrary n × d matrix A, with n ≫ d, and returns, as output, relative-error approximations to all n of the statistical leverage scores.
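
For orientation, the quantities being approximated can be computed exactly with a thin QR factorization, as in the sketch below. This is the deterministic baseline, not the randomized algorithm the paper proposes, and it assumes A has full column rank. The function name and the test matrix are hypothetical.

```python
import numpy as np

def exact_leverage_scores(A):
    """Squared row norms of an orthonormal basis Q for the column space of A."""
    Q, _ = np.linalg.qr(A, mode="reduced")
    return np.sum(Q**2, axis=1)

# Hypothetical tall matrix with n >> d.
rng = np.random.default_rng(0)
A = rng.normal(size=(1000, 5))
lev = exact_leverage_scores(A)
print("sum of scores (equals d for full column rank):", lev.sum())
```

The scores are the diagonal entries of the projection onto the column space of A, so each lies in [0, 1] and they sum to the rank of A.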

Revisiting the Nystrom Method for Improved Large-scale Machine Learning

An empirical evaluation of the performance quality and running time of sampling and projection methods on a diverse suite of SPSD matrices is complemented by a suite of worst-case theoretical bounds for both random sampling and random projection methods.

Efficient SVM Training Using Low-Rank Kernel Representations

This work shows that for a low rank kernel matrix it is possible to design a better interior point method (IPM) in terms of storage requirements as well as computational complexity and derives an upper bound on the change in the objective function value based on the approximation error and the number of active constraints (support vectors).

Randomized Algorithms for Matrices and Data

This monograph will provide a detailed overview of recent work on the theory of randomized matrix algorithms as well as the application of those ideas to the solution of practical problems in large-scale data analysis.

Fast Monte-Carlo algorithms for finding low-rank approximations

  • A. Frieze, R. Kannan, S. Vempala
  • Computer Science
    Proceedings 39th Annual Symposium on Foundations of Computer Science (Cat. No.98CB36280)
  • 1998
This paper develops an algorithm which is qualitatively faster provided the entries of the matrix are sampled according to a natural probability distribution, and the algorithm takes time polynomial in k, 1/ε, log(1/δ) only, independent of m, n.

Sampling Techniques for the Nystrom Method

This work presents novel experiments with several real world datasets, and suggests that uniform sampling without replacement, in addition to being more efficient both in time and space, produces more effective approximations.

Relative-Error CUR Matrix Decompositions

These two algorithms are the first polynomial time algorithms for such low-rank matrix approximations that come with relative-error guarantees; previously, in some cases, it was not even known whether such matrix decompositions exist.