• Corpus ID: 1588555

Random Fourier Features for Kernel Ridge Regression: Approximation Bounds and Statistical Guarantees

@inproceedings{Avron2017RandomFF,
  title={Random Fourier Features for Kernel Ridge Regression: Approximation Bounds and Statistical Guarantees},
  author={Haim Avron and Mikhail Kapralov and Cameron Musco and Christopher Musco and Ameya Velingker and Amir Zandieh},
  booktitle={ICML},
  year={2017}
}
Random Fourier features is one of the most popular techniques for scaling up kernel methods, such as kernel ridge regression. However, despite impressive empirical results, the statistical properties of random Fourier features are still not well understood. In this paper we take steps toward filling this gap. Specifically, we approach random Fourier features from a spectral matrix approximation point of view, give tight bounds on the number of Fourier features required to achieve a spectral… 
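
For context, the construction the paper analyzes can be sketched in a few lines. The snippet below is a minimal illustration of the classical Rahimi-Recht random Fourier feature recipe for a Gaussian kernel applied to ridge regression, not the paper's modified sampling scheme; the bandwidth gamma, the feature count D, and the regularizer lam are illustrative placeholders.

import numpy as np

def rff_map(X, W, b):
    # Feature map z(x) = sqrt(2/D) * cos(W x + b); E[z(x)^T z(x')] = k(x, x').
    D = W.shape[0]
    return np.sqrt(2.0 / D) * np.cos(X @ W.T + b)

def rff_krr_fit(X, y, D=300, gamma=1.0, lam=1e-2, seed=0):
    # Approximate the Gaussian kernel k(x, x') = exp(-gamma * ||x - x'||^2)
    # by sampling frequencies from its (Gaussian) Fourier transform, then
    # solve ridge regression in the D-dimensional feature space.
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = rng.normal(0.0, np.sqrt(2.0 * gamma), size=(D, d))
    b = rng.uniform(0.0, 2.0 * np.pi, size=D)
    Z = rff_map(X, W, b)                                   # n x D feature matrix
    w = np.linalg.solve(Z.T @ Z + lam * np.eye(D), Z.T @ y)
    return W, b, w

def rff_krr_predict(X_new, W, b, w):
    return rff_map(X_new, W, b) @ w

# Toy usage on synthetic one-dimensional data.
rng = np.random.default_rng(1)
X = rng.uniform(size=(200, 1))
y = np.sin(6.0 * X[:, 0]) + 0.1 * rng.normal(size=200)
W, b, w = rff_krr_fit(X, y, D=300, gamma=10.0, lam=1e-2)
y_hat = rff_krr_predict(X, W, b, w)

Solving the ridge system in the D-dimensional feature space replaces the n x n kernel solve with a D x D one, which is why bounding the number of features needed for a good spectral approximation of the kernel matrix, as the abstract describes, directly controls the computational savings.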

Citations

Towards a Unified Analysis of Random Fourier Features
  • Zhu, Li
  • Computer Science
  • 2019
TLDR
This work provides the first unified risk analysis of learning with random Fourier features using the squared error and Lipschitz continuous loss functions, and devises a simple approximation scheme that provably reduces the computational cost without loss of statistical efficiency.
Fast Statistical Leverage Score Approximation in Kernel Ridge Regression
TLDR
A linear-time (up to polylogarithmic factors) algorithm is proposed to accurately approximate the statistical leverage scores in stationary-kernel-based KRR with theoretical guarantees; it is orders of magnitude more efficient than existing methods at selecting representative sub-samples for the Nyström approximation.
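
For reference, the leverage scores in question are the $\lambda$-ridge leverage scores of the kernel matrix. Under one common convention (notation assumed here, not taken from this page), for a kernel matrix $K \in \mathbb{R}^{n \times n}$ and regularization parameter $\lambda > 0$, the $i$-th score is $\tau_i(\lambda) = \big(K(K + \lambda I)^{-1}\big)_{ii}$, and their sum $d_{\mathrm{eff}}(\lambda) = \sum_{i=1}^{n} \tau_i(\lambda) = \mathrm{tr}\big(K(K + \lambda I)^{-1}\big)$ is the effective dimension (degrees of freedom) that also appears in several of the references below; some papers scale $\lambda$ by $n$.
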
A random matrix analysis of random Fourier features: beyond the Gaussian kernel, a precise phase transition, and the corresponding double descent
This article characterizes the exact asymptotics of random Fourier feature (RFF) regression in the realistic setting where the number of data samples n, their dimension p, and the dimension of the random feature map all grow large at comparable rates.
Low-Precision Random Fourier Features
TLDR
It is proved that LP-RFFs can match the generalization performance of both full-precision random Fourier features and the Nyström method on four classification and regression tasks, while using 5x-10x and 50x-460x less memory, respectively.
TRF: Learning Kernels with Tuned Random Features
TLDR
This paper proposes selecting the density function from a reproducing kernel Hilbert space, allowing a search over the space of all translation-invariant kernels and resulting in an RFF formulation where kernel selection reduces to regularised risk minimisation with a novel regulariser.
Scaling up Kernel Ridge Regression via Locality Sensitive Hashing
TLDR
A simple weighted version of random binning features is introduced and it is shown that the corresponding kernel function generates Gaussian processes of any desired smoothness, leading to efficient algorithms for kernel ridge regression.
A Random-Feature Based Newton Method for Empirical Risk Minimization in Reproducing Kernel Hilbert Space
TLDR
A novel second-order algorithm that enjoys local superlinear convergence and global convergence in the high-probability sense is provided, and it is shown that the Hessian approximated via random features preserves the spectrum of the original Hessian.
Gauss-Legendre Features for Gaussian Process Regression
TLDR
This paper presents a Gauss-Legendre quadrature based approach for scaling up Gaussian process regression via a low rank approximation of the kernel matrix, which is very much inspired by the well-known random Fourier features approach.
On Sampling Random Features From Empirical Leverage Scores: Implementation and Theoretical Guarantees
TLDR
An out-of-sample performance bound is established for random features sampled from empirical leverage scores, revealing an interesting trade-off between the approximated kernel and the eigenvalue decay of another kernel in the domain of random features defined by the data distribution.

References

Showing 1-10 of 22 references
Fast Randomized Kernel Ridge Regression with Statistical Guarantees
TLDR
A version of this approach that comes with running-time guarantees as well as improved guarantees on its statistical performance is described, and a fast algorithm is presented to compute coarse approximations to the required leverage scores in time linear in the number of samples.
Divide and conquer kernel ridge regression: a distributed algorithm with minimax optimal rates
TLDR
It is established that despite the computational speed-up, statistical optimality is retained: as long as m is not too large, the partition-based estimator achieves the statistical minimax rate over all estimators using the set of N samples.
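
As a rough sketch of the partition-based estimator described above (a generic divide-and-conquer KRR scheme under a Gaussian kernel; the helper names, the choice of m, and the scaling of the regularizer are illustrative assumptions, and the cited paper's estimator may differ in such details):

import numpy as np

def gaussian_kernel(X, Y, gamma):
    # k(x, y) = exp(-gamma * ||x - y||^2), computed pairwise.
    sq = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq)

def dc_krr_predict(X, y, X_test, m=4, gamma=1.0, lam=1e-2, seed=0):
    # Split the data into m disjoint blocks, fit an exact KRR estimator
    # on each block, and average the m predictions on the test points.
    rng = np.random.default_rng(seed)
    blocks = np.array_split(rng.permutation(len(X)), m)
    preds = []
    for idx in blocks:
        K = gaussian_kernel(X[idx], X[idx], gamma)
        alpha = np.linalg.solve(K + lam * np.eye(len(idx)), y[idx])
        preds.append(gaussian_kernel(X_test, X[idx], gamma) @ alpha)
    return np.mean(preds, axis=0)

Each local solve touches only n/m points, so the cubic cost of exact KRR drops accordingly while, per the result above, the averaged estimator can retain the minimax rate when m is not too large.
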
Faster Kernel Ridge Regression Using Sketching and Preconditioning
TLDR
This paper proposes a preconditioning technique based on random feature maps, such as random Fourier features, which have recently emerged as a powerful technique for speeding up and scaling the training of kernel-based methods by resorting to approximations.
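
The idea can be sketched as follows: use the random-feature Gram matrix as a preconditioner for the regularized kernel system inside conjugate gradients, applying its inverse cheaply via the Woodbury identity. The code below is an illustrative sketch under these assumptions, not the cited paper's implementation; the helper names and the parameters gamma, lam, and D are placeholders.

import numpy as np
from scipy.linalg import cho_factor, cho_solve
from scipy.sparse.linalg import LinearOperator, cg

def gaussian_kernel(X, Y, gamma):
    sq = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq)

def rff_features(X, D, gamma, seed=0):
    rng = np.random.default_rng(seed)
    W = rng.normal(0.0, np.sqrt(2.0 * gamma), size=(D, X.shape[1]))
    b = rng.uniform(0.0, 2.0 * np.pi, size=D)
    return np.sqrt(2.0 / D) * np.cos(X @ W.T + b)

def krr_solve_pcg(X, y, gamma=1.0, lam=1e-2, D=200):
    n = len(X)
    A = gaussian_kernel(X, X, gamma) + lam * np.eye(n)   # (K + lam I) alpha = y
    Z = rff_features(X, D, gamma)                         # n x D with Z Z^T ~ K
    # Preconditioner M ~ (Z Z^T + lam I)^{-1}; the Woodbury identity reduces
    # each application to one D x D Cholesky solve.
    chol = cho_factor(Z.T @ Z + lam * np.eye(D))
    apply_M = lambda v: (v - Z @ cho_solve(chol, Z.T @ v)) / lam
    M = LinearOperator((n, n), matvec=apply_M)
    alpha, info = cg(A, y, M=M, atol=1e-8)
    return alpha

When Z Z^T is a good spectral approximation of K, the preconditioned system is well conditioned, so conjugate gradients converges in few iterations, each costing one n x n matrix-vector product plus a D x D solve.
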
Sharp analysis of low-rank kernel matrix approximations
TLDR
This paper shows that in the context of kernel ridge regression, for approximations based on a random subset of columns of the original kernel matrix, the rank p may be chosen to be linear in the degrees of freedom associated with the problem, a quantity which is classically used in the statistical analysis of such methods.
On the Equivalence between Kernel Quadrature Rules and Random Feature Expansions
  • F. Bach
  • Computer Science, Mathematics
    J. Mach. Learn. Res.
  • 2017
TLDR
Theoretical analysis of the number of required samples for a given approximation error leads to both upper and lower bounds that are based solely on the eigenvalues of the associated integral operator and match up to logarithmic terms.
Generalization Properties of Learning with Random Features
TLDR
The results shed light on the statistical-computational trade-offs in large-scale kernelized learning, showing the potential effectiveness of random features in reducing the computational complexity while keeping optimal generalization properties.
Input Sparsity Time Low-rank Approximation via Ridge Leverage Score Sampling
We present a new algorithm for finding a near optimal low-rank approximation of a matrix $A$ in $O(nnz(A))$ time. Our method is based on a recursive sampling scheme for computing a representative subset of $A$'s columns via ridge leverage score sampling.
Less is More: Nyström Computational Regularization
TLDR
A simple incremental variant of Nyström Kernel Regularized Least Squares is suggested, where the subsampling level implements a form of computational regularization, in the sense that it simultaneously controls regularization and computation.
Optimal Rates for the Regularized Least-Squares Algorithm
TLDR
A complete minimax analysis of the problem is described, showing that the convergence rates obtained by regularized least-squares estimators are indeed optimal over a suitable class of priors defined by the considered kernel.
Random Features for Large-Scale Kernel Machines
TLDR
Two sets of random features are explored, convergence bounds are provided on their ability to approximate various radial basis kernels, and it is shown that in large-scale classification and regression tasks, linear machine learning algorithms applied to these features outperform state-of-the-art large-scale kernel machines.