Corpus ID: 219558568

# A random matrix analysis of random Fourier features: beyond the Gaussian kernel, a precise phase transition, and the corresponding double descent

@article{Liao2020ARM,
title={A random matrix analysis of random Fourier features: beyond the Gaussian kernel, a precise phase transition, and the corresponding double descent},
author={Zhenyu Liao and Romain Couillet and Michael W. Mahoney},
journal={ArXiv},
year={2020},
volume={abs/2006.05013}
}
• Published 2020
• Mathematics, Computer Science, Physics
• ArXiv
This article characterizes the exact asymptotics of random Fourier feature (RFF) regression, in the realistic setting where the number of data samples $n$, their dimension $p$, and the dimension of feature space $N$ are all large and comparable. In this regime, the random RFF Gram matrix no longer converges to the well-known limiting Gaussian kernel matrix (as it does when $N \to \infty$ alone), but it still has a tractable behavior that is captured by our analysis. This analysis also provides…
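To make the object of study concrete, here is a minimal sketch (not from the paper; dimensions, bandwidth, and regularization are illustrative assumptions) of the standard RFF construction and the resulting ridge regression: the Gram matrix $G = ZZ^\top$ built from $N$ random cosine features approximates the Gaussian kernel matrix $K$ entrywise only when $N \to \infty$ with $n, p$ fixed, whereas in the comparable regime $n \sim p \sim N$ analyzed in the paper the two differ.

```python
import numpy as np

rng = np.random.default_rng(0)

# Dimensions chosen large and comparable, as in the regime the paper studies.
n, p, N = 200, 30, 512
sigma = 1.0  # Gaussian kernel bandwidth (illustrative choice)

X = rng.standard_normal((n, p)) / np.sqrt(p)

# Random Fourier feature map: z(x) = sqrt(2/N) * cos(x @ W + b),
# with W having i.i.d. N(0, 1/sigma^2) entries and b ~ Uniform[0, 2*pi).
W = rng.standard_normal((p, N)) / sigma
b = rng.uniform(0.0, 2.0 * np.pi, size=N)
Z = np.sqrt(2.0 / N) * np.cos(X @ W + b)

# RFF Gram matrix vs. the limiting Gaussian (RBF) kernel matrix.
G = Z @ Z.T
sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
K = np.exp(-sq_dists / (2.0 * sigma ** 2))

# RFF ridge regression: solve the N-dimensional regularized least squares.
y = rng.standard_normal(n)
lam = 1e-2
beta = np.linalg.solve(Z.T @ Z + lam * np.eye(N), Z.T @ y)
y_hat = Z @ beta
```

With $N$ only a small multiple of $n$, the entrywise gap between `G` and `K` does not vanish, which is why the analysis must track the RFF Gram matrix itself rather than its Gaussian-kernel limit.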

#### Citations of this paper

Good linear classifiers are abundant in the interpolating regime
• Mathematics, Computer Science
• ArXiv
• 2020
The results show that the usual style of analysis in statistical learning theory may not be fine-grained enough to capture the good generalization performance observed in practice, and that approaches based on the statistical mechanics of learning offer a promising alternative.
Benign Overfitting in Binary Classification of Gaussian Mixtures
• Computer Science
• ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
• 2021
This paper studies benign overfitting for data generated from a popular binary Gaussian mixture model (GMM) and classifiers trained by support-vector machines (SVM), deriving novel non-asymptotic bounds on the classification error of the least-squares (LS) solution.
Capturing the learning curves of generic features maps for realistic data sets with a teacher-student model
A rigorous formula is proved for the asymptotic training loss and generalisation error achieved by empirical risk minimization for the high-dimensional Gaussian covariate model used in teacher-student models.
Taxonomizing local versus global structure in neural network loss landscapes
This work performs a detailed empirical analysis of the loss landscape structure of thousands of neural network models, systematically varying learning tasks, model architectures, and/or quantity/quality of data, and develops a simple one-dimensional model with load-like and temperature-like parameters and introduces the notion of an effective loss landscape depending on these parameters.
A Farewell to the Bias-Variance Tradeoff? An Overview of the Theory of Overparameterized Machine Learning
• Computer Science, Mathematics
• ArXiv
• 2021
This paper provides a succinct overview of this emerging theory of overparameterized ML (henceforth abbreviated as TOPML) that explains these recent findings through a statistical signal processing perspective and emphasizes the unique aspects that define the TOPML research area as a subfield of modern ML theory.
Asymptotics of Ridge Regression in Convolutional Models
• Computer Science, Mathematics
• ICML
• 2021
This work analyzes the asymptotics of the estimation error of ridge estimators for convolutional linear models, and derives exact formulae for this error that hold in a certain high-dimensional regime.
Deformed semicircle law and concentration of nonlinear random matrices for ultra-wide neural networks
• Computer Science, Mathematics
• ArXiv
• 2021
A nonlinear Hanson-Wright inequality suitable for neural networks with random weights and Lipschitz activation functions is provided, and it is verified that random feature regression achieves the same asymptotic performance as its limiting kernel regression in the ultra-wide limit.
Double-descent curves in neural networks: a new perspective using Gaussian processes
• Mathematics, Computer Science
• ArXiv
• 2021
It is argued that neural network generalization performance improves in the overparameterised regime precisely because that is where they converge to their equivalent Gaussian process.
Dynamics of Stochastic Momentum Methods on Large-scale, Quadratic Models
• Computer Science, Mathematics
• ArXiv
• 2021
It is shown that (small-batch) stochastic heavy-ball momentum with a fixed momentum parameter provides no actual performance improvement over SGD when step sizes are adjusted correctly, and a new algorithm, sDANA (stochastic dimension-adjusted Nesterov acceleration), is proposed that obtains an asymptotically optimal average-case complexity while remaining linearly convergent in the strongly convex setting without parameter adjustment.
Generalization Error Rates in Kernel Regression: The Crossover from the Noiseless to Noisy Regime
• Computer Science, Mathematics
• ArXiv
• 2021
This work unifies and extends this line of work, providing a characterization of all regimes and excess-error decay rates that can be observed in terms of the interplay of noise and regularization, and shows the existence of a transition in the noisy setting from the noiseless exponents to their noisy values as the sample complexity is increased.

#### References

Showing 1–10 of 66 references
High-Dimensional Asymptotics of Prediction: Ridge Regression and Classification
• Mathematics
• 2015
We provide a unified analysis of the predictive risk of ridge regression and regularized discriminant analysis in a dense random effects model. We work in a high-dimensional asymptotic regime where…
Random Fourier Features for Kernel Ridge Regression: Approximation Bounds and Statistical Guarantees
• Computer Science, Mathematics
• ICML
• 2017
The results are twofold: on the one hand, it is shown that random Fourier feature approximation can provably speed up kernel ridge regression under reasonable assumptions, and on the other hand, the method is suboptimal, and sampling from a modified distribution in Fourier space, given by the leverage function of the kernel, yields provably better performance.
A Model of Double Descent for High-dimensional Binary Linear Classification
• Mathematics, Computer Science
• ArXiv
• 2019
A model for logistic regression where only a subset of features of size p is used for training a linear classifier over n training samples is considered, and a phase-transition phenomenon for the case of Gaussian regressors is uncovered.
Eigenvalue distribution of nonlinear models of random matrices
• Mathematics, Computer Science
• ArXiv
• 2019
The asymptotic empirical eigenvalue distribution of a nonlinear random matrix ensemble is computed in the case where W and X have sub-Gaussian tails and f is smooth, and the multi-layer case is investigated with regard to neural network applications.
Nonlinear random matrix theory for deep learning
• Computer Science
• NIPS
• 2017
This work demonstrates that the pointwise nonlinearities typically applied in neural networks can be incorporated into a standard method of proof in random matrix theory known as the moments method, and identifies an intriguing new class of activation functions with favorable properties.
Exact expressions for double descent and implicit regularization via surrogate random design
• Computer Science, Mathematics
• NeurIPS
• 2020
This work provides the first exact non-asymptotic expressions for double descent of the minimum norm linear estimator and introduces a new mathematical tool of independent interest: the class of random matrices for which the determinant commutes with expectation.
On the Spectrum of Random Features Maps of High Dimensional Data
• Mathematics, Computer Science
• ICML
• 2018
The "concentration" phenomenon induced by random matrix theory is leveraged to perform a spectral analysis on the Gram matrix of these random feature maps, here for Gaussian mixture models of simultaneously large dimension and size.
Generalization Properties of Learning with Random Features
• Computer Science, Mathematics
• NIPS
• 2017
The results shed light on the statistical-computational trade-offs in large-scale kernelized learning, showing the potential effectiveness of random features in reducing the computational complexity while keeping optimal generalization properties.
The Generalization Error of Random Features Regression: Precise Asymptotics and the Double Descent Curve
• Mathematics
• 2019
Deep learning methods operate in regimes that defy the traditional statistical mindset. The neural network architectures often contain more parameters than training samples, and are so rich that they…
Concentration of Measure and Large Random Matrices with an application to Sample Covariance Matrices
• Mathematics
• 2019
The present work provides an original framework for random matrix analysis based on revisiting the concentration of measure theory for random vectors. By providing various notions of vector…