# Risk of the Least Squares Minimum Norm Estimator under the Spike Covariance Model

```bibtex
@article{Mahdaviyeh2019RiskOT,
  title   = {Risk of the Least Squares Minimum Norm Estimator under the Spike Covariance Model},
  author  = {Yasaman Mahdaviyeh and Zacharie Naulet},
  journal = {arXiv: Machine Learning},
  year    = {2019}
}
```

We study the risk of the minimum norm linear least squares estimator when the number of parameters $d$ depends on $n$ and $\frac{d}{n} \rightarrow \infty$. We assume that the data has an underlying low-rank structure by restricting ourselves to spike covariance matrices, in which a fixed finite number of eigenvalues grow with $n$ and are much larger than the remaining eigenvalues, which are (asymptotically) of the same order. We show that in this setting the risk of the minimum norm least squares estimator…
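The object studied in the abstract can be illustrated with a small numerical sketch. Below is a minimal example, not the paper's exact setup: a single spike eigenvalue of order $n$ is assumed for the covariance, and the minimum norm least squares estimator is computed via the Moore-Penrose pseudoinverse, $\hat\beta = X^{+} y = X^\top (X X^\top)^{-1} y$. All dimensions and scalings here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative spike covariance: one eigenvalue of order n (assumed scaling),
# the remaining d - 1 eigenvalues of constant order.
n, d = 50, 500                  # overparameterized regime: d/n >> 1
eigvals = np.ones(d)
eigvals[0] = n                  # the single "spike"

# Rows of X drawn from N(0, diag(eigvals)).
X = rng.standard_normal((n, d)) * np.sqrt(eigvals)
beta = rng.standard_normal(d) / np.sqrt(d)
y = X @ beta + 0.1 * rng.standard_normal(n)

# Minimum norm least squares estimator: among all interpolating solutions
# of min ||X b - y||, pinv picks the one with smallest Euclidean norm.
beta_hat = np.linalg.pinv(X) @ y

# When rank(X) = n (almost surely here, since d > n), the estimator
# interpolates the training data exactly.
print(np.allclose(X @ beta_hat, y))
```

Since $d > n$, the system is underdetermined and infinitely many interpolating solutions exist; the minimum norm choice is the one whose risk the paper analyzes.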

## 4 Citations

### Towards an Understanding of Benign Overfitting in Neural Networks

- Computer Science, ArXiv
- 2021

It is shown that it is possible for the two-layer ReLU network interpolator to achieve a near minimax-optimal learning rate, which to this knowledge is the first generalization result for such networks.

### Support vector machines and linear regression coincide with very high-dimensional features

- Computer Science, NeurIPS
- 2021

A super-linear lower bound on the dimension (in terms of sample size) required for support vector proliferation in independent feature models is proved, matching the upper bounds from previous works.

### On the proliferation of support vectors in high dimensions

- Computer Science, AISTATS
- 2021

This paper identifies new deterministic equivalences for this phenomenon of support vector proliferation, and uses them to substantially broaden the conditions under which the phenomenon occurs in high-dimensional settings, and proves a nearly matching converse result.

### Classification vs regression in overparameterized regimes: Does the loss function matter?

- Computer Science, J. Mach. Learn. Res.
- 2021

This work compares classification and regression tasks in the overparameterized linear model with Gaussian features and demonstrates the very different roles and properties of loss functions used at the training phase (optimization) and the testing phase (generalization).

## References

Showing 1–10 of 22 references

### Benign overfitting in linear regression

- Computer Science, Proceedings of the National Academy of Sciences
- 2020

A characterization of linear regression problems for which the minimum norm interpolating prediction rule has near-optimal prediction accuracy shows that overparameterization is essential for benign overfitting in this setting: the number of directions in parameter space that are unimportant for prediction must significantly exceed the sample size.

### Distance-based and continuum Fano inequalities with applications to statistical estimation

- Mathematics, ArXiv
- 2013

Two extensions of the classical Fano inequality in information theory are given, providing lower bounds on the probability that an estimator of a discrete quantity is within some distance $t$ of the quantity.

### Two models of double descent for weak features

- Computer Science, SIAM J. Math. Data Sci.
- 2020

The "double descent" risk curve was recently proposed to qualitatively describe the out-of-sample prediction accuracy of variably-parameterized machine learning models. It is shown that the risk peaks when the number of features is close to the sample size, but also that the risk decreases towards its minimum as $p$ increases beyond $n$.

### Asymptotics and Concentration Bounds for Bilinear Forms of Spectral Projectors of Sample Covariance

- Mathematics
- 2014

Let $X, X_1, \dots, X_n$ be i.i.d. Gaussian random variables with zero mean and covariance operator $\Sigma = \mathbb{E}(X \otimes X)$ taking values in a separable Hilbert space $\mathbb{H}$. Let …

### The high-dimension, low-sample-size geometric representation holds under mild conditions

- 2007

### On the limit of the largest eigenvalue of the large dimensional sample covariance matrix

- Mathematics
- 1988

In this paper the authors show that the largest eigenvalue of the sample covariance matrix tends to a limit under certain conditions when both the number of variables and the sample size tend…

### The Statistics and Mathematics of High Dimension Low Sample Size Asymptotics

- Computer Science, Mathematics, Statistica Sinica
- 2016

The new results reveal an asymptotic conical structure in critical sample eigendirections under the spike models with distinguishable eigenvalues, when the sample size and/or the number of variables (or dimension) tend to infinity.

### Geometric representation of high dimension, low sample size data

- Computer Science
- 2005

This analysis shows a tendency for the data to lie deterministically at the vertices of a regular simplex, which means all the randomness in the data appears only as a random rotation of this simplex.

### A General Framework for Consistency of Principal Component Analysis

- Computer Science, J. Mach. Learn. Res.
- 2016

This framework includes several previously studied domains of asymptotics as special cases, allows one to investigate interesting connections and transitions among the various domains, and rigorously characterizes how their relationships affect PCA consistency.