Corpus ID: 214117204

Risk of the Least Squares Minimum Norm Estimator under the Spike Covariance Model

@article{Mahdaviyeh2019RiskOT,
  title={Risk of the Least Squares Minimum Norm Estimator under the Spike Covariance Model},
  author={Yasaman Mahdaviyeh and Zacharie Naulet},
  journal={arXiv: Machine Learning},
  year={2019}
}
We study the risk of the minimum norm linear least squares estimator when the number of parameters $d$ depends on $n$ and $\frac{d}{n} \rightarrow \infty$. We assume that the data have an underlying low-rank structure by restricting ourselves to spike covariance matrices, in which a fixed finite number of eigenvalues grow with $n$ and are much larger than the remaining eigenvalues, which are (asymptotically) of the same order. We show that in this setting the risk of the minimum norm least squares estimator…
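A minimal simulation sketch of the setting described in the abstract (not the authors' code; the sample size, dimension, spike size, noise level, and signal direction below are illustrative assumptions): a Gaussian design whose covariance has one eigenvalue growing with $n$ while the rest stay of order one, with $d/n$ large, and the minimum norm least squares estimator computed via the pseudoinverse.

```python
# Illustrative sketch only: minimum norm least squares under a one-spike covariance.
import numpy as np

rng = np.random.default_rng(0)
n, d = 50, 2000                       # d/n is large
eigvals = np.ones(d)
eigvals[0] = n                        # spike: one eigenvalue grows with n, the rest are O(1)

w_star = np.zeros(d)
w_star[0] = 1.0                       # assumed true signal aligned with the spike direction

# Gaussian design with covariance diag(eigvals); noisy responses.
X = rng.standard_normal((n, d)) * np.sqrt(eigvals)
y = X @ w_star + 0.1 * rng.standard_normal(n)

# Minimum norm least squares estimator w_hat = X^+ y (interpolates since d > n).
w_hat = np.linalg.pinv(X) @ y

# Excess risk E[(x^T (w_hat - w_star))^2] = (w_hat - w_star)^T Sigma (w_hat - w_star).
risk = np.sum(eigvals * (w_hat - w_star) ** 2)
print(f"training residual norm: {np.linalg.norm(X @ w_hat - y):.2e}, excess risk: {risk:.4f}")
```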

Towards an Understanding of Benign Overfitting in Neural Networks

It is shown that it is possible for the two-layer ReLU network interpolator to achieve a near minimax-optimal learning rate, which, to the authors' knowledge, is the first generalization result for such networks.

Support vector machines and linear regression coincide with very high-dimensional features

A super-linear lower bound on the dimension (in terms of sample size) required for support vector proliferation in independent feature models is proved, matching the upper bounds from previous works.

On the proliferation of support vectors in high dimensions

This paper identifies new deterministic equivalences for the phenomenon of support vector proliferation, uses them to substantially broaden the conditions under which the phenomenon occurs in high-dimensional settings, and proves a nearly matching converse result.
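A small illustrative sketch of support vector proliferation (an assumption-laden example, not code from either paper; the sample size, dimension, random labels, and the use of scikit-learn's SVC with a very large C to approximate the hard-margin SVM are all choices made here): when $d \gg n$, essentially every training point becomes a support vector, and the SVM direction closely tracks the minimum norm interpolator of the $\pm 1$ labels.

```python
# Illustrative sketch only: support vector proliferation when d >> n.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n, d = 40, 4000
X = rng.standard_normal((n, d))
y = rng.choice([-1.0, 1.0], size=n)              # independent random labels

# A very large C approximates the hard-margin linear SVM.
svm = SVC(kernel="linear", C=1e8).fit(X, y)
print("support vectors:", len(svm.support_), "out of", n)   # typically all n points

# Minimum norm interpolator of the +/-1 labels, for comparison of directions.
w_mn = np.linalg.pinv(X) @ y
w_svm = svm.coef_.ravel()
cosine = w_svm @ w_mn / (np.linalg.norm(w_svm) * np.linalg.norm(w_mn))
print(f"cosine similarity between SVM and min-norm directions: {cosine:.3f}")
```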

Classification vs regression in overparameterized regimes: Does the loss function matter?

This work compares classification and regression tasks in the overparameterized linear model with Gaussian features and demonstrates the very different roles and properties of loss functions used at the training phase (optimization) and the testing phase (generalization).

References


Benign overfitting in linear regression

A characterization of linear regression problems for which the minimum norm interpolating prediction rule has near-optimal prediction accuracy shows that overparameterization is essential for benign overfitting in this setting: the number of directions in parameter space that are unimportant for prediction must significantly exceed the sample size.

Distance-based and continuum Fano inequalities with applications to statistical estimation

Two extensions of the classical Fano inequality in information theory are given, providing lower bounds on the probability that an estimator of a discrete quantity is within some distance $t$ of the quantity.

Two models of double descent for weak features

The "double descent" risk curve was recently proposed to qualitatively describe the out-of-sample prediction accuracy of variably-parameterized machine learning models and it is shown that the risk peaks when the number of features is close to the sample size, but also that therisk decreases towards its minimum as $p$ increases beyond $n$.

Asymptotics and Concentration Bounds for Bilinear Forms of Spectral Projectors of Sample Covariance

Let $X, X_1, \dots, X_n$ be i.i.d. Gaussian random variables with zero mean and covariance operator $\Sigma = \mathbb{E}(X \otimes X)$ taking values in a separable Hilbert space $\mathbb{H}$…

The high-dimension, low-sample-size geometric representation holds under mild conditions (2007)

On the limit of the largest eigenvalue of the large dimensional sample covariance matrix

In this paper the authors show that the largest eigenvalue of the sample covariance matrix tends to a limit under certain conditions when both the number of variables and the sample size tend to infinity.

The Statistics and Mathematics of High Dimension Low Sample Size Asymptotics.

The new results reveal an asymptotic conical structure in critical sample eigendirections under the spike models with distinguishable eigenvalues, when the sample size and/or the number of variables (or dimension) tend to infinity.

Geometric representation of high dimension, low sample size data

This analysis shows a tendency for the data to lie deterministically at the vertices of a regular simplex, which means all the randomness in the data appears only as a random rotation of this simplex.

A General Framework for Consistency of Principal Component Analysis

This framework includes several previously studied domains of asymptotics as special cases, allows one to investigate interesting connections and transitions among the various domains, and rigorously characterizes how their relationships affect PCA consistency.