Estimation of the Number of Spiked Eigenvalues in a Covariance Matrix by Bulk Eigenvalue Matching Analysis

Zheng Tracy Ke, Yucong Ma, and Xihong Lin. Journal of the American Statistical Association.
The spiked covariance model has gained increasing popularity in high-dimensional data analysis. A fundamental problem is determining the number of spiked eigenvalues, $K$. Most existing estimators of $K$ focus on the top eigenvalues of the sample covariance matrix, and there has been little investigation into proper ways of utilizing the bulk eigenvalues to estimate $K$. We propose a principled approach to incorporating bulk eigenvalues in the estimation of $K$. Our method imposes…
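As a rough illustration of how bulk eigenvalues can be used (a minimal pure-Python sketch of the general bulk-matching idea, not the paper's BEMA algorithm, whose details are truncated above; all function names are invented): assume the non-spiked eigenvalues follow a scaled Marchenko-Pastur (MP) law, estimate the noise scale by matching middle-order eigenvalues to MP quantiles, and count the eigenvalues above the implied bulk edge.

```python
import bisect
import math

def mp_quantiles(gamma, qs, steps=20000):
    """Numerically invert the Marchenko-Pastur CDF (sigma^2 = 1, gamma <= 1)."""
    a, b = (1 - math.sqrt(gamma)) ** 2, (1 + math.sqrt(gamma)) ** 2
    dx = (b - a) / steps
    xs, cdf, acc = [], [], 0.0
    for i in range(steps):
        x = a + (i + 0.5) * dx
        acc += math.sqrt(max((b - x) * (x - a), 0.0)) / (2 * math.pi * gamma * x) * dx
        xs.append(x)
        cdf.append(acc)
    total = cdf[-1]
    return [xs[min(bisect.bisect_left(cdf, q * total), steps - 1)] for q in qs]

def bema_style_estimate(eigs, gamma, alpha=0.2):
    """Estimate the noise scale sigma^2 by matching middle bulk eigenvalues to
    MP quantiles, then count eigenvalues above the implied bulk edge."""
    lam = sorted(eigs)  # ascending
    p = len(lam)
    idx = range(int(alpha * p), int((1 - alpha) * p))  # middle order statistics only
    mpq = mp_quantiles(gamma, [(i + 0.5) / p for i in idx])
    # least-squares slope through the origin: sigma^2 = sum(q*lam) / sum(q*q)
    sigma2 = sum(m * lam[i] for m, i in zip(mpq, idx)) / sum(m * m for m in mpq)
    edge = sigma2 * (1 + math.sqrt(gamma)) ** 2
    K = sum(1 for x in lam if x > edge * 1.01)  # small buffer above the edge
    return K, sigma2
```

On synthetic input consisting of a scaled MP bulk plus a few large spikes, this recovers both the noise scale and the spike count.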
Statistical inference for principal components of spiked covariance matrices
In this paper, we study the asymptotic behavior of the extreme eigenvalues and eigenvectors of high-dimensional spiked sample covariance matrices, in the supercritical case when reliable detection of the spikes is possible.
Selecting the number of components in PCA via random signflips.
The Signflip Parallel Analysis (Signflip PA) method is proposed: it compares data singular values to those of "empirical null" data generated by flipping the sign of each entry randomly with probability one-half, and consistently selects factors above the noise level in high-dimensional signal-plus-noise models under heterogeneous settings.
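The core of the signflip null is easy to sketch (a simplified pure-Python illustration for the top component only; the actual Signflip PA procedure handles multiple factors and chooses its cutoffs more carefully; all names here are illustrative):

```python
import math
import random

def top_singular_value(A, iters=200):
    """Largest singular value of a list-of-lists matrix via power iteration on A^T A."""
    n, p = len(A), len(A[0])
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iters):
        u = [sum(A[i][j] * v[j] for j in range(p)) for i in range(n)]
        w = [sum(A[i][j] * u[i] for i in range(n)) for j in range(p)]
        nrm = math.sqrt(sum(x * x for x in w))
        v = [x / nrm for x in w]
    u = [sum(A[i][j] * v[j] for j in range(p)) for i in range(n)]
    return math.sqrt(sum(x * x for x in u))

def exceeds_signflip_null(A, n_null=20, seed=0):
    """Compare the top singular value of A against 'empirical null' copies of A
    whose entries have their signs flipped independently with probability 1/2."""
    rng = random.Random(seed)
    s1 = top_singular_value(A)
    null_tops = []
    for _ in range(n_null):
        B = [[a * rng.choice((-1.0, 1.0)) for a in row] for row in A]
        null_tops.append(top_singular_value(B))
    return s1 > max(null_tops)
```

A strong planted rank-one signal survives the comparison, because sign-flipping destroys the signal in the null copies while preserving the entrywise noise magnitudes.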
Singular value distribution of dense random matrices with block Markovian dependence
A block Markov chain is a Markov chain whose state space can be partitioned into a finite number of clusters such that the transition probabilities depend only on the clusters. Block Markov chains…
A universal test on spikes in a high-dimensional generalized spiked model and its applications
This paper aims to test the number of spikes in a generalized spiked covariance matrix, whose spiked eigenvalues may be much larger or smaller than the non-spiked ones. For a…
Testing the number of common factors by bootstrap in high-dimensional factor models
This paper derives asymptotic distributions for the eigenvalues of the bootstrapped sample covariance matrix under mild conditions and proposes two testing schemes based on the disparate behavior of the spiked and non-spiked eigenvalues.
Two-stage Linked Component Analysis for Joint Decomposition of Multiple Biologically Related Data Sets
A method called two-stage linked component analysis (2s-LCA) is proposed to jointly decompose multiple biologically related experimental data sets with biological and technological relationships that can be structured into the decomposition.
The Eigenvectors of Single-spiked Complex Wishart Matrices: Finite and Asymptotic Analyses
It turns out that, in this asymptotic regime, the scaled random variable $nZ_1$ converges in distribution to $\chi_2^2/2(1+\theta)$, where $\chi_2^2$ denotes a chi-squared random variable with two degrees of freedom, which reveals that $u_1$ can be used to infer information about the spike.
The conjugate gradient algorithm on a general class of spiked covariance matrices
The main result of the paper is that the norms of the error and residual vectors at any finite step concentrate on deterministic values determined by orthogonal polynomials with respect to a deformed Marchenko–Pastur law.
Distribution of the Scaled Condition Number of Single-spiked Complex Wishart Matrices
This paper uses an orthogonal polynomial approach to derive an exact expression for the probability density function of $\kappa_{SC}(X)$ which is amenable to asymptotic analysis as the matrix dimensions grow large, and establishes simple closed-form expressions for the limiting distributions.
Optimal Shrinkage of Eigenvalues in the Spiked Covariance Model.
It is shown that in a common high-dimensional covariance model, the choice of loss function has a profound effect on optimal estimation of the population covariance matrix, and that an optimal shrinker $\eta$ acting elementwise on the sample eigenvalues must be designed accordingly.
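For intuition, here is one classical elementwise rule under the white-noise spiked model with unit noise (a sketch, not necessarily the loss-optimal shrinker $\eta$ designed in the paper): above the bulk edge, invert the standard spike map $\lambda(\ell) = \ell + \gamma\ell/(\ell-1)$; below it, collapse everything to the noise level.

```python
import math

def shrink_eigenvalue(lam, gamma):
    """Debias a sample eigenvalue under the spiked model (sigma^2 = 1).
    Above the bulk edge (1 + sqrt(gamma))^2, invert lam = l + gamma*l/(l - 1);
    below it, shrink all the way to 1 (the noise level)."""
    edge = (1 + math.sqrt(gamma)) ** 2
    if lam <= edge:
        return 1.0
    t = lam + 1 - gamma
    return (t + math.sqrt(t * t - 4 * lam)) / 2
```

For example, a population spike $\ell = 5$ at $\gamma = 0.5$ produces the sample eigenvalue $\lambda = 5 + 0.5 \cdot 5/4 = 5.625$, which the rule maps back to $5$.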
Large covariance estimation by thresholding principal orthogonal complements
It is shown that the effect of estimating the unknown factors vanishes as the dimensionality increases, and the principal orthogonal complement thresholding method ‘POET’ is introduced to explore such an approximate factor structure with sparsity.
Efficient Computation of Limit Spectra of Sample Covariance Matrices
The method, called Spectrode, finds the support and the density of the limiting ESD to high precision; this is proved for finite discrete distributions, and the method may make it more convenient to use asymptotic random matrix theory in high-dimensional data analysis.
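A toy stand-in for what Spectrode computes (a damped fixed-point iteration of the Marchenko-Pastur/Silverstein equation at a fixed aspect ratio; this is not the paper's algorithm, which handles supports and precision far more carefully; names are illustrative):

```python
import math

def stieltjes_mp(z, gamma, spectrum, iters=2000):
    """Solve the Marchenko-Pastur/Silverstein fixed point
        m = sum_t[ w / (t*(1 - gamma - gamma*z*m) - z) ]
    for the Stieltjes transform of the limiting ESD by damped iteration.
    `spectrum` is a list of (t, weight) pairs describing the population law H."""
    m = -1.0 / z  # start in the upper half-plane
    for _ in range(iters):
        s = sum(w / (t * (1 - gamma - gamma * z * m) - z) for t, w in spectrum)
        m = 0.5 * m + 0.5 * s  # damping stabilizes the iteration
    return m

def esd_density(x, gamma, spectrum, eps=1e-4):
    """Limiting spectral density at x via the Stieltjes inversion formula."""
    return stieltjes_mp(complex(x, eps), gamma, spectrum).imag / math.pi
```

For the identity population spectrum this reproduces the closed-form Marchenko-Pastur density $\sqrt{(b-x)(x-a)}/(2\pi\gamma x)$.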
Estimating Number of Factors by Adjusted Eigenvalues Thresholding
This paper proposes a tuning-free, scale-invariant adjusted correlation thresholding (ACT) method for determining the number of common factors in high-dimensional factor models, taking into account the sampling variability and bias of the top sample eigenvalues, and establishes the optimality of the method in terms of the minimal signal strength and the optimal threshold.
Limiting laws for divergent spiked eigenvalues and largest nonspiked eigenvalue of sample covariance matrices
We study the asymptotic distributions of the spiked eigenvalues and the largest nonspiked eigenvalue of the sample covariance matrix under a general covariance matrix model with divergent spiked eigenvalues…
This paper deals with a multivariate Gaussian observation model where the eigenvalues of the covariance matrix are all one, except for a finite number which are larger. Of interest is the asymptotic…
Phase transition of the largest eigenvalue for nonnull complex sample covariance matrices
We compute the limiting distributions of the largest eigenvalue of a complex Gaussian sample covariance matrix when both the number of samples and the number of variables in each sample become…
On the distribution of the largest eigenvalue in principal components analysis
Let $x_{(1)}$ denote the square of the largest singular value of an $n \times p$ matrix $X$, all of whose entries are independent standard Gaussian variates. Equivalently, $x_{(1)}$ is the largest principal component…
Universality of covariance matrices
In this paper we prove the universality of covariance matrices of the form $H_{N\times N}={X}^{\dagger}X$ where $X$ is an ${M\times N}$ rectangular matrix with independent real-valued entries…
Influential Feature PCA for high dimensional clustering
In IF-PCA, a small fraction of features with the largest Kolmogorov-Smirnov (KS) scores is selected, where the threshold is chosen by adapting the recent notion of Higher Criticism, and the cluster labels are estimated by applying classical k-means to the leading singular vectors of the post-selection data matrix.