# Permutation methods for factor analysis and PCA

```bibtex
@article{Dobriban2017PermutationMF,
  title={Permutation methods for factor analysis and PCA},
  author={E. Dobriban},
  journal={arXiv: Statistics Theory},
  year={2017}
}
```
• Published 2 October 2017
• Mathematics
Researchers often have datasets measuring features $x_{ij}$ of samples, such as test scores of students. In factor analysis and PCA, these features are thought to be influenced by unobserved factors, such as skills. Can we determine how many components affect the data? This is an important problem, because it has a large impact on all downstream data analysis. Consequently, many approaches have been developed to address it. Parallel Analysis is a popular permutation method. It works by randomly…
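The permutation scheme behind Parallel Analysis can be sketched as follows: permute each feature column independently to destroy cross-feature structure, and retain the leading sample eigenvalues that exceed the corresponding eigenvalues of the permuted data. The function name, defaults, and the 95% quantile rule below are illustrative choices, not taken from the paper:

```python
import numpy as np

def parallel_analysis(X, n_perm=20, quantile=0.95, seed=0):
    """Column-permutation parallel analysis (sketch): keep leading sample
    eigenvalues exceeding the chosen quantile of the permutation null."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    # eigenvalues of the sample covariance, in decreasing order
    obs = np.linalg.svd(Xc, compute_uv=False) ** 2 / n
    null = np.empty((n_perm, len(obs)))
    for b in range(n_perm):
        # permute each column separately to break cross-column dependence
        Xp = np.column_stack([rng.permutation(Xc[:, j]) for j in range(p)])
        null[b] = np.linalg.svd(Xp, compute_uv=False) ** 2 / n
    thresh = np.quantile(null, quantile, axis=0)
    # count leading eigenvalues above the permutation threshold
    k = 0
    while k < len(obs) and obs[k] > thresh[k]:
        k += 1
    return k
```

On data with a few strong factors, the observed leading eigenvalues dwarf the permutation null and the count stops at the true number of factors.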
## Selecting the number of components in PCA via random signflips
• Mathematics
• 2020

Dimensionality reduction via PCA and factor analysis is an important tool of data analysis. A critical step is selecting the number of components. However, existing methods (such as the scree plot, …)
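The idea in the title — comparing against randomly sign-flipped data rather than permuted data — can be sketched as follows. This is a hypothetical minimal implementation: flipping each entry's sign destroys low-rank structure while preserving entrywise magnitudes, and the published method includes refinements not shown here:

```python
import numpy as np

def signflip_rank(X, n_flips=20, quantile=0.95, seed=0):
    """Sign-flip rank selection (sketch): compare singular values of X
    against copies whose entries receive i.i.d. random signs."""
    rng = np.random.default_rng(seed)
    Xc = X - X.mean(axis=0)
    obs = np.linalg.svd(Xc, compute_uv=False)
    null = np.empty((n_flips, len(obs)))
    for b in range(n_flips):
        # entrywise random signs keep magnitudes but wipe out the signal
        S = rng.choice([-1.0, 1.0], size=Xc.shape)
        null[b] = np.linalg.svd(S * Xc, compute_uv=False)
    thresh = np.quantile(null, quantile, axis=0)
    # count leading singular values above the sign-flip threshold
    k = 0
    while k < len(obs) and obs[k] > thresh[k]:
        k += 1
    return k
```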
## Deterministic parallel analysis: An improved method for selecting the number of factors and principal components
• Mathematics
• 2017

Factor analysis and principal component analysis (PCA) are used in many application areas. The first step, choosing the number of components, remains a serious challenge. Our work proposes improved …
## Deterministic parallel analysis: an improved method for selecting factors and principal components
• Computer Science, Mathematics
• Journal of the Royal Statistical Society: Series B (Statistical Methodology)
• 2018

This work derandomizes parallel analysis, proposing deterministic PA, which is faster and more reproducible than PA; it also proposes deflation to counter shadowing and raises the decision threshold to improve estimation accuracy.
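As a rough illustration of two ingredients named above — a deterministic threshold in place of permutations, plus deflation — the following sketch uses the white-noise Marchenko–Pastur upper edge as the threshold and peels off one component at a time. This is a simplified stand-in, not the published deterministic PA algorithm:

```python
import numpy as np

def deterministic_rank_sketch(X):
    """Simplified deterministic component count: threshold eigenvalues of
    standardized data at the Marchenko-Pastur upper edge, with deflation.
    Assumes no constant columns (nonzero column standard deviations)."""
    n, p = X.shape
    Xc = (X - X.mean(axis=0)) / X.std(axis=0)  # standardize columns
    edge = (1 + np.sqrt(p / n)) ** 2           # MP upper edge, unit variance
    k = 0
    while True:
        U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
        if s[0] ** 2 / n <= edge:
            return k
        # deflate: remove the detected leading component before retesting
        Xc = Xc - s[0] * np.outer(U[:, 0], Vt[0])
        k += 1
```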
## Factor analysis in high dimensional biological data with dependent observations

This work develops a novel statistical framework to perform factor analysis and interpret its results in data with dependent observations and factors whose signal strengths span several orders of magnitude. It shows that its estimator for the number of factors overcomes both the notorious "eigenvalue shadowing" problem and the biases due to the pervasive-factor assumption.
## Robust high dimensional factor models with applications to statistical machine learning
• Medicine, Computer Science
• Statistical Science: A Review Journal of the Institute of Mathematical Statistics
• 2021

It is shown that classical methods, especially principal component analysis (PCA), can be tailored to many new problems, providing powerful tools for statistical estimation and inference; several applications illustrate how insights from these fields yield solutions to modern challenges.
## Estimating Number of Factors by Adjusted Eigenvalues Thresholding
• Mathematics
• 2019

Determining the number of common factors is an important and practical topic in high dimensional factor models. The existing literature is mainly based on the eigenvalues of the covariance matrix.
## Likelihood Ratio Test in Multivariate Linear Regression: from Low to High Dimension
• Mathematics
• Statistica Sinica
• 2021

Multivariate linear regressions are widely used statistical tools in many applications to model the associations between multiple related responses and a set of predictors. To infer such …
## Biwhitening Reveals the Rank of a Count Matrix
• Computer Science, Mathematics
• arXiv
• 2021

This work proposes a simple procedure termed biwhitening that makes it possible to estimate the rank of the underlying data matrix without any prior knowledge of its structure, and extends it to other discrete distributions, such as the generalized Poisson, binomial, multinomial, and negative binomial.
## Estimating and Accounting for Unobserved Covariates in High-Dimensional Correlated Data
• Mathematics, Computer Science
• 2018

CBCV and CorrConf are developed: provably accurate and computationally efficient methods to choose the number of latent confounding factors, and to estimate them, in high dimensional data with correlated or nonexchangeable residuals.
## Estimation of large block structured covariance matrices: Application to ‘multi‐omic’ approaches to study seed quality
• Mathematics, Computer Science
• Journal of the Royal Statistical Society: Series C (Applied Statistics)
• 2021

This work proposes a novel, efficient and fully data-driven approach for estimating large block structured sparse covariance matrices when the number of variables is much larger than the number of samples, without restricting attention to block diagonal matrices.