Permutation methods for factor analysis and PCA

@article{Dobriban2017PermutationMF,
  title={Permutation methods for factor analysis and PCA},
  author={E. Dobriban},
  journal={arXiv: Statistics Theory},
  year={2017}
}
  • E. Dobriban
  • Published 2 October 2017
  • Mathematics
  • arXiv: Statistics Theory
Researchers often have datasets measuring features $x_{ij}$ of samples, such as test scores of students. In factor analysis and PCA, these features are thought to be influenced by unobserved factors, such as skills. Can we determine how many components affect the data? This is an important problem, because it has a large impact on all downstream data analysis. Consequently, many approaches have been developed to address it. Parallel Analysis is a popular permutation method. It works by randomly…
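Parallel Analysis (PA), in its common permutation form, compares the eigenvalues of the observed data with eigenvalues obtained after independently permuting the entries within each column. This keeps each feature's marginal distribution but destroys the correlations between features. The following is a minimal NumPy sketch of that recipe. The function name `parallel_analysis`, the 95% quantile, the number of permutations, and the column standardization are illustrative assumptions, not necessarily the choices analyzed in the paper.

```python
import numpy as np


def parallel_analysis(X, n_perm=20, quantile=0.95, seed=0):
    """Permutation parallel analysis (illustrative sketch).

    X is an (n, p) array: rows are samples, columns are features.
    Returns the number of leading eigenvalues of the sample correlation
    matrix that exceed the chosen quantile of their permutation null.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    # Standardize columns so the eigenvalues are those of the correlation matrix.
    X = (X - X.mean(axis=0)) / X.std(axis=0)
    # Observed eigenvalues, in descending order.
    eig = np.linalg.svd(X, compute_uv=False) ** 2 / n

    null = np.empty((n_perm, eig.size))
    for b in range(n_perm):
        # Permute each column independently: marginals are kept,
        # correlations between features are destroyed.
        Xp = np.column_stack([rng.permutation(X[:, j]) for j in range(p)])
        null[b] = np.linalg.svd(Xp, compute_uv=False) ** 2 / n

    # Per-index threshold: the chosen quantile of the k-th null eigenvalue.
    thresh = np.quantile(null, quantile, axis=0)

    # Keep components while the observed eigenvalue beats its threshold.
    k = 0
    while k < eig.size and eig[k] > thresh[k]:
        k += 1
    return k
```

For data built from two strong latent factors plus noise, for example `X = Z @ L.T + E` with `Z` of shape (500, 2) and `L` of shape (50, 2), this sketch is expected to select 2 components.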

Selecting the number of components in PCA via random signflips.
Dimensionality reduction via PCA and factor analysis is an important tool of data analysis. A critical step is selecting the number of components. However, existing methods (such as the scree plot, …
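The abstract above is truncated and does not describe the procedure. As a rough illustration of the idea suggested by the title, here is a minimal NumPy sketch that assumes "random signflips" means multiplying each entry of the centered data matrix by an independent ±1 sign, and uses the resulting eigenvalues as the null in place of column permutations. The function name `signflip_pa` and its parameters are hypothetical, and this is not claimed to be the paper's exact procedure.

```python
import numpy as np


def signflip_pa(X, n_flip=20, quantile=0.95, seed=0):
    """Parallel analysis with a random-signflip null (illustrative sketch).

    Each entry of the centered (n, p) matrix X is multiplied by an
    independent +1/-1 sign; the resulting eigenvalues form the null.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    X = X - X.mean(axis=0)  # center columns
    n = X.shape[0]
    eig = np.linalg.svd(X, compute_uv=False) ** 2 / n  # descending order

    null = np.empty((n_flip, eig.size))
    for b in range(n_flip):
        # Independent Rademacher signs break the low-rank signal
        # while keeping the magnitude of every entry.
        signs = rng.choice([-1.0, 1.0], size=X.shape)
        null[b] = np.linalg.svd(signs * X, compute_uv=False) ** 2 / n

    # Per-index threshold from the signflip null.
    thresh = np.quantile(null, quantile, axis=0)
    k = 0
    while k < eig.size and eig[k] > thresh[k]:
        k += 1
    return k
```

One appeal of this kind of null is that it needs no assumption that the columns are exchangeable, only that the noise entries are symmetric about zero; that motivation is an assumption of this sketch.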
Deterministic parallel analysis: An improved method for selecting the number of factors and principal components
Factor analysis and principal component analysis (PCA) are used in many application areas. The first step, choosing the number of components, remains a serious challenge. Our work proposes improved…
Deterministic parallel analysis: an improved method for selecting factors and principal components
  • E. Dobriban, A. Owen
  • Computer Science, Mathematics
  • Journal of the Royal Statistical Society: Series B (Statistical Methodology)
  • 2018
TLDR
This work derandomizes parallel analysis (PA), proposing deterministic PA, which is faster and more reproducible than PA. It also proposes deflation to counter shadowing and raises the decision threshold to improve estimation accuracy.
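Derandomizing PA means replacing the random permutation null with a deterministic threshold. The sketch below is a simplified stand-in, not the deterministic PA of Dobriban and Owen: it assumes standardized data with homoskedastic noise, so that the null eigenvalues of the sample correlation matrix follow a Marchenko-Pastur law with aspect ratio $\gamma = p/n$ and upper edge $(1+\sqrt{\gamma})^2$. It omits the deflation step and the raised threshold described in the TLDR. The function name `mp_edge_count` is hypothetical.

```python
import numpy as np


def mp_edge_count(X):
    """Deterministic threshold at the Marchenko-Pastur upper edge (sketch).

    Assumes homoskedastic noise after column standardization, so that the
    null eigenvalues of the sample correlation matrix follow a
    Marchenko-Pastur law with aspect ratio gamma = p / n.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    X = (X - X.mean(axis=0)) / X.std(axis=0)
    eig = np.linalg.svd(X, compute_uv=False) ** 2 / n
    gamma = p / n
    # Upper edge of the MP support for unit-variance noise.
    edge = (1 + np.sqrt(gamma)) ** 2
    # No random draws: the same X always yields the same count.
    return int(np.sum(eig > edge))
```

Because no permutations are drawn, this threshold costs a single SVD and is exactly reproducible, which mirrors the speed and reproducibility advantages the TLDR attributes to deterministic PA.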
Factor analysis in high dimensional biological data with dependent observations
TLDR
This work develops a novel statistical framework to perform factor analysis and interpret its results in data with dependent observations and factors whose signal strengths span several orders of magnitude, and shows that its estimator for the number of factors overcomes both the notorious "eigenvalue shadowing" problem and the biases due to the pervasive factor assumption.
Robust high dimensional factor models with applications to statistical machine learning.
TLDR
This work shows that classical methods, especially principal component analysis (PCA), can be tailored to many new problems and provide powerful tools for statistical estimation and inference, and illustrates through several applications how insights from these fields yield solutions to modern challenges.
Estimating Number of Factors by Adjusted Eigenvalues Thresholding
Determining the number of common factors is an important and practical topic in high-dimensional factor models. The existing literature is mainly based on the eigenvalues of the covariance matrix.
Likelihood Ratio Test in Multivariate Linear Regression: from Low to High Dimension
Multivariate linear regressions are widely used statistical tools in many applications to model the associations between multiple related responses and a set of predictors. To infer such…
Biwhitening Reveals the Rank of a Count Matrix
TLDR
This work proposes a simple procedure termed biwhitening that makes it possible to estimate the rank of the underlying data matrix without any prior knowledge on its structure, and extends it to other discrete distributions, such as the generalized Poisson, binomial, multinomial, and negative binomial.
Estimating and Accounting for Unobserved Covariates in High-Dimensional Correlated Data
TLDR
CBCV and CorrConf are developed: provably accurate and computationally efficient methods to choose the number of and estimate latent confounding factors present in high dimensional data with correlated or nonexchangeable residuals.
Estimation of large block structured covariance matrices: Application to ‘multi‐omic’ approaches to study seed quality
TLDR
This work proposes a novel, efficient and fully data-driven approach for estimating large block-structured sparse covariance matrices in the case where the number of variables is much larger than the number of samples, without restricting attention to block-diagonal matrices.

References

Showing 1-10 of 60 references
Bi-cross-validation for factor analysis
Factor analysis is over a century old, but it is still problematic to choose the number of factors for a given data set. We provide a systematic review of current methods and then introduce a method…
Deterministic parallel analysis: An improved method for selecting the number of factors and principal components
Factor analysis and principal component analysis (PCA) are used in many application areas. The first step, choosing the number of components, remains a serious challenge. Our work proposes improved…
Deterministic parallel analysis: an improved method for selecting factors and principal components
  • E. Dobriban, A. Owen
  • Computer Science, Mathematics
  • Journal of the Royal Statistical Society: Series B (Statistical Methodology)
  • 2018
TLDR
This work derandomizes parallel analysis (PA), proposing deterministic PA, which is faster and more reproducible than PA. It also proposes deflation to counter shadowing and raises the decision threshold to improve estimation accuracy.
How many principal components? stopping rules for determining the number of non-trivial axes revisited
TLDR
Bartlett's test is used to assess the significance of the first principal component, indicating whether or not at least two variables share common variation in the entire data set, and a two-step approach appears to be highly effective.
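The sphericity form of Bartlett's test checks whether the population correlation matrix is the identity, that is, whether any variables share common variation. For $n$ samples, $p$ variables and sample correlation matrix $R$, it uses the statistic $\chi^2 = -\left(n - 1 - \frac{2p+5}{6}\right)\ln\lvert R\rvert$, compared with a chi-square distribution on $p(p-1)/2$ degrees of freedom. A minimal sketch assuming NumPy and SciPy; `bartlett_sphericity` is a hypothetical helper, and the referenced paper may use a different variant or workflow.

```python
import numpy as np
from scipy.stats import chi2


def bartlett_sphericity(X):
    """Bartlett's test of sphericity on the sample correlation matrix (sketch).

    X is an (n, p) array with samples in rows. Returns (statistic, df, p_value).
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    R = np.corrcoef(X, rowvar=False)
    # slogdet is numerically safer than log(det(R)) when det(R) is tiny.
    _, logdet = np.linalg.slogdet(R)
    stat = -(n - 1 - (2 * p + 5) / 6) * logdet
    df = p * (p - 1) / 2
    # Upper-tail chi-square probability under H0: R equals the identity.
    return stat, df, chi2.sf(stat, df)
```

A small p-value suggests that at least one component reflects shared variation, so retaining at least one component is warranted; a large p-value is consistent with no common structure.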
Remarks on Parallel Analysis.
TLDR
Evidence is given that quasi-inferential PA based on normal random variates (as opposed to data permutations) is surprisingly independent of distributional assumptions, and therefore enjoys certain nonparametric properties as well.
The Elements of Statistical Learning
  • E. Ziegel
  • Computer Science, Mathematics
  • Technometrics
  • 2003
TLDR
Chapter 11 includes more case studies in other areas, ranging from manufacturing to marketing research, and a detailed comparison with other diagnostic tools, such as logistic regression and tree-based methods.
Confirmatory Factor Analysis for Applied Research
Data Mining Methods and Models is the second volume of a three-book series on data mining authored by Larose. The following review was performed independently of Larose’s other two books.
Eigenvalue significance testing for genetic association.
TLDR
A novel block permutation approach is introduced, designed to produce an appropriate null eigenvalue distribution by eliminating long-range genomic correlation while preserving local correlation, and a fast approach based on eigenvalue distribution modeling is proposed.
Finite sample approximation results for principal component analysis: a matrix perturbation approach
Principal component analysis (PCA) is a standard tool for dimensional reduction of a set of $n$ observations (samples), each with $p$ variables. In this paper, using a matrix perturbation approach, …
Testing Hypotheses About the Number of Factors in Large Factor Models
In this paper we study high-dimensional time series that have the generalized dynamic factor structure. We develop a test of the null of $k_0$ factors against the alternative that the number of factors…