An Algorithm for the Principal Component Analysis of Large Data Sets

@article{Halko2010AnAF,
  title={An Algorithm for the Principal Component Analysis of Large Data Sets},
  author={Nathan Halko and Per-Gunnar Martinsson and Yoel Shkolnisky and Mark Tygert},
  journal={SIAM J. Scientific Computing},
  year={2010},
  volume={33},
  pages={2580--2594}
}
  • Nathan Halko, Per-Gunnar Martinsson, Yoel Shkolnisky, Mark Tygert
  • Published in SIAM J. Scientific Computing 2010
  • Mathematics, Computer Science
  • Recently popularized randomized methods for principal component analysis (PCA) efficiently and reliably produce nearly optimal accuracy, even on parallel processors, unlike the classical (deterministic) alternatives. We adapt one of these randomized methods for use with data sets that are too large to be stored in random-access memory (RAM). (The traditional terminology is that our procedure works efficiently out-of-core.) We illustrate the performance of the algorithm via several numerical…
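
    To make the out-of-core idea in the abstract concrete, here is a minimal sketch of a generic two-pass randomized low-rank factorization that only ever touches the data one block of rows at a time. It is not the paper's exact procedure: the function name `randomized_svd_blocks`, the `row_blocks` callable, and the `oversample` parameter are all hypothetical, and the sketch assumes the data have already been centered (as PCA requires) and that an m x (k + oversample) matrix fits in memory.

    ```python
    import numpy as np


    def randomized_svd_blocks(row_blocks, n_cols, k, oversample=10, seed=0):
        """Approximate rank-k SVD of a matrix A that is read in row blocks.

        row_blocks: zero-argument callable (hypothetical interface) returning a
                    fresh iterator over consecutive row blocks of A, each block
                    having n_cols columns.
        """
        rng = np.random.default_rng(seed)
        l = k + oversample  # sketch width; assumed small enough to hold in RAM

        # Pass 1: Y = A @ Omega, assembled block by block.
        omega = rng.standard_normal((n_cols, l))
        y = np.vstack([block @ omega for block in row_blocks()])

        # Orthonormal basis for the (approximate) range of A.
        q, _ = np.linalg.qr(y)

        # Pass 2: B = Q^T A, accumulated block by block.
        b = np.zeros((q.shape[1], n_cols))
        start = 0
        for block in row_blocks():
            stop = start + block.shape[0]
            b += q[start:stop].T @ block
            start = stop

        # Small dense SVD, then map the left factor back to the original space.
        u_b, s, vt = np.linalg.svd(b, full_matrices=False)
        u = q @ u_b
        return u[:, :k], s[:k], vt[:k]


    if __name__ == "__main__":
        # Stand-in for data on disk: an in-memory matrix served in 64-row blocks.
        rng = np.random.default_rng(1)
        a = rng.standard_normal((200, 3)) @ rng.standard_normal((3, 50))
        blocks = lambda: (a[i:i + 64] for i in range(0, a.shape[0], 64))
        u, s, vt = randomized_svd_blocks(blocks, n_cols=50, k=3)
        print(np.linalg.norm((u * s) @ vt - a))
    ```

    In a genuinely out-of-core setting the `row_blocks` callable would read from disk instead, for example by slicing an array opened with `np.load(..., mmap_mode="r")`. The two passes over the data are the price of never holding the full matrix in RAM.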

    Citations

    Publications citing this paper.
    SHOWING 1-10 OF 128 CITATIONS

    Efficient Algorithms for t-distributed Stochastic Neighborhood Embedding

    CITES BACKGROUND & METHODS
    HIGHLY INFLUENCED

    Single-Pass PCA of Large High-Dimensional Data

    CITES METHODS, BACKGROUND & RESULTS
    HIGHLY INFLUENCED

    Greedy Representative Selection for Unsupervised Data Analysis

    CITES METHODS
    HIGHLY INFLUENCED

    Lazy Stochastic Principal Component Analysis

    CITES METHODS & BACKGROUND
    HIGHLY INFLUENCED

    Randomized Matrix Decompositions using

    CITES BACKGROUND & METHODS
    HIGHLY INFLUENCED

    Projecting "Better Than Randomly": How to Reduce the Dimensionality of Very Large Datasets in a Way That Outperforms Random Projections

    CITES METHODS & BACKGROUND
    HIGHLY INFLUENCED

    Randomized Matrix Decompositions using R

    CITES METHODS & BACKGROUND
    HIGHLY INFLUENCED

    Facial Expression Recognition and Analysis: A Comparison Study of Feature Descriptors

    CITES METHODS
    HIGHLY INFLUENCED

    Informative Data Fusion: Beyond Canonical Correlation Analysis

    CITES BACKGROUND
    HIGHLY INFLUENCED

    Pass-Efficient Randomized Algorithms for Low-Rank Matrix Approximation Using Any Number of Views

    CITES METHODS
    HIGHLY INFLUENCED

    CITATION STATISTICS

    • 23 Highly Influenced Citations

    • Averaged 16 Citations per year from 2017 through 2019

    • 42% Increase in citations per year in 2019 over 2018

    References

    Publications referenced by this paper.
    SHOWING 1-10 OF 14 REFERENCES

    Matrix Computations

    HIGHLY INFLUENTIAL

    Computing Steerable Principal Components of a Large Set of Images and Their Rotations

    Normalized power iterations for the computation of SVD

    • P.-G. Martinsson, A. Szlam, M. Tygert
    • Proceedings of the Neural Information Processing Systems (NIPS) Workshop on Low-Rank Methods for Large-Scale Machine Learning, Vancouver, Canada
    • 2011

    A Randomized Algorithm for Principal Component Analysis

    Randomized algorithms for the low-rank approximation of matrices.
