Parallel GPU Implementation of Iterative PCA Algorithms

@article{Andrecut2009ParallelGI,
  title={Parallel GPU Implementation of Iterative PCA Algorithms},
  author={Mircea Andrecut},
  journal={Journal of computational biology : a journal of computational molecular cell biology},
  year={2009},
  volume={16},
  number={11},
  pages={1593-9}
}
  • M. Andrecut
  • Published 7 November 2008
  • Computer Science
  • Journal of computational biology : a journal of computational molecular cell biology
Principal component analysis (PCA) is a key statistical technique for multivariate data analysis. [...] Here we present an algorithm based on Gram-Schmidt orthogonalization (called GS-PCA), which eliminates this shortcoming of NIPALS-PCA. We also discuss the GPU (Graphics Processing Unit) parallel implementation of both the NIPALS-PCA and GS-PCA algorithms. The numerical results show that the GPU parallel optimized versions, based on CUBLAS (NVIDIA), are substantially faster (up to 12 times) than the…
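
The abstract above is truncated and does not spell out the algorithm, so the following is only a minimal CPU sketch of the general idea: a NIPALS-style iteration in which each new loading and score vector is re-orthogonalized against the previous ones by Gram-Schmidt. It assumes that the shortcoming in question is the gradual loss of orthogonality in plain NIPALS (the elided text does not confirm this). The function name `gs_nipals_pca`, its parameters, and the helper functions are hypothetical, and the sketch uses plain Python lists rather than the CUBLAS routines the paper relies on.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def mat_vec(A, v):
    # Computes A v for a row-major list-of-lists matrix A.
    return [dot(row, v) for row in A]


def mat_t_vec(A, u):
    # Computes A^T u without building the transpose.
    return [sum(A[i][j] * u[i] for i in range(len(A))) for j in range(len(A[0]))]


def orthogonalize(v, basis):
    # Modified Gram-Schmidt step: remove from v its projection onto
    # each unit vector in basis, one vector at a time.
    for b in basis:
        c = dot(v, b)
        v = [x - c * y for x, y in zip(v, b)]
    return v


def gs_nipals_pca(X, k, max_iter=200, tol=1e-10):
    """Illustrative sketch (not the paper's code): first k components of X.

    Assumes k does not exceed the rank of the centered data; a zero
    residual would cause a division by zero in the normalization steps.
    Returns (loadings, unit_scores, norms).
    """
    m, n = len(X), len(X[0])
    means = [sum(X[i][j] for i in range(m)) / m for j in range(n)]
    R = [[X[i][j] - means[j] for j in range(n)] for i in range(m)]  # residual
    loadings, unit_scores, norms = [], [], []
    for _component in range(k):
        # Start from the residual column with the largest norm.
        j0 = max(range(n), key=lambda j: sum(R[i][j] ** 2 for i in range(m)))
        t = [R[i][j0] for i in range(m)]
        prev = 0.0
        for _step in range(max_iter):
            p = orthogonalize(mat_t_vec(R, t), loadings)   # loading direction
            p_norm = math.sqrt(dot(p, p))
            p = [x / p_norm for x in p]
            t = orthogonalize(mat_vec(R, p), unit_scores)  # score vector
            lam = math.sqrt(dot(t, t))
            if abs(lam - prev) <= tol * lam:
                break
            prev = lam
        loadings.append(p)
        unit_scores.append([x / lam for x in t])
        norms.append(lam)
        # Deflate: subtract the rank-one term t p^T from the residual.
        R = [[R[i][j] - t[i] * p[j] for j in range(n)] for i in range(m)]
    return loadings, unit_scores, norms
```

Calling `gs_nipals_pca(X, k)` on a list of rows returns unit-length loading vectors, unit-length score vectors, and the score norms. The matrix-vector products and vector updates are the operations one would expect to map onto BLAS-style GPU calls, though how the paper does this is not shown in the text above.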

Modified fast PCA algorithm on GPU architecture
TLDR
The modified version of the fast PCA (MFPCA) algorithm is presented on the GPU architecture, and the suitability of the algorithm for the face recognition task is discussed. Experimental results show a decrease in the MFPCA algorithm's execution time while preserving the quality of the results.
Accelerating a Geometrical Approximated PCA Algorithm Using AVX2 and CUDA
TLDR
The experimental evaluation has shown not only the advantage of using CUDA programming in implementing the gaPCA algorithm on a GPU in terms of performance and energy consumption, but also significant benefits in implementing it on the multi-core CPU using AVX2 intrinsics.
Real-time PCA calculation for spectral imaging (using SIMD and GP-GPU)
TLDR
Two optimized implementations of the PCA algorithm are presented, primarily targeted at real-time spectral image analysis: one uses the SSE instruction set of contemporary CPUs, and the other runs on graphics processors using the CUDA environment.
Tuning Principal Component Analysis for GRASS GIS on Multi-core and GPU Architectures
TLDR
This paper uses imaging spectrometer data to demonstrate the performance improvements attained by the implementation of PCA in GRASS GIS, which reduced runtime by nearly 99% using only multi-core related optimizations and an additional 50% reduction using GPU related optimizations.
A GPU parallel implementation of the Local Principal Component Analysis overcomplete method for DW image denoising
TLDR
This work designs and implements a parallel version of the OLPCA, by using a suitable mapping of the tasks on a GPU architecture with the aim of investigating the performance and the denoising features of the algorithm.
Performance Evaluation of Gradient-based Dimensionality Reduction Methods on Different Devices
  • A. Borisov, E. Myasnikov
  • Computer Science
    2020 Ural Symposium on Biomedical Engineering, Radioelectronics and Information Technology (USBEREIT)
  • 2020
TLDR
This paper evaluates the efficiency of the nonlinear mapping dimensionality reduction technique, based on the stochastic and classic gradient descent algorithms, implemented using CUDA for NVIDIA GPU, HIP for AMD GPU, and OpenMP with AVX2 for CPU.
Implementation of a covariance-based principal component analysis algorithm for hyperspectral imaging applications with multi-threading in both CPU and GPU
  • Jian Zhang, Kim Hwa Lim
  • Computer Science
    2012 IEEE International Geoscience and Remote Sensing Symposium
  • 2012
TLDR
An improvement is presented that combines multithreading on the CPU and GPU with CUDA's graphics interoperability, and this combined framework is found to come much closer to real-time processing.
Implementation of a covariance-based principal component analysis algorithm with a CUDA-enabled graphics processing unit
  • Jian Zhang, Kim Hwa Lim
  • Computer Science
    2011 IEEE International Geoscience and Remote Sensing Symposium
  • 2011
TLDR
The covariance-matrix approach is found to have great potential for reaching real-time performance, and the performance of the implementations is compared with that of their CPU counterparts.
Non-negative Matrix Factorization on GPU
TLDR
Computation of NMF on GPU using CUDA technology is introduced. Its main advantage lies in processing non-negative matrix factorizations that are easily interpretable as images, but applications can be found in other areas as well.

References

Efficient Gram–Schmidt orthonormalisation on parallel computers
TLDR
The paper shows how these algorithms can be implemented on a parallel computer, and how their communication overhead can be minimized, and provides some guidelines for selecting the most appropriate algorithm.
Loss and Recapture of Orthogonality in the Modified Gram-Schmidt Algorithm
TLDR
The special structure of the product of the Householder transformations is derived, and then used to explain and bound the loss of orthogonality in MGS, which is illustrated by deriving a numerically stable algorithm based on MGS for a class of problems which includes solution of nonsingular linear systems.
Principal Component Analysis