Principal Component Analysis and Optimization: A Tutorial

Robert Reris and J. Paul Brooks

Principal component analysis (PCA) is one of the most widely used multivariate techniques in statistics. It is commonly used to reduce the dimensionality of data in order to examine its underlying structure and the covariance/correlation structure of a set of variables. While singular value decomposition provides a simple means for identification of the principal components (PCs) for classical PCA, solutions achieved in this manner may not possess certain desirable properties including…
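The SVD route to classical PCA mentioned in the abstract can be sketched as follows. This is a minimal illustration using NumPy, not code from the tutorial; the function name `pca_svd`, its parameters, and the synthetic data are assumptions made for the example.

```python
import numpy as np


def pca_svd(X, k):
    """Classical PCA of an (n_samples, n_features) array via SVD.

    Hypothetical helper for illustration: returns the top-k principal
    directions, the projected scores, and the variance along each direction.
    """
    # Center each variable so the SVD reflects the covariance structure.
    Xc = X - X.mean(axis=0)
    # Rows of Vt are orthonormal directions, ordered by singular value.
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:k]
    # Coordinates of each observation in the reduced k-dimensional space.
    scores = Xc @ components.T
    # Squared singular values, scaled by n - 1, are sample variances.
    variances = s[:k] ** 2 / (X.shape[0] - 1)
    return components, scores, variances


# Synthetic data (an assumption): three variables with unequal spread.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3)) @ np.diag([3.0, 1.0, 0.1])
components, scores, variances = pca_svd(X, 2)
```

The variances returned here should agree with the two largest eigenvalues of the sample covariance matrix, which is the sense in which the SVD identifies the principal components.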

Sparse kernel feature extraction via support vector learning

Robust and Sparse Kernel PCA and Its Outlier Map

A two-stage algorithm was proposed: a robust distance was computed to identify the uncontaminated data set, followed by estimating the best-fit ellipsoid to these data for an informative and concise representation, and a kernel PCA outlier map was proposed to display and classify the outliers.

Feature selection based on star coordinates plots associated with eigenvalue problems

A new feature relevance measure is proposed for star coordinates plots associated with the class of linear dimensionality reduction mappings defined through the solutions of eigenvalue problems, such as linear discriminant analysis or principal component analysis.

Estimating L1-Norm Best-Fit Lines for Data

This paper presents a procedure to estimate the L1-norm best-fit one-dimensional subspace (a line through the origin) to data in R^m based on an optimization criterion involving linear programming but which can be performed using simple ratios and sortings.

Characterizing L1-norm best-fit subspaces

The L1-norm best-fit subspace problem is directly formulated as a nonlinear, nonconvex, and nondifferentiable optimization problem that can be solved to global optimality efficiently by solving a series of linear programs.

Random selection of factors approximately preserves correlation structure in a linear factor model

A statistical factor model is developed, the random factor model, in which factors are chosen at random based on the random projection method, which enables derivation of probabilistic bounds for the accuracy of the random factor representation of time-series, their cross-correlations and covariances.

Principal component analysis and singular value decomposition used for a numerical sensitivity analysis of a complex drawn part

The numerical forecasting of car body construction processes is already being used in industry to provide support in the ramp-up process. However, long calculation times are stretching the finite

Distributed Maximum Likelihood Principal Component Analysis for Wireless Sensor Network Data

A distributed maximum likelihood PCA algorithm is proposed that more efficiently finds the principal components of data containing anomalies; the result is compared with principal components computed across the network to identify the anomalies.

Random factor approach for large sets of equity time-series

A random factor model is developed, in which factors are chosen at random based on the random projection method, and probabilistic bounds are derived for the accuracy of the random factor representation of time-series, their cross-correlations and covariances.

Automatic Baseline extraction based on PCA (Principal Component Analysis) method

An efficient algorithm for robust baseline extraction is proposed, in which the optimal weight vector is computed based on logic distribution function and the smooth parameters are obtained using the PCA method; the new algorithm has been extended to existing extraction methods.



K-means clustering via principal component analysis

It is proved that principal components are the continuous solutions to the discrete cluster membership indicators for K-means clustering, which indicates that unsupervised dimension reduction is closely related to unsupervised learning.

A Pure L1-norm Principal Component Analysis.

Tests show that L1-PCA* is the indicated procedure in the presence of unbalanced outlier contamination and the application of this idea that fits data to subspaces of successively smaller dimension is presented.

Principal Component Analysis

  • I. Jolliffe
  • Mathematics, Geology
    International Encyclopedia of Statistical Science
  • 1986
Introduction * Properties of Population Principal Components * Properties of Sample Principal Components * Interpreting Principal Components: Examples * Graphical Representation of Data Using

Robust Principal Component Analysis with Non-Greedy l1-Norm Maximization

Experimental results on real-world datasets show that the non-greedy method always obtains a much better solution than the greedy method, and a robust principal component analysis with non-greedy l1-norm maximization is proposed.

A pure L1-norm principal component analysis

Robust principal component analysis?

It is proved that under some suitable assumptions, it is possible to recover both the low-rank and the sparse components exactly by solving a very convenient convex program called Principal Component Pursuit; among all feasible decompositions, this suggests the possibility of a principled approach to robust principal component analysis.
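For orientation, the convex program usually written under the name Principal Component Pursuit has the form below. The symbols are assumptions chosen for this note: M is the observed data matrix, L the low-rank component, S the sparse component, the first norm is the nuclear norm (sum of singular values), the second is the entrywise l1 norm, and λ > 0 is a weighting parameter.

```latex
\min_{L,\,S} \; \|L\|_{*} + \lambda \,\|S\|_{1}
\quad \text{subject to} \quad L + S = M
```

The nuclear norm acts as a convex surrogate for rank and the l1 norm as a convex surrogate for sparsity, which is what makes the decomposition tractable.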

A Generalized Least-Square Matrix Decomposition

By finding the best low-rank approximation of the data with respect to a transposable quadratic norm, the generalized least-square matrix decomposition (GMD), directly accounts for structural relationships and is demonstrated for dimension reduction, signal recovery, and feature selection with high-dimensional structured data.

Principal Component Analysis Based on L1-Norm Maximization

  • Nojun Kwak
  • Computer Science
    IEEE Transactions on Pattern Analysis and Machine Intelligence
  • 2008
A method of principal component analysis (PCA) based on a new L1-norm optimization technique which is robust to outliers and invariant to rotations and also proven to find a locally maximal solution.
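A rough sketch of the kind of fixed-point iteration commonly described for L1-norm maximization PCA is given below, for a single direction. It is an illustration under stated assumptions, not the paper's code: the function name `pca_l1_direction`, the random initialization, the stopping rule, and the synthetic data are all choices made for this example, and the data are assumed to be centered.

```python
import numpy as np


def pca_l1_direction(X, max_iter=100, seed=0):
    """Seek a unit vector w that locally maximizes sum_i |w . x_i|.

    Hypothetical helper: X is an (n_samples, n_features) array assumed
    to be centered already.
    """
    rng = np.random.default_rng(seed)
    w = rng.normal(size=X.shape[1])
    w /= np.linalg.norm(w)
    for _ in range(max_iter):
        # Polarity of each observation relative to the current direction.
        signs = np.where(X @ w >= 0, 1.0, -1.0)
        # Sign-weighted sum of the observations gives the next direction.
        v = signs @ X
        norm = np.linalg.norm(v)
        if norm == 0:
            break
        w_new = v / norm
        if np.allclose(w_new, w):
            break
        w = w_new
    return w


# Synthetic data (an assumption): most of the spread lies on the first axis.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 2)) * np.array([5.0, 0.5])
X = X - X.mean(axis=0)
w = pca_l1_direction(X)
```

Each update cannot decrease the L1 objective, so the iteration settles at a local maximum; different starting vectors may reach different solutions.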

Spectral Relaxation for K-means Clustering

It is shown that a relaxed version of the trace maximization problem possesses global optimal solutions which can be obtained by computing a partial eigendecomposition of the Gram matrix, and the cluster assignment for each data vector can be found by computing a pivoted QR decomposition of the eigenvector matrix.

Principal Component Analysis

  • H. Shen
  • Environmental Science
    Encyclopedia of Database Systems
  • 2009
The Karhunen-Loève basis functions, more frequently referred to as principal components or empirical orthogonal functions (EOFs), of the noise response of the climate system are an important tool