A Geometric Analysis of Subspace Clustering with Outliers

  • Mahdi Soltanolkotabi, Emmanuel J. Candès
This paper considers the problem of clustering a collection of unlabeled data points assumed to lie near a union of lower-dimensional planes. As is common in computer vision or unsupervised learning applications, we do not know in advance how many subspaces there are, nor do we have any information about their dimensions. We develop a novel geometric analysis of an algorithm named sparse subspace clustering (SSC) [11], which significantly broadens the range of problems where it is provably effective…
Robust Subspace Clustering via Thresholding
A simple low-complexity subspace clustering algorithm is proposed, which applies spectral clustering to an adjacency matrix obtained by thresholding the correlations between data points, and the results reveal an explicit tradeoff between the affinity of the subspaces and the tolerable noise level.
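The thresholding step described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name `thresholded_adjacency`, the neighbor count `q`, and the use of raw absolute correlations as edge weights are all assumptions made for the example. Spectral clustering would then be run on the returned matrix.

```python
import numpy as np

def thresholded_adjacency(X, q):
    """Build a symmetric adjacency matrix by keeping, for every point,
    its q largest absolute correlations with the other points.

    X : (ambient_dim, n_points) array, one data point per column.
    q : number of neighbors kept per point (hypothetical parameter name).
    """
    # Normalize columns so inner products are correlations.
    Xn = X / np.linalg.norm(X, axis=0, keepdims=True)
    C = np.abs(Xn.T @ Xn)
    np.fill_diagonal(C, 0.0)  # a point is never its own neighbor

    A = np.zeros_like(C)
    for j in range(C.shape[0]):
        keep = np.argsort(C[:, j])[-q:]  # indices of the q largest entries
        A[keep, j] = C[keep, j]
    return A + A.T  # symmetrize

# Toy data: two orthogonal 2-D subspaces of R^4, 20 points each.
rng = np.random.default_rng(0)
U1, U2 = np.eye(4)[:, :2], np.eye(4)[:, 2:]
X = np.hstack([U1 @ rng.standard_normal((2, 20)),
               U2 @ rng.standard_normal((2, 20))])
A = thresholded_adjacency(X, q=3)
cross_weight = A[:20, 20:].sum()  # total edge weight between the two groups
```

In this toy setting the subspaces are orthogonal, so no edge connects points from different subspaces. For subspaces that are close to each other or for noisy data, cross edges can appear, which is the affinity and noise tradeoff the summary refers to.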
Greedy Subspace Clustering
The statistical analysis shows that the algorithms are guaranteed exact (perfect) clustering performance under certain conditions on the number of points and the affinity between subspaces, which are weaker than those considered in the standard statistical literature.
Subspace Clustering with Missing and Corrupted Data
This paper studies a robust variant of SSC and establishes clustering guarantees in the presence of corrupted or missing data, and gives explicit bounds on the amount of noise and missing data that the algorithm can tolerate, both in deterministic settings and in a random generative model.
Data-Dependent Sparsity for Subspace Clustering
This paper argues that a certain data-dependent, non-convex penalty function can compensate for dictionary structure in a way that is especially germane to subspace clustering problems and demonstrates a form of invariance to feature-space transformations and affine translations that commonly disrupt existing methods.
On Geometric Analysis of Affine Sparse Subspace Clustering
A novel geometric analysis is developed for a variant of SSC, named affine SSC (ASSC), for the problem of clustering data from a union of affine subspaces, and it is shown that subspace-preserving recovery can be achieved under much weaker conditions.
Subspace Clustering with a Twist
This work derives a probabilistic model that simultaneously estimates the latent data points and subspace memberships using simple EM update rules; in certain restricted settings, this approach is guaranteed to produce the correct clustering.
Theoretical Analysis of Sparse Subspace Clustering with Missing Entries
This paper analytically establishes that projecting the zero-filled data onto the observation pattern of the point being expressed leads to a substantial improvement in performance, and gives theoretical guarantees for SSC with incomplete data.
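One reading of "projecting the zero-filled data onto the observation pattern of the point being expressed" is sketched below. It assumes missing entries are marked with NaN; the helper name `projected_dictionary` and the choice to zero out the point's own column are assumptions for illustration, not details taken from the paper.

```python
import numpy as np

def projected_dictionary(X, j):
    """Return the zero-filled target point j and a dictionary in which
    every row that is unobserved in point j has been zeroed out.

    X : (ambient_dim, n_points) array with NaN marking missing entries.
    """
    observed = ~np.isnan(X)
    X_zero = np.where(observed, X, 0.0)  # zero-fill missing entries
    # Keep only the coordinates observed in point j (broadcast over columns).
    D = X_zero * observed[:, [j]]
    D[:, j] = 0.0  # assumption: a point is not used to express itself
    return X_zero[:, j], D

X = np.array([[1.0,    2.0,    np.nan],
              [np.nan, 3.0,    4.0],
              [5.0,    np.nan, 6.0]])
target, D = projected_dictionary(X, j=0)
```

Point 0 is observed only in rows 0 and 2, so row 1 of the dictionary is zeroed before the point is expressed in terms of the others. The sparse regression itself would then be run on `target` and `D`.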
Graph Connectivity in Noisy Sparse Subspace Clustering
These results provide the first exact clustering guarantee of noisy SSC for subspaces of dimension greater than 3 and show that a simple post-processing procedure is capable of delivering consistent clustering under certain "general position" or "restricted eigenvalue" assumptions.
Leveraging Union of Subspace Structure to Improve Constrained Clustering
This work presents a pairwise-constrained clustering algorithm that actively selects queries based on the union-of-subspaces model, and proves that points lying near the intersection of subspaces are points with low margin.
Sparse and Low-Rank Methods
This work asks whether a subspace clustering affinity can be constructed that utilizes global geometric relationships among all the data points, is computationally tractable when the dimension and number of subspaces are large, and is guaranteed to provide the correct clustering under certain conditions.


Exact Subspace Segmentation and Outlier Detection by Low-Rank Representation
It is proved that, under mild technical conditions, any solution to LRR exactly recovers the row space of the samples and detects the outliers as well, which implies that LRR can perform exact subspace segmentation and outlier detection efficiently.
A closed form solution to robust subspace estimation and clustering
This work uses an augmented Lagrangian optimization framework, which requires a combination of the proposed polynomial thresholding operator with the more traditional shrinkage-thresholding operator, to solve the problem of fitting one or more subspaces to a collection of data points drawn from the subspaces and corrupted by noise/outliers.
Clustering disjoint subspaces via sparse representation
  • Ehsan Elhamifar, R. Vidal
  • Computer Science
    2010 IEEE International Conference on Acoustics, Speech and Signal Processing
  • 2010
Given a set of data points drawn from multiple low-dimensional linear subspaces of a high-dimensional space, this work derives theoretical bounds relating the principal angles between the subspaces and the distribution of the data points across all the subspaces under which the coefficients are guaranteed to be sparse.
Generalized principal component analysis (GPCA)
  • R. Vidal, Yi Ma, S. Sastry
  • Computer Science, Mathematics
    IEEE Transactions on Pattern Analysis and Machine Intelligence
  • 2005
An algebro-geometric solution to the problem of segmenting an unknown number of subspaces of unknown and varying dimensions from sample data points and applications of GPCA to computer vision problems such as face clustering, temporal video segmentation, and 3D motion segmentation from point correspondences in multiple affine views are presented.
Combined central and subspace clustering for computer vision applications
This paper proposes a generalization of K-means and K-subspaces that clusters the data by minimizing a cost function that combines both central and subspace distances.
Sparse subspace clustering
This work proposes a method based on sparse representation (SR) to cluster data drawn from multiple low-dimensional linear or affine subspaces embedded in a high-dimensional space and applies this method to the problem of segmenting multiple motions in video.
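The sparse-representation idea above can be sketched in a few lines. This is an illustrative variant under stated assumptions, not the authors' code: it uses the Lasso (penalized) form of the sparse regression, solves it with plain proximal gradient (ISTA) iterations, and the names `ssc_affinity`, `lam`, and `n_iter` are hypothetical. The resulting affinity would then be passed to spectral clustering.

```python
import numpy as np

def ssc_affinity(X, lam=0.05, n_iter=300):
    """Express every column of X as a sparse combination of the other
    columns and return the symmetric affinity |C| + |C|^T.

    Approximately minimizes 0.5 * ||Xn - Xn C||_F^2 + lam * ||C||_1
    subject to diag(C) = 0, using ISTA (an assumed solver choice).
    """
    Xn = X / np.linalg.norm(X, axis=0, keepdims=True)
    G = Xn.T @ Xn                                # Gram matrix
    step = 1.0 / np.linalg.norm(Xn, 2) ** 2      # 1 / Lipschitz constant
    C = np.zeros_like(G)
    for _ in range(n_iter):
        grad = G @ C - G                         # gradient of the quadratic term
        Z = C - step * grad
        # Soft-thresholding: proximal step for the l1 penalty.
        C = np.sign(Z) * np.maximum(np.abs(Z) - step * lam, 0.0)
        np.fill_diagonal(C, 0.0)                 # no self-representation
    return np.abs(C) + np.abs(C).T

# Toy data: two orthogonal 2-D subspaces of R^4, 20 points each.
rng = np.random.default_rng(0)
U1, U2 = np.eye(4)[:, :2], np.eye(4)[:, 2:]
X = np.hstack([U1 @ rng.standard_normal((2, 20)),
               U2 @ rng.standard_normal((2, 20))])
W = ssc_affinity(X)
cross_weight = W[:20, 20:].sum()  # affinity between the two groups
```

With orthogonal subspaces, each point is represented only by points from its own subspace, so the affinity is block diagonal. When subspaces intersect or the data are noisy, this property is only guaranteed under conditions such as those analyzed in the papers listed here.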
A number of approaches to subspace clustering proposed in the machine learning and computer vision communities are presented, including algebraic methods, iterative methods, statistical methods, and spectral clustering-based methods.
k-means projective clustering
An extension of the k-means clustering algorithm for projective clustering in arbitrary subspaces is presented, taking into account the inherent trade-off between the dimension of a subspace and the induced clustering error.
Clustering high-dimensional data: A survey on subspace clustering, pattern-based clustering, and correlation clustering
This survey tries to clarify the different problem definitions related to subspace clustering in general; the specific difficulties encountered in this field of research; the varying assumptions, heuristics, and intuitions forming the basis of different approaches; and how several prominent solutions tackle different problems.
Robust recovery of multiple subspaces by geometric l_p minimization
We assume i.i.d. data sampled from a mixture distribution with K components along fixed d-dimensional linear subspaces and an additional outlier component. For p>0, we study the simultaneous recovery