• Corpus ID: 88515338

Subspace Clustering with Missing and Corrupted Data

@article{Charles2017SubspaceCW,
  title={Subspace Clustering with Missing and Corrupted Data},
  author={Zachary B. Charles and Amin Jalali and Rebecca M. Willett},
  journal={arXiv: Machine Learning},
  year={2017}
}
Given full or partial information about a collection of points that lie close to a union of several subspaces, subspace clustering refers to the process of clustering the points according to their subspace and identifying the subspaces. One popular approach, sparse subspace clustering (SSC), represents each sample as a weighted combination of the other samples, with weights of minimal $\ell_1$ norm, and then uses those learned weights to cluster the samples. SSC is stable in settings where each… 
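As a concrete illustration of the pipeline described above, the following is a minimal sketch of standard (noise-tolerant) SSC, not the authors' implementation; the choice of cvxpy and scikit-learn and the penalty weight lam are assumptions made here for illustration only.

import numpy as np
import cvxpy as cp
from sklearn.cluster import SpectralClustering

def ssc_coefficients(X, lam=20.0):
    # X is a d x n matrix whose columns are the samples. Each column is
    # expressed as a sparse combination of the other columns via an
    # l1-penalized least-squares (LASSO-style) program.
    n = X.shape[1]
    C = np.zeros((n, n))
    for i in range(n):
        c = cp.Variable(n)
        objective = cp.Minimize(cp.norm1(c)
                                + (lam / 2) * cp.sum_squares(X[:, i] - X @ c))
        # The constraint c[i] == 0 rules out the trivial self-representation.
        cp.Problem(objective, [c[i] == 0]).solve()
        C[:, i] = c.value
    return C

def ssc_cluster(X, n_clusters, lam=20.0):
    # Build a symmetric affinity from the learned weights, then spectrally cluster it.
    C = ssc_coefficients(X, lam)
    W = np.abs(C) + np.abs(C).T
    return SpectralClustering(n_clusters=n_clusters,
                              affinity="precomputed").fit_predict(W)

In the noiseless setting the quadratic penalty is replaced by the exact self-expression constraint $x_i = X c_i$.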

Sparse Subspace Clustering with Missing and Corrupted Data
TLDR
This paper studies a robust variant of sparse subspace clustering (SSC) and gives explicit bounds on the amount of additive noise and the number of missing entries the algorithm can tolerate, both in deterministic settings and in a random generative model.
Theoretical Analysis of Sparse Subspace Clustering with Missing Entries
TLDR
This paper analytically establishes that projecting the zero-filled data onto the observation pattern of the point being expressed leads to a substantial improvement in performance, and gives theoretical guarantees for SSC with incomplete data.
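As a rough sketch of the projection step this TLDR refers to (my reading of the idea, not the authors' code): when expressing a particular column, all columns are zero-filled and the regression is restricted to the rows observed in that column.

import numpy as np

def projected_zero_fill(X, mask, j):
    # X: d x n data matrix with arbitrary values at missing entries.
    # mask: d x n boolean array marking observed entries.
    # j: index of the point currently being expressed.
    X0 = np.where(mask, X, 0.0)   # zero-fill every missing entry
    rows = mask[:, j]             # observation pattern of point j
    return X0[rows, :]            # keep only those rows for the regression

The restricted matrix is then used in place of the full data matrix in the per-point $\ell_1$ program (for instance, in place of X in the ssc_coefficients sketch above).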
Evolutionary Self-Expressive Models for Subspace Clustering
TLDR
This work introduces evolutionary subspace clustering, a method whose objective is to cluster a collection of evolving data points that lie on a union of low-dimensional evolving subspaces, and proposes a non-convex optimization framework that exploits the self-expressiveness property of the evolving data while taking into account the representation from the preceding time step.
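One plausible instantiation of such an evolutionary self-expressive objective (a sketch under my own assumptions about the penalty and the coupling term, not the paper's exact model) is
$\min_{C_t}\ \|X_t - X_t C_t\|_F^2 + \lambda \|C_t\|_1 + \gamma \|C_t - C_{t-1}\|_F^2 \ \ \text{s.t.}\ \operatorname{diag}(C_t) = 0,$
where $X_t$ holds the data observed at time $t$, $C_t$ is the self-expressive coefficient matrix, and the last term ties the current representation to the one learned at the preceding time step; $\lambda$ and $\gamma$ are illustrative regularization weights.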
Low-Rank Approximation of Matrices Via A Rank-Revealing Factorization with Randomization
  • M. Kaloorazi, J. Chen
  • Computer Science, Mathematics
    ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  • 2020
TLDR
This paper presents an algorithm called randomized pivoted TSOD (RP-TSOD) that constructs a highly accurate approximation to the TSOD decomposition through the exploitation of randomization and furnishes upper bounds on the error of the low-rank approximation and bounds for the canonical angles between the approximate and the exact singular subspaces.
Optimal Recovery of Missing Values for Non-Negative Matrix Factorization
Missing-value imputation is often evaluated with some similarity measure between the actual and the imputed data; however, it may be more meaningful to evaluate downstream algorithm performance after…
TLDR
Under certain geometric conditions, tight upper bounds on NMF relative error are proved, which is the first bound of this type for missing values in non-negative matrix factorization (NMF).
Tensor Methods for Nonlinear Matrix Completion
TLDR
This paper proposes LADMC, an algorithm that leverages existing low-rank matrix completion (LRMC) methods on a tensorized representation of the data and outperforms existing state-of-the-art matrix completion methods under a union-of-subspaces model.
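A rough sketch of the lift / complete / de-lift idea behind this tensorized approach, as I read it (not the authors' code): any off-the-shelf LRMC routine would be applied to the lifted matrix between the two helpers below, with the lifted observation pattern given by the entry pairs that are both observed in the original column.

import numpy as np

def lift(X):
    # Map each column x to vec(x x^T): a union of low-dimensional subspaces
    # becomes a single low-rank model in this lifted space, so standard
    # low-rank matrix completion can be applied there.
    return np.stack([np.outer(x, x).ravel() for x in X.T], axis=1)

def delift(Z, d):
    # Recover each column from its completed lift via a best symmetric
    # rank-one fit (each column is determined only up to sign).
    cols = []
    for z in Z.T:
        M = 0.5 * (z.reshape(d, d) + z.reshape(d, d).T)
        w, V = np.linalg.eigh(M)
        k = int(np.argmax(np.abs(w)))
        cols.append(np.sqrt(abs(w[k])) * V[:, k])
    return np.stack(cols, axis=1)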
Efficient Low-Rank Approximation of Matrices Based on Randomized Pivoted Decomposition
TLDR
An algorithm called randomized pivoted TSOD (RP-TSOD) is presented, where the middle factor is lower triangular, and bounds for the canonical angles between the approximate and the exact singular subspaces are derived.

References

SHOWING 1-10 OF 33 REFERENCES
Group-sparse subspace clustering with missing data
TLDR
Two novel methods for subspace clustering with missing data are described: (a) group-sparse subspace clustering (GSSC), which is based on group-sparsity and alternating minimization, and (b) mixture subspace clustering (MSC), which models each data point as a convex combination of its projections onto all subspaces in the union.
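A plausible form of the GSSC objective, written here only as a sketch (the paper's exact regularizer and constraints may differ), is
$\min_{U,\{v_i\}}\ \sum_i \| P_{\Omega_i}(x_i - U v_i) \|_2^2 + \lambda \sum_{i,k} \| v_i^{(k)} \|_2,$
where $U = [U^{(1)}, \dots, U^{(K)}]$ concatenates candidate bases for the $K$ subspaces, $v_i^{(k)}$ is the block of coefficients of point $x_i$ on the $k$-th basis, and $P_{\Omega_i}$ keeps only the entries observed in $x_i$. Alternating minimization updates $U$ and the $v_i$ in turn, and the group penalty pushes each point to load on a single block, which yields the clustering.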
Theoretical Analysis of Sparse Subspace Clustering with Missing Entries
TLDR
This paper analytically establishes that projecting the zero-filled data onto the observation pattern of the point being expressed leads to a substantial improvement in performance, and gives theoretical guarantees for SSC with incomplete data.
A Geometric Analysis of Subspace Clustering with Outliers
TLDR
A novel geometric analysis of an algorithm named sparse subspace clustering (SSC) is developed, which significantly broadens the range of problems where it is provably effective and shows that SSC can recover multiple subspaces, each of dimension comparable to the ambient dimension.
Sparse Subspace Clustering with Missing Entries
TLDR
Two new approaches for subspace clustering and completion are proposed and evaluated, both of which outperform the natural approach when the data matrix is high-rank or the percentage of missing entries is large.
Noisy Sparse Subspace Clustering
TLDR
It is shown that a modified version of SSC is provably effective in correctly identifying the underlying subspaces even with noisy data, which extends the theoretical guarantees of this algorithm to the practical setting and provides justification for the success of SSC in a class of real applications.
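The modified program analyzed in that line of work is typically the LASSO-style relaxation (a sketch; normalizations vary across papers):
$\min_{c_i}\ \|c_i\|_1 + \frac{\lambda}{2}\,\|x_i - X c_i\|_2^2 \ \ \text{s.t.}\ \ (c_i)_i = 0,$
which replaces the exact self-expression constraint $x_i = X c_i$ of noiseless SSC with a quadratic data-fit term so that bounded noise can be absorbed by the residual.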
Sparse subspace clustering
TLDR
This work proposes a method based on sparse representation (SR) to cluster data drawn from multiple low-dimensional linear or affine subspaces embedded in a high-dimensional space and applies this method to the problem of segmenting multiple motions in video.
Robust Subspace Clustering
TLDR
This paper introduces an algorithm inspired by sparse subspace clustering (SSC) to cluster noisy data, and develops some novel theory demonstrating its correctness.
The Information-Theoretic Requirements of Subspace Clustering with Missing Data
TLDR
Deterministic sampling conditions for SCMD are derived, which give precise information-theoretic requirements and determine sampling regimes, and a practical algorithm is given to deterministically certify the output of any SCMD method.
High-Rank Matrix Completion and Clustering under Self-Expressive Models
TLDR
This work proposes efficient algorithms for simultaneous clustering and completion of incomplete high-dimensional data that lie in a union of low-dimensional subspaces, and shows that when the data matrix is low-rank the algorithm performs on par with or better than low-rank matrix completion methods, while for high-rank data matrices the method significantly outperforms existing algorithms.
Generalized principal component analysis (GPCA)
  • R. Vidal, Yi Ma, S. Sastry
  • Mathematics, Medicine
    IEEE Transactions on Pattern Analysis and Machine Intelligence
  • 2005
TLDR
An algebro-geometric solution to the problem of segmenting an unknown number of subspaces of unknown and varying dimensions from sample data points is presented, along with applications of GPCA to computer vision problems such as face clustering, temporal video segmentation, and 3D motion segmentation from point correspondences in multiple affine views.
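In its simplest hyperplane case, the algebraic idea can be sketched as follows (an illustration of the general approach rather than the paper's full algorithm): a union of $n$ hyperplanes with normals $b_1, \dots, b_n$ is exactly the zero set of the degree-$n$ polynomial $p(x) = \prod_{k=1}^{n} (b_k^\top x)$, so one can fit such a vanishing polynomial to the data and then read off, at each sample $x_i$, the gradient $\nabla p(x_i)$, which is proportional to the normal of the hyperplane containing $x_i$; grouping samples by these normals yields the segmentation.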