Corpus ID: 51905620

Fusion Subspace Clustering: Full and Incomplete Data

@article{PimentelAlarcn2018FusionSC,
  title={Fusion Subspace Clustering: Full and Incomplete Data},
  author={Daniel L. Pimentel-Alarc{\'o}n and U. Mahmood},
  journal={ArXiv},
  year={2018},
  volume={abs/1808.00628}
}
Modern inference and learning often hinge on identifying low-dimensional structures that approximate large-scale data. Subspace clustering achieves this through a union of linear subspaces. However, in contemporary applications data is increasingly incomplete, rendering standard (full-data) methods inapplicable. On the other hand, existing incomplete-data methods have major drawbacks, such as lifting an already high-dimensional problem or requiring a super-polynomial number of samples…


References (showing 1–10 of 46)
K-subspaces with missing data
A fast algorithm is presented that combines GROUSE, an incremental matrix-completion algorithm, with k-subspaces, the alternating-minimization heuristic for the subspace clustering problem; the method relies on a slightly more general projection theorem, which the paper also presents.
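The full-data k-subspaces heuristic that this line of work extends can be sketched as follows (a minimal numpy illustration; the function name, interface, and initialization scheme are assumptions, and the GROUSE-based missing-data handling is omitted):

```python
import numpy as np

def k_subspaces(X, k, d, n_iter=20, init_bases=None, seed=0):
    """Minimal full-data k-subspaces sketch (hypothetical helper).

    Alternately (1) assigns each column of the D x N matrix X to the
    subspace with the smallest projection residual and (2) refits each
    subspace basis from the top-d left singular vectors of its cluster.
    """
    rng = np.random.default_rng(seed)
    D, N = X.shape
    if init_bases is None:
        # Random orthonormal starting bases (QR of Gaussian matrices).
        init_bases = [np.linalg.qr(rng.standard_normal((D, d)))[0]
                      for _ in range(k)]
    bases = list(init_bases)
    labels = np.zeros(N, dtype=int)
    for _ in range(n_iter):
        # Assignment step: residual of projecting each point onto each basis.
        res = np.stack([np.linalg.norm(X - U @ (U.T @ X), axis=0)
                        for U in bases])
        labels = res.argmin(axis=0)
        # Update step: PCA (truncated SVD) of each sufficiently large cluster.
        for j in range(k):
            Xj = X[:, labels == j]
            if Xj.shape[1] >= d:
                U, _, _ = np.linalg.svd(Xj, full_matrices=False)
                bases[j] = U[:, :d]
    return labels
```

Like any alternating-minimization heuristic, this is sensitive to initialization; the incomplete-data variants discussed in these references replace the SVD refit with missing-data-aware subspace updates.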
High-Rank Matrix Completion and Clustering under Self-Expressive Models
This work proposes efficient algorithms for simultaneous clustering and completion of incomplete high-dimensional data lying in a union of low-dimensional subspaces, and shows that when the data matrix is low-rank the algorithm performs on par with or better than low-rank matrix completion methods, while for high-rank data matrices it significantly outperforms existing algorithms.
Robust Subspace Clustering via Thresholding Ridge Regression
A new robust subspace clustering method, Thresholding Ridge Regression (TRR), computes ℓ2-norm-based coefficients for a given data set, applies a hard-thresholding operator, and then uses the coefficients to build a similarity graph for clustering.
Group-sparse subspace clustering with missing data
Two novel methods for subspace clustering with missing data are described: (a) group-sparse subspace clustering (GSSC), which is based on group sparsity and alternating minimization, and (b) mixture subspace clustering (MSC), which models each data point as a convex combination of its projections onto all subspaces in the union.
Data-Dependent Sparsity for Subspace Clustering
This paper argues that a certain data-dependent, non-convex penalty function can compensate for dictionary structure in a way that is especially germane to subspace clustering problems, and demonstrates a form of invariance to feature-space transformations and affine translations that commonly disrupt existing methods.
Robust Recovery of Subspace Structures by Low-Rank Representation
It is shown that the convex program associated with LRR solves the subspace clustering problem in the following sense: when the data is clean, LRR exactly recovers the true subspace structures; when the data is contaminated by outliers, LRR can, under certain conditions, exactly recover the row space of the original data.
Sparse Subspace Clustering: Algorithm, Theory, and Applications
This paper proposes and studies the sparse subspace clustering algorithm for clustering data points that lie in a union of low-dimensional subspaces, and demonstrates its effectiveness through experiments on synthetic data as well as the two real-world problems of motion segmentation and face clustering.
Scalable Sparse Subspace Clustering by Orthogonal Matching Pursuit
This paper shows that the method based on orthogonal matching pursuit is both computationally efficient and guaranteed to give a subspace-preserving affinity under broad conditions, and is the first to handle 100,000 data points.
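The OMP-based self-expression step can be sketched as follows (a minimal numpy illustration, not the paper's implementation; the function name and defaults are assumptions, and the subsequent spectral-clustering step on the affinity matrix is omitted):

```python
import numpy as np

def omp_affinity(X, k_nonzero=3, tol=1e-6):
    """Sparse self-expression via orthogonal matching pursuit (sketch).

    Approximates each normalized column of the D x N matrix X by a
    k-sparse combination of the other columns, then symmetrizes the
    coefficient magnitudes into an affinity matrix.
    """
    D, N = X.shape
    C = np.zeros((N, N))
    Xn = X / np.linalg.norm(X, axis=0, keepdims=True)
    for i in range(N):
        x = Xn[:, i]
        residual = x.copy()
        support = []
        coef = np.zeros(0)
        for _ in range(k_nonzero):
            corr = np.abs(Xn.T @ residual)
            corr[i] = 0.0            # a point may not represent itself
            if support:
                corr[support] = 0.0  # do not reselect chosen atoms
            support.append(int(corr.argmax()))
            # Least-squares refit of the coefficients on the current support.
            coef, *_ = np.linalg.lstsq(Xn[:, support], x, rcond=None)
            residual = x - Xn[:, support] @ coef
            if np.linalg.norm(residual) < tol:
                break
        C[support, i] = coef
    return np.abs(C) + np.abs(C).T  # symmetric affinity
```

In a toy setting with orthogonal subspaces, every point is represented using only points from its own subspace, so the affinity is subspace-preserving and spectral clustering on it recovers the clusters.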
Sparse Subspace Clustering with Missing Entries
Two new approaches for subspace clustering and completion are proposed and evaluated, both of which outperform the natural approach when the data matrix is high-rank or the percentage of missing entries is large.
On the sample complexity of subspace clustering with missing data
It is shown that if the subspaces have rank at most r and the number of partially observed vectors is greater than dr+1 (times a poly-logarithmic factor), then with high probability the true subspaces are the only subspaces that agree with the observed data.
...