Flexible Fault Tolerant Subspace Clustering for Data with Missing Values

Abstract

In today's applications, data analysis tasks are hindered by many attributes per object as well as by faulty data with missing values. Subspace clustering tackles the challenge of many attributes by cluster detection in any subspace projection of the data. However, it poses novel challenges for handling missing values of objects, which are part of multiple subspace clusters in different projections of the data. In this work, we propose a general fault tolerance definition enhancing subspace clustering models to handle missing values. We introduce a flexible notion of fault tolerance that adapts to the individual characteristics of subspace clusters and ensures a robust parameterization. Allowing missing values in our model increases the computational complexity of subspace clustering. Thus, we prove novel monotonicity properties for an efficient computation of fault tolerant subspace clusters. Experiments on real and synthetic data show that our fault tolerance model yields high quality results even in the presence of many missing values. For repeatability, we provide all datasets and executables on our website.

DOI: 10.1109/ICDM.2011.70

Cite this paper

@article{Gnnemann2011FlexibleFT, title={Flexible Fault Tolerant Subspace Clustering for Data with Missing Values}, author={Stephan G{\"u}nnemann and Emmanuel M{\"u}ller and Sebastian Raubach and Thomas Seidl}, journal={2011 IEEE 11th International Conference on Data Mining}, year={2011}, pages={231-240} }