An extension of the PMML standard to subspace clustering models

Stephan Günnemann, H. Kremer, T. Seidl
PMML '11
In today's applications we face the challenge of analyzing databases with many attributes per object. For such high-dimensional data, traditional clustering algorithms are known to fail to detect meaningful patterns: mining the full space is futile. As a solution, subspace clustering techniques were introduced. They analyze arbitrary subspace projections of the data to detect clustering structures. Recently, publicly available mining software has integrated subspace clustering as a novel mining…
5 Citations
Subspace clustering for complex data
This work introduces novel methods for effective subspace clustering on various types of complex data (vector data, imperfect data, and graph data) and proposes models whose solutions contain only non-redundant and, thus, valuable clusters.
Data Mining: Various Issues and Challenges for Future A Short discussion on Data Mining issues for future work
A vision of the future work to be done in the area of data mining is sketched, covering the challenges and issues that call for further research in this emerging field.
Integrating Rule-Based Systems and Data Analytics Tools Using Open Standard PMML
This paper investigates the open standard PMML (Predictive Model Markup Language) for integrating rule-based expert systems with data analytics tools, so that a decision maker has access to powerful tools for both reasoning-intensive and data-intensive tasks.
Rising Expenses on Data Mining
Data mining is a relatively new term for the techniques used to search through volumes of supermarket scanner data and to analyze market research reports.
Mining and similarity search in temporal databases


A generic framework for efficient subspace clustering of high-dimensional data
A generic framework to overcome limitations in subspace clustering methods, based on an efficient filter-refinement architecture that scales at most quadratically with respect to the data dimensionality and the dimensionality of the subspace clusters.
Density-Connected Subspace Clustering for High-Dimensional Data
SUBCLU (density-connected Subspace Clustering), an effective and efficient approach to the subspace clustering problem, based on a formal clustering notion using the concept of density-connectivity underlying the algorithm DBSCAN [EKSX96].
Relevant Subspace Clustering: Mining the Most Interesting Non-redundant Concepts in High Dimensional Data
This work proposes a novel model for relevant subspace clustering (RESCU), and presents a global optimization which detects the most interesting non-redundant subspace clusters and proves that computation of this model is NP-hard.
Ranking Interesting Subspaces for Clustering High Dimensional Data
This work defines a quality criterion for the interestingness of a subspace and proposes an efficient algorithm called RIS (Ranking Interesting Subspaces) to examine all such subspaces in large, high dimensional, sparse data and to rank them accordingly.
Entropy-based subspace clustering for mining numerical data
This work considers a database with numerical attributes, in which each transaction is viewed as a multi-dimensional vector, and identifies new meaningful criteria of high density and correlation of dimensions for goodness of clustering in subspaces.
Fast algorithms for projected clustering
An algorithmic framework for solving the projected clustering problem, in which the subsets of dimensions selected are specific to the clusters themselves, is developed and tested.
P3C: A Robust Projected Clustering Algorithm
This paper presents a robust algorithm that significantly outperforms existing algorithms for projected clustering in terms of accuracy, is effective in detecting very low-dimensional projected clusters embedded in high dimensional spaces, and is scalable with respect to large data sets and high number of dimensions.
Finding non-redundant, statistically significant regions in high dimensional data: a novel approach to projected and subspace clustering
This work proposes a novel problem formulation that aims at extracting axis-parallel regions that stand out in the data in a statistical sense and proposes the approximation algorithm STATPC, which significantly outperforms existing projected and subspace clustering algorithms in terms of accuracy.
Subspace clustering for high dimensional data: a review
A survey of the various subspace clustering algorithms along with a hierarchy organizing the algorithms by their defining characteristics is presented, comparing the two main approaches using empirical scalability and accuracy tests and discussing some potential applications where subspace clustering could be particularly useful.
Iterative projected clustering by subspace mining
This work proposes a technique that improves the efficiency of a projected clustering algorithm (DOC), an optimized adaptation of the frequent pattern tree growth method used for mining frequent itemsets that significantly improves on the accuracy and speed of previous techniques.