A framework for semi-supervised and unsupervised optimal extraction of clusters from hierarchies

Authors: Ricardo J. G. B. Campello, Davoud Moulavi, Arthur Zimek, Jörg Sander
Journal: Data Mining and Knowledge Discovery
We introduce a framework for the optimal extraction of flat clusterings from local cuts through cluster hierarchies. The extraction of a flat clustering from a cluster tree is formulated as an optimization problem and a linear complexity algorithm is presented that provides the globally optimal solution to this problem in semi-supervised as well as in unsupervised scenarios. A collection of experiments is presented involving clustering hierarchies of different natures, a variety of real data… 
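The optimization described above can be illustrated with a small bottom-up pass over a cluster tree. This is only a sketch under stated assumptions, not the authors' algorithm: it assumes each tree node carries a single additive quality score, that exactly one cluster must be selected on every leaf-to-root path, and that the root is treated like any other node. The names `extract_clusters`, `children`, and `quality` are hypothetical.

```python
def extract_clusters(children, quality, root):
    """Select non-overlapping tree nodes that maximize total quality.

    children: dict mapping a node to the list of its child nodes
    quality:  dict mapping every node to a numeric score (assumed additive)
    Returns (sorted list of selected nodes, total quality).
    """
    # Visit parents before descendants; the reversed order is bottom-up.
    order, stack = [], [root]
    while stack:
        node = stack.pop()
        order.append(node)
        stack.extend(children.get(node, []))

    best, keep = {}, {}
    for node in reversed(order):
        kids = children.get(node, [])
        below = sum(best[k] for k in kids)
        # Keep the node itself if it is a leaf or beats the best of its subtrees.
        keep[node] = (not kids) or quality[node] >= below
        best[node] = quality[node] if keep[node] else below

    # Top-down pass: report the highest kept node on each path.
    selected, stack = [], [root]
    while stack:
        node = stack.pop()
        if keep[node]:
            selected.append(node)
        else:
            stack.extend(children[node])
    return sorted(selected), best[root]


tree = {"root": ["A", "B"], "A": ["A1", "A2"], "B": ["B1", "B2"]}
scores = {"root": 1.0, "A": 5.0, "A1": 2.0, "A2": 1.0,
          "B": 1.0, "B1": 2.0, "B2": 3.0}
print(extract_clusters(tree, scores, "root"))
```

Each node is visited a constant number of times, so the pass is linear in the number of tree nodes, and the result is a local cut: cluster A is kept whole while B is split into its children.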

A Modularity-Based Measure for Cluster Selection from Clustering Hierarchies

An adaptation of a variant of the Modularity Q measure is proposed so that it can be applied as an optimization criterion to the problem of optimal local cuts through clustering hierarchies, and the results suggest that the proposed measure is a competitive alternative, especially for high-dimensional data.

Model Selection for Semi-Supervised Clustering

This work provides a method for model selection in semi-supervised clustering based on a sound evaluation procedure and allows the user to select, based on the available information, the most appropriate clustering model for a given problem.

A unified framework of density-based clustering for semi-supervised classification

This paper introduces a unified framework for semi-supervised classification based on building blocks from density-based clustering that is not only efficient and effective but also statistically sound.

COBRA: A Fast and Simple Method for Active Clustering with Pairwise Constraints

COBRA is an active method that first over-clusters the data by running K-means with a $K$ that is intended to be too large, and subsequently merges the resulting small clusters into larger ones based on pairwise constraints by maximally exploiting constraint transitivity and entailment.
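The merging step can be sketched as follows. This is a loose illustration rather than the published COBRA procedure: it assumes the over-clustering has already been done and is summarized by one centroid per small cluster, and it assumes a hypothetical oracle `must_link(i, j)` that answers pairwise queries. A union-find structure lets answers that are already entailed be skipped instead of asked.

```python
from itertools import combinations
from math import dist


def merge_super_instances(centroids, must_link):
    """Merge small clusters using pairwise answers from an oracle.

    centroids: list of points, one per small cluster (assumed given)
    must_link: callable (i, j) -> bool, a hypothetical constraint oracle
    Returns (cluster label per small cluster, number of oracle queries).
    """
    n = len(centroids)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    cannot = set()  # pairs of cluster roots known to be cannot-link
    queries = 0
    # Ask about the closest pairs first.
    pairs = sorted(combinations(range(n), 2),
                   key=lambda p: dist(centroids[p[0]], centroids[p[1]]))
    for i, j in pairs:
        ri, rj = find(i), find(j)
        if ri == rj or frozenset((ri, rj)) in cannot:
            continue  # answer is entailed by earlier constraints
        queries += 1
        if must_link(i, j):
            parent[rj] = ri
            # Cannot-links of the absorbed cluster now apply to the merged one.
            cannot = {frozenset(ri if r == rj else r for r in pair)
                      for pair in cannot}
        else:
            cannot.add(frozenset((ri, rj)))

    ids = {}
    labels = [ids.setdefault(find(i), len(ids)) for i in range(n)]
    return labels, queries


truth = [0, 0, 1, 1]
points = [(0, 0), (0, 1), (5, 5), (5, 6)]
print(merge_super_instances(points, lambda i, j: truth[i] == truth[j]))
```

In this toy run, three queries are enough to settle all six pairs, because the remaining answers follow from transitivity.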

A Lagrangian-based score for assessing the quality of pairwise constraints in semi-supervised clustering

This work introduces a method to score each must-link and cannot-link pairwise constraint as likely incorrect and shows that the resulting impact score can successfully identify individual constraints that should be removed or revised.

Constraint-based clustering selection

This paper investigates a complementary way of using the constraints: they are used to select an unsupervised clustering method and tune its hyperparameters, and it turns out that this very simple approach outperforms all existing semi-supervised methods.
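The selection idea can be sketched in a few lines. This is a minimal illustration under assumptions, not the paper's exact scoring: candidate clusterings are assumed to be precomputed label lists (for example, from different algorithms or hyperparameter settings), and each is scored by the number of pairwise constraints it satisfies. The function names and the candidate keys are hypothetical.

```python
def satisfied(labels, must_link, cannot_link):
    """Count the pairwise constraints that a labeling satisfies."""
    ok = sum(labels[i] == labels[j] for i, j in must_link)
    ok += sum(labels[i] != labels[j] for i, j in cannot_link)
    return ok


def select_clustering(candidates, must_link, cannot_link):
    """Return the name of the candidate satisfying the most constraints."""
    return max(candidates,
               key=lambda name: satisfied(candidates[name], must_link, cannot_link))


candidates = {"k=2": [0, 0, 1, 1], "k=3": [0, 1, 2, 2]}
must_link = [(0, 1), (2, 3)]
cannot_link = [(1, 2)]
print(select_clustering(candidates, must_link, cannot_link))
```

Here the constraints never influence how a clustering is computed; they only decide which unsupervised result is kept.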

Hierarchical Density Estimates for Data Clustering, Visualization, and Outlier Detection

An integrated framework for density-based cluster analysis, outlier detection, and data visualization is introduced, consisting of an algorithm to compute hierarchical estimates of the level sets of a density, following Hartigan’s classic model of density-contour clusters and trees.

A unified view of density-based methods for semi-supervised clustering and classification

This paper shows that there are close relations between density-based clustering algorithms and the graph-based approach for transductive classification and builds upon this view to bridge the areas of semi-supervised clustering and classification under a common umbrella of density-based techniques.

Limitations of Using Constraint Set Utility in Semi-Supervised Clustering

This consistency-based approach to clustering is found to be unsuccessful, which is explained by observing that the previously found correlation between utility measures and clustering performance is only present when the results of different data sets are considered jointly.

Semi-supervised DenPeak Clustering with Pairwise Constraints

A novel semi-supervised DenPeak clustering (SSDC) method is introduced that uses pairwise constraints or side information to guide the clustering process and improve clustering performance.



On the Effects of Constraints in Semi-supervised Hierarchical Clustering

In this task, the main interest lies in building stable dendrograms when clustering with different subsets of data and the use of constraints with divisive hierarchical clustering.

Semi-supervised Hierarchical Clustering

  • Li Zheng, Tao Li
  • Computer Science
    2011 IEEE 11th International Conference on Data Mining
  • 2011
A novel semi-supervised hierarchical clustering framework based on ultra-metric dendrogram distance is proposed that is able to incorporate triple-wise relative constraints, and two techniques (the constrained optimization technique and the transitive dissimilarity based technique) are proposed.

Interactive Interpretation of Hierarchical Clustering

Semi-supervised agglomerative hierarchical clustering algorithms with pairwise constraints

This paper considers agglomerative hierarchical algorithms with pairwise constraints and introduces single linkage, which is equivalent to the transitive closure algorithm, whereas the centroid and Ward methods need kernelization of the algorithms.

Automatic Extraction of Clusters from Hierarchical Clustering Representations

This paper investigates the relation between dendrograms and reachability plots and introduces methods to convert them into each other showing that they essentially contain the same information, and introduces a technique that automatically determines the significant clusters in a hierarchical cluster representation.

An effective document clustering method using user-adaptable distance metrics

A way of representing knowledge to guide the clustering process and a variant of the gradient descent search technique to find a user-specific weight matrix under the hierarchical clustering strategy are proposed.

User Oriented Hierarchical Information Organization and Retrieval

This article proposes an approach that is able to derive a personalized hierarchical structure from a document collection based on a semi-supervised hierarchical clustering approach, which is combined with a biased cluster extraction process.

HISSCLU: a hierarchical density-based method for semi-supervised clustering

HISSCLU is proposed, a hierarchical, density-based method for semi-supervised clustering that allows a better preservation of the actual cluster structure, particularly if the data set contains several distinct clusters of the same class.

Hierarchical Clustering Algorithms for Document Datasets

The experimental evaluation shows that, contrary to the common belief, partitional algorithms always lead to better solutions than agglomerative algorithms, making them ideal for clustering large document collections due not only to their relatively low computational requirements but also to their higher clustering quality.

Personalized Hierarchical Clustering

  • Korinna Bade, A. Nürnberger
  • Computer Science
    2006 IEEE/WIC/ACM International Conference on Web Intelligence (WI 2006 Main Conference Proceedings)(WI'06)
  • 2006
This paper investigates how to obtain a hierarchical structure automatically, taking into account some background knowledge about the way a specific user would structure the collection, and adapts a hierarchical agglomerative clustering algorithm to take into account user-specific constraints on the clustering process.