Why so many clustering algorithms: a position paper

@article{EstivillCastro2002WhySM,
  title={Why so many clustering algorithms: a position paper},
  author={Vladimir Estivill-Castro},
  journal={SIGKDD Explor.},
  year={2002},
  volume={4},
  pages={65--75}
}
We argue that there are many clustering algorithms because the notion of "cluster" cannot be precisely defined. Clustering is in the eye of the beholder, and as such, researchers have proposed many induction principles and models whose corresponding optimization problems can only be approximately solved by an even larger number of algorithms. Therefore, comparing clustering algorithms must take into account a careful understanding of the inductive principles involved.
Empirical Analysis of Data Clustering Algorithms
TLDR
Different clustering approaches are studied from a theoretical perspective to understand their relevance in the context of massive data sets, and they are tested empirically on artificial benchmarks to highlight their strengths and weaknesses.
Approximation Algorithms for Clustering
Agglomerative hierarchical clustering is an important clustering algorithm which has many real-life applications such as customer segmentation. Its biggest drawback is its large time complexity of ...
Introduction to partitioning-based clustering methods with a robust example
TLDR
A new robust partitioning-based method is presented, along with a review of iterative relocation clustering algorithms and some illustrative results.
Sequentially Grouping Items into Clusters of Unspecified Number
TLDR
It is shown how sequentially obtained cluster sets can be improved by reclustering, and how items considered as outliers can be removed.
Number 4
To date, several surveys of clustering algorithms are available. The novel approach used in this paper is the use of Mind Maps to present key details about clustering algorithms in visual ...
A brief study on clustering methods: Based on the k-means algorithm
TLDR
A process model for data mining and the typical requirements of clustering methods are described, and the k-means algorithm is introduced along with its advantages and disadvantages.
Common Clustering Algorithms
TLDR
This chapter surveys common clustering algorithms widely used in the data mining community in light of chemometrics, and gives an overview of hybrid approaches that combine partitioning clustering and hierarchical clustering.
An optimization approach to partitional data clustering
TLDR
Numerical results show that computation time can be dramatically reduced by using a partial set of instances without sacrificing solution quality, and these results become more persuasive as the size of the problem grows.
Cluster Validity Using Support Vector Machines
TLDR
A method is presented that compares clustering results from different algorithms or different runs of the same algorithm; it can also filter noise and outliers, so that for a fixed data set the authors can identify the most robust and potentially meaningful clustering result.
A mathematical model of similarity and clustering
  • F. Sun, C. Tzeng
  • Computer Science
  • International Conference on Information Technology: Coding and Computing, 2004. Proceedings. ITCC 2004.
  • 2004
TLDR
An abstract model of data similarity and clustering is introduced, and a heuristic method to search for sub-optimal clusterings for a given tolerance relation is proposed.

References

SHOWING 1-10 OF 49 REFERENCES
On Some Clustering Techniques
  • R. Bonner
  • Computer Science
  • IBM J. Res. Dev.
  • 1964
TLDR
A number of methods which make use of IBM 7090 computer programs to do clustering are described, and a medical research problem is used to illustrate and compare these methods.
Non-crisp Clustering by Fast, Convergent, and Robust Algorithms
TLDR
These algorithms are robust because they use medians rather than means as estimators of location, and the resulting representative of a cluster is an actual data item; it is also demonstrated mathematically that the algorithms converge.
Data clustering: a review
TLDR
An overview of pattern clustering methods from a statistical pattern recognition perspective is presented, with a goal of providing useful advice and references to fundamental concepts accessible to the broad community of clustering practitioners.
Efficient and Effective Clustering Methods for Spatial Data Mining
TLDR
The analysis and experiments show that with the assistance of CLARANS, these two algorithms are very effective and can lead to discoveries that are difficult to find with current spatial data mining algorithms.
A human-computer cooperative system for effective high dimensional clustering
TLDR
A system is proposed that performs high-dimensional clustering through effective cooperation between the human and the computer, in order to create very meaningful sets of clusters in high dimensionality.
A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise
TLDR
DBSCAN is presented, a new clustering algorithm that relies on a density-based notion of clusters and is designed to discover clusters of arbitrary shape; it requires only one input parameter and supports the user in determining an appropriate value for it.
Chameleon: Hierarchical Clustering Using Dynamic Modeling
TLDR
Chameleon's key feature is that it accounts for both interconnectivity and closeness in identifying the most similar pair of clusters, which is important for dealing with highly variable clusters.
CURE: an efficient clustering algorithm for large databases
TLDR
This work proposes a new clustering algorithm called CURE that is more robust to outliers and identifies clusters having non-spherical shapes and wide variances in size. It demonstrates that random sampling and partitioning enable CURE not only to outperform existing algorithms but also to scale well for large databases without sacrificing clustering quality.
Quality Scheme Assessment in the Clustering Process
TLDR
This paper presents an approach for evaluating clustering schemes (partitions) so as to find the best number of clusters for a specific data set, and selects the best clustering scheme according to a quality index.
BIRCH: an efficient data clustering method for very large databases
TLDR
A data clustering method named BIRCH (Balanced Iterative Reducing and Clustering using Hierarchies) is presented, and it is demonstrated that it is especially suitable for very large databases.