Validating clusters using the Hopkins statistic

@article{Banerjee2004ValidatingCU,
  title={Validating clusters using the Hopkins statistic},
  author={Amit Banerjee and Rajesh N. Dav{\'e}},
  journal={2004 IEEE International Conference on Fuzzy Systems (IEEE Cat. No.04CH37542)},
  year={2004},
  volume={1},
  pages={149--153}
}
  • A. Banerjee, R. Davé
  • Published 2004
  • Mathematics, Computer Science
  • 2004 IEEE International Conference on Fuzzy Systems (IEEE Cat. No.04CH37542)
A novel scheme for cluster validity using a test for random position hypothesis is proposed. The random position hypothesis is tested against an alternative clustered hypothesis on every cluster produced by a partitioning algorithm. A test statistic such as the well-known Hopkins statistic could be used as a basis to accept or reject the random position hypothesis, which is also the null hypothesis in this case. The Hopkins statistic is known to be a fair estimator of randomness in a data set…
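The abstract does not give the formula used in the paper, so the following is only a minimal sketch of one common form of the Hopkins statistic, written to illustrate the idea of testing the random position hypothesis. It assumes Euclidean distance, uniform probe points drawn from the bounding box of the data, and distances raised to the power of the dimension; under this convention values near 0.5 are consistent with random positions and values near 1 suggest clustering. Other texts use the complementary convention or plain distances. The names `hopkins` and `nearest_distance` are hypothetical helpers, not taken from the paper.

```python
import math
import random


def nearest_distance(q, points, skip=None):
    """Euclidean distance from q to its closest point, ignoring index `skip`."""
    return min(math.dist(q, p) for i, p in enumerate(points) if i != skip)


def hopkins(points, m, rng):
    """Hopkins statistic of `points` (equal-length tuples) using m probes.

    Assumed convention: H = sum(u^d) / (sum(u^d) + sum(w^d)), where
    u = distance from a uniform probe to its nearest data point and
    w = distance from a sampled data point to its nearest other data point.
    """
    d = len(points[0])
    lows = [min(p[k] for p in points) for k in range(d)]
    highs = [max(p[k] for p in points) for k in range(d)]

    # u-terms: m probes drawn uniformly from the bounding box of the data.
    u = []
    for _ in range(m):
        probe = [rng.uniform(lo, hi) for lo, hi in zip(lows, highs)]
        u.append(nearest_distance(probe, points) ** d)

    # w-terms: m data points sampled without replacement.
    sampled = rng.sample(range(len(points)), m)
    w = [nearest_distance(points[i], points, skip=i) ** d for i in sampled]

    return sum(u) / (sum(u) + sum(w))
```

To follow the scheme described above, one would call `hopkins` on the points of each cluster returned by a partitioning algorithm and compare the value against a threshold chosen for the null hypothesis; the choice of threshold is not specified in the abstract.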
A recursive clustering methodology using a genetic algorithm
  • A. Banerjee, S. Louis
  • Mathematics, Computer Science
  • 2007 IEEE Congress on Evolutionary Computation
  • 2007
A recursive clustering scheme uses a genetic algorithm-based search in a dichotomous partition space for an optimal dichotomy of the dataset; results compare favorably with state-of-the-art approaches in genetic algorithm-driven clustering.
A Hybrid Heuristic with Hopkins Statistic for the Automatic Clustering Problem
Using the Silhouette Index, the proposed Hybrid Heuristic Algorithm (HHA) identifies the ideal number of groups, with substantially lower computational time and solution quality that is competitive with the best results reported in the literature.
An improved genetic algorithm for robust fuzzy clustering with unknown number of clusters
  • A. Banerjee
  • Mathematics
  • 2010 Annual Meeting of the North American Fuzzy Information Processing Society
  • 2010
In this paper the problem of partitioning noisy data when the number of clusters c is not known a priori is revisited. The methodology proposed is a population-based search in the partition space…
The Fuzzy Mega-cluster: Robustifying FCM by Scaling Down Memberships
A new robust clustering scheme based on fuzzy c-means, called the mega-clustering algorithm, is shown to be robust against outliers, and its ability to distinguish between true outliers and non-outliers is interesting.
A context-sensitive crossover operator for clustering applications
  • A. Banerjee, R. Davé
  • Mathematics, Computer Science
  • IEEE Congress on Evolutionary Computation
  • 2010
A new context-sensitive crossover operator for genetic-search-based clustering applications that compares relevant sub-regions in partitions represented by the two parents selected for mating, passing on to the child only high-fitness sub-regions in the partition space.
Giving Fuzziness to Spatial Clusters: a New Index for Choosing the Optimal Number of Clusters
A new index for fuzzy clustering is introduced to determine the optimal number of clusters, which is used in the fuzzy c-means algorithm for the geodemographic segmentation of 285 postal codes.
To Cluster, or Not to Cluster: An Analysis of Clusterability Methods
An extensive comparison of measures of clusterability is performed and guidelines that clustering users can reference to select suitable measures for their applications are provided.
A Comprehensive Comparison of Different Clustering Methods for Reliability Analysis of Microarray Data
This study investigates the abilities of mixture decomposition schemes and proposes the Hopkins statistic as a method for finding the intrinsic ability of a data set to be clustered, in comparison with other methods in a reliability analysis task.
To Cluster, or Not to Cluster: How to Answer the Question
Clustering is an essential data mining tool that aims to discover inherent cluster structure in data. For most applications, applying clustering is only appropriate when cluster structure is present…
Using Cluster Ensembles to Identify Psychiatric Patient Subgroups
This work applies cluster ensemble techniques, which have previously been shown to overcome drawbacks of individual clustering algorithms, to the problem of identifying subgroups of psychiatric patients, and introduces a process guide for modelling and evaluating cluster ensembles in the form of a Meta Algorithmic Model.

References

A test for multidimensional clustering tendency
The Cox-Lewis statistic leads to one-sided tests for regularity having reasonable power and provides a sharper discrimination between random and clustered data than other statistics.
Cluster validity for fuzzy clustering algorithms
The proportion exponent is introduced as a measure of the validity of the clustering obtained for a data set using a fuzzy clustering algorithm. It is assumed that the output of an algorithm…
Tests of randomness based on distance methods
The most familiar method of testing the hypothesis that an observed spatial distribution of points in the Euclidean plane is a realization of a Poisson point process, or in practical terminology that…
A Validity Measure for Fuzzy Clustering
  • X. Xie, G. Beni
  • Mathematics, Computer Science
  • IEEE Trans. Pattern Anal. Mach. Intell.
  • 1991
The authors present a fuzzy validity criterion based on a validity function which identifies compact and separate fuzzy c-partitions without assumptions as to the number of substructures inherent in…
Visual cluster validity (VCV) displays for prototype generator clustering methods
  • J. Bezdek, R. Hathaway
  • Mathematics, Computer Science
  • The 12th IEEE International Conference on Fuzzy Systems, 2003. FUZZ '03.
  • 2003
The proposed approach uses intensity images generated from the results of any prototype generator clustering algorithm as a means for cluster validation.
A conditioned distance ratio method for analyzing spatial patterns
A new distance-based method is proposed for investigating the pattern in the plane formed by points, which may be assumed to be the positions of centres of trees in a forest stand. For each…
Cluster Validity for the Fuzzy c-Means Clustering Algorithm
  • M. P. Windham
  • Mathematics, Medicine
  • IEEE Transactions on Pattern Analysis and Machine Intelligence
  • 1982
The uniform data function is a function which assigns to the output of the fuzzy c-means (Fc-M) or fuzzy isodata algorithm a number which measures the quality or validity of the clustering produced…
Quadratic assignment as a general data analysis strategy.
The quadratic assignment paradigm developed in operations research is discussed as a general approach to data analysis tasks characterized by the use of proximity matrices. Data analysis problems are…
Validating fuzzy partitions obtained through c-shells clustering
  • R. Davé
  • Mathematics, Computer Science
  • Pattern Recognit. Lett.
  • 1996
Validation of fuzzy partitions induced through c-shells clustering is considered, and a new set of indices is shown to be capable of validating the structure characterized by the shell clustering algorithms.
A test for spatial pattern at several scales using data from a grid of contiguous quadrats.
It is concluded that a set of tests, based on randomisation arguments, provides a fully valid method for testing simultaneously for pattern at various scales.