Recovering the number of clusters in data sets with noise features using feature rescaling factors

@article{Amorim2015RecoveringTN,
  title={Recovering the number of clusters in data sets with noise features using feature rescaling factors},
  author={Renato Cordeiro de Amorim and Christian Hennig},
  journal={Information Sciences},
  year={2015},
  volume={324},
  pages={126--145}
}
Penalized k-means algorithms for finding the correct number of clusters in a dataset
TLDR
This paper derives, for the case of ideal clusters, rigorous bounds for the coefficient of the additive penalty, empirically investigates certain types of deviations from the ideal-cluster assumption, and shows that a combination of k-means with additive and multiplicative penalties can resolve ambiguous solutions.
Penalized K-Means Algorithms for Finding the Number of Clusters
TLDR
This paper derives rigorous bounds for the coefficient of the additive penalty in k-means for ideal clusters, which generally produces a more reliable signature, compared to the additive penalty, for the correct number of clusters in cases where the ideal-cluster assumption holds.
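The additive-penalty idea in the two entries above can be sketched directly: run k-means for each candidate k and pick the k that minimizes SSE(k) + λ·k. The sketch below is illustrative only — the default λ is a crude scale heuristic of my own choosing, not the rigorous coefficient bound those papers derive.

```python
import numpy as np

def kmeans_sse(X, k, n_init=10, max_iter=100, rng=None):
    """Lloyd's k-means; returns the best within-cluster SSE over n_init restarts."""
    rng = np.random.default_rng(0) if rng is None else rng
    best = np.inf
    for _ in range(n_init):
        centers = X[rng.choice(len(X), size=k, replace=False)]
        for _ in range(max_iter):
            dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
            labels = dists.argmin(axis=1)
            new_centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                    else centers[j] for j in range(k)])
            if np.allclose(new_centers, centers):
                break
            centers = new_centers
        best = min(best, ((X - centers[labels]) ** 2).sum())
    return best

def penalized_number_of_clusters(X, k_max=10, lam=None):
    """Choose k minimizing SSE(k) + lam * k (additive penalty).
    The default lam (mean squared distance to the grand mean) is a
    placeholder heuristic, NOT the bound derived in the papers above."""
    if lam is None:
        lam = ((X - X.mean(axis=0)) ** 2).sum() / len(X)
    scores = {k: kmeans_sse(X, k) + lam * k for k in range(1, k_max + 1)}
    return min(scores, key=scores.get)
```

On well-separated Gaussian clusters this recovers the true k, because SSE(k) drops sharply at the correct k and only marginally afterwards, while the penalty keeps growing.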
Unsupervised feature selection for large data sets
A Survey on Feature Weighting Based K-Means Algorithms
TLDR
This paper elaborates on the concept of feature weighting and addresses open issues by critically analyzing some of the most popular, or innovative, feature weighting mechanisms based on K-Means.
A New Assessment of Cluster Tendency Ensemble approach for Data Clustering
TLDR
An improved SACT method for data clustering, called the eSACT algorithm, is proposed; it exhibited high performance, reliability and accuracy in the assessment of cluster tendency compared to previously proposed algorithms.

References

Showing 1–10 of 36 references
Minkowski metric, feature weighting and anomalous cluster initializing in K-Means clustering
Intelligent Choice of the Number of Clusters in K-Means Clustering: An Experimental Study with Different Cluster Spreads
TLDR
An experimental setting is proposed for comparing different approaches on data generated from Gaussian clusters with controlled between- and within-cluster spread, modelling cluster intermix, and evaluating centroid recovery alongside the conventional evaluation of cluster recovery.
Experiments for the Number of Clusters in K-Means
TLDR
An adjusted iK-Means method is proposed, which performs well in the current experimental setting and is compared to the least-squares and least-modules versions of an intelligent version of the method by Mirkin.
An examination of procedures for determining the number of clusters in a data set
TLDR
The aim of this paper is to compare three methods based on the hypervolume criterion with four other well-known methods for determining the number of clusters on artificial data sets.
K-means clustering: a half-century synthesis.
  • D. Steinley
  • Computer Science
    The British journal of mathematical and statistical psychology
  • 2006
TLDR
This paper synthesizes the results, methodology, and research conducted concerning the K-means clustering method over the last fifty years, leading to a unifying treatment of K-Means and some of its extensions.
Some new indexes of cluster validity
TLDR
This work reviews two clustering algorithms and three indexes of crisp cluster validity and shows that while Dunn's original index has operational flaws, the concept it embodies provides a rich paradigm for validation of partitions that have cloud-like clusters.
Data Clustering: 50 Years Beyond K-means
The practice of classifying objects according to perceived similarities is the basis for much of science. Organizing data into sensible groupings is one of the most fundamental modes of understanding …
On Initializations for the Minkowski Weighted K-Means
TLDR
It is found that the Ward method in the Minkowski space tends to outperform other initializations, with the exception of low-dimensional Gaussian Models with noise features.
On comparing partitions
Rand (1971) proposed the Rand Index to measure the stability of two partitions of one set of units. Hubert and Arabie (1985) corrected the Rand Index for chance (Adjusted Rand Index). In this paper, …
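The chance-corrected index mentioned in the entry above, Hubert and Arabie's Adjusted Rand Index, is compact enough to sketch from the standard pair-counting formula; the function name here is mine.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(a, b):
    """Adjusted Rand Index (Hubert & Arabie, 1985) between two labelings
    a and b of the same n units, via the pair-counting formula."""
    n = len(a)
    sum_ij = sum(comb(c, 2) for c in Counter(zip(a, b)).values())  # agreeing pairs
    sum_a = sum(comb(c, 2) for c in Counter(a).values())
    sum_b = sum(comb(c, 2) for c in Counter(b).values())
    expected = sum_a * sum_b / comb(n, 2)   # expectation under random labeling
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:               # both partitions trivial
        return 1.0
    return (sum_ij - expected) / (max_index - expected)
```

Identical partitions (up to label renaming) score 1.0, and the chance correction makes independent partitions score near 0, which is exactly what makes the index suitable for evaluating cluster recovery.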
Automated variable weighting in k-means type clustering
TLDR
A new step is introduced into the k-means clustering process to iteratively update variable weights based on the current partition of the data; a formula for weight calculation is proposed, and a convergence theorem for the new clustering process is given.