• Corpus ID: 247158592

Strong Consistency for a Class of Adaptive Clustering Procedures

Adam Quinn Jaffe
We introduce a class of clustering procedures which includes k-means and k-medians, as well as variants of these where the domain of the cluster centers can be chosen adaptively (for example, k-medoids) and where the number of cluster centers can be chosen adaptively (for example, according to the elbow method). In the non-parametric setting and assuming only the finiteness of certain moments, we show that all clustering procedures in this class are strongly consistent under IID samples…
1 Citation

Fréchet Mean Set Estimation in the Hausdorff Metric, via Relaxation

This work resolves the following question in non-Euclidean statistics: Is it possible to consistently estimate the Fréchet mean set of an unknown population distribution, with respect to the



A Sober Look at Clustering Stability

It is concluded that stability is not a well-suited tool to determine the number of clusters - it is determined by the symmetries of the data which may be unrelated to clustering parameters.

Stability of k-Means Clustering

This work establishes a complete characterization of clustering stability in terms of the number of optimal solutions to the underlying optimization problem for the data distribution, and challenges the common belief and practice that view stability as an indicator of the validity, or meaningfulness, of the choice of a clustering algorithm and number of clusters.

Towards a Statistical Theory of Clustering

This paper argues that generalization bounds as they are used in statistical learning theory of classification are unsuitable in a general clustering framework and suggests that the main replacements of generalization bounds should be convergence proofs and stability considerations.

Some methods for classification and analysis of multivariate observations

The main purpose of this paper is to describe a process for partitioning an N-dimensional population into k sets on the basis of a sample. The process, which is called 'k-means,' appears to give

Stability-Based Validation of Clustering Solutions

A measure of cluster stability is introduced to assess the validity of a cluster model and its suitability as a general validation tool for clustering solutions in real-world problems.

Data clustering: 50 years beyond K-means

A brief overview of clustering is provided, well known clustering methods are summarized, the major challenges and key issues in designing clustering algorithms are discussed, and some of the emerging and useful research directions are pointed out.

Estimating the number of clusters in a data set via the gap statistic

The gap statistic is proposed for estimating the number of clusters (groups) in a set of data by comparing the change in within‐cluster dispersion with that expected under an appropriate reference null distribution.

Approaches for measuring the stability of clustering methods

A classification of the various techniques appearing in the literature in terms of the approaches identified is provided and a number of generic approaches emerge.

The Elements of Statistical Learning

Chapter 11 includes more case studies in other areas, ranging from manufacturing to marketing research, and a detailed comparison with other diagnostic tools, such as logistic regression and tree-based methods.

A notion of stability for k-means clustering

A new notion of stability for the k-means clustering scheme is defined and studied, building upon the notion of quantization of a probability measure and a condition, named the absolute margin condition, inspired by recent works on the subject.