Asymptotics for The k-means

@article{Zhang2022AsymptoticsFT,
  title={Asymptotics for The k-means},
  author={Tonglin Zhang},
  journal={ArXiv},
  year={2022},
  volume={abs/2211.10015}
}
The k-means is one of the most important unsupervised learning techniques in statistics and computer science. The goal is to partition a data set into a number of clusters such that observations within a cluster are as homogeneous as possible and observations in different clusters are as heterogeneous as possible. Although the method is well known, investigation of its asymptotic properties lags far behind, which makes it difficult to develop more precise k-means methods in practice. To address this issue, a new concept…
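The abstract states the goal of k-means only informally. A common way to make it concrete is to fix k and minimize the within-cluster sum of squares (WCSS), the total squared distance from each observation to the mean of its cluster. The sketch below uses Lloyd's alternating iteration, a standard heuristic that stops at a local minimum of the WCSS. It is an illustration under those assumptions, not the method or the asymptotic analysis of this paper. The names `lloyd_kmeans`, `wcss`, and `sq_dist` and the toy data are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def wcss(points: List[Point], centers: List[Point], labels: List[int]) -> float:
    """Within-cluster sum of squares: the objective k-means minimizes."""
    return sum(sq_dist(p, centers[j]) for p, j in zip(points, labels))


def lloyd_kmeans(points: List[Point], k: int, max_iter: int = 100, seed: int = 0):
    """Lloyd's iteration: alternate nearest-center assignment and mean update.

    Returns (centers, labels). Stops at a local minimum of the WCSS,
    which need not be the global one.
    """
    rng = random.Random(seed)
    centers = [tuple(p) for p in rng.sample(points, k)]  # k data points as starting centers
    labels = [-1] * len(points)  # sentinel so the first pass never looks "converged"
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center (ties go to the lower index).
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        if new_labels == labels:
            break  # assignments unchanged, so the current centers are already their means
        # Update step: each center moves to the mean of the points assigned to it.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == j]
            if members:
                new_centers.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centers.append(centers[j])  # leave an empty cluster's center in place
        labels, centers = new_labels, new_centers
    return centers, labels


pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centers, labels = lloyd_kmeans(pts, k=2)
print(round(wcss(pts, centers, labels), 6))  # 2.666667
```

Because Lloyd's iteration only guarantees a local minimum, the result can depend on the starting centers chosen through `seed`. On this well-separated toy data, every starting pair leads to the same two groups.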


References

Showing 1-10 of 45 references

A fast and recursive algorithm for clustering large datasets with k-medians

An Entropy Weighting k-Means Algorithm for Subspace Clustering of High-Dimensional Sparse Data

This paper presents a new k-means-type algorithm for clustering high-dimensional objects in subspaces. The algorithm can generate better clustering results than other subspace clustering algorithms and is also scalable to large data sets.

Generalized k-means in GLMs with applications to the outbreak of COVID-19 in the United States

Data clustering: 50 years beyond K-means

A brief overview of clustering is provided. Well-known clustering methods are summarized, the major challenges and key issues in designing clustering algorithms are discussed, and some emerging and useful research directions are pointed out.

Model-Based Clustering, Discriminant Analysis, and Density Estimation

This work reviews a general methodology for model-based clustering that provides a principled statistical approach to important practical questions that arise in cluster analysis, such as how many clusters are there, which clustering method should be used, and how should outliers be handled.

Some methods for classification and analysis of multivariate observations

The main purpose of this paper is to describe a process for partitioning an N-dimensional population into k sets on the basis of a sample. The process, which is called 'k-means,' appears to give…
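The summary above describes MacQueen's process only in outline. As it is usually presented, the process makes a single pass over the sample. The first k observations seed k means. Each later observation is assigned to the nearest mean, which is then moved to the average of its members. The sketch below follows that reading. The seeding rule and the name `macqueen_kmeans` are assumptions for illustration, not a transcription of the original paper.

```python
from typing import List, Tuple

Point = Tuple[float, ...]


def macqueen_kmeans(points: List[Point], k: int) -> Tuple[List[Point], List[int]]:
    """One sequential pass in the spirit of MacQueen's k-means.

    The first k points seed the k means (an assumed seeding rule); every later
    point is assigned to its nearest mean, and that mean is updated
    incrementally to the average of all points assigned to it so far.
    """
    means = [list(p) for p in points[:k]]
    counts = [1] * k
    labels = list(range(k))  # each seed point starts its own cluster
    for p in points[k:]:
        # Nearest mean by squared Euclidean distance (ties go to the lower index).
        j = min(range(k), key=lambda i: sum((a - b) ** 2 for a, b in zip(p, means[i])))
        counts[j] += 1
        # Running-mean update: m <- m + (x - m) / n
        means[j] = [m + (x - m) / counts[j] for m, x in zip(means[j], p)]
        labels.append(j)
    return [tuple(m) for m in means], labels


means, labels = macqueen_kmeans([(0.0,), (10.0,), (1.0,), (11.0,), (2.0,), (9.0,)], k=2)
print(means, labels)  # [(1.0,), (10.0,)] [0, 1, 0, 1, 0, 1]
```

Unlike Lloyd's batch iteration, this pass assigns each point exactly once, at the moment it arrives. The final means can therefore depend on the order of the data.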

Density-based Clustering

M. Ester. Encyclopedia of Database Systems, 2009.
Clustering methods such as k-means or Expectation-Maximization are suitable for finding ellipsoid-shaped clusters. For non-convex clusters, however, these methods have trouble finding the true clusters, because two points from different clusters may be closer than two points in the same cluster.

Asymptotic properties of univariate sample k-means clusters

A random sample of size N is divided into k clusters that locally minimize the within-cluster sum of squares. Some large-sample properties of this k-means clustering method (as k approaches ∞ with N)…

Local optima in K-means clustering: what you don't know may hurt you

The results suggest the need for some strategy to study the local optima problem for a specific data set or to identify methods for finding "good" starting values that might lead to the best solutions possible.

K-means clustering: a half-century synthesis

D. Steinley. British Journal of Mathematical and Statistical Psychology, 2006.
This paper synthesizes the results, methodology, and research conducted concerning the K-means clustering method over the last fifty years, leading to a unifying treatment of K-Means and some of its extensions.