# Asymptotics for The k-means

@article{Zhang2022AsymptoticsFT, title={Asymptotics for The k-means}, author={Tonglin Zhang}, journal={ArXiv}, year={2022}, volume={abs/2211.10015} }

The k-means is one of the most important unsupervised learning techniques in statistics and computer science. The goal is to partition a data set into many clusters, such that observations within clusters are the most homogeneous and observations between clusters are the most heterogeneous. Although it is well known, the investigation of the asymptotic properties is far behind, leading to difficulties in developing more precise k-means methods in practice. To address this issue, a new concept…
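As a rough illustration of the partitioning goal described in the abstract, the sketch below runs the standard Lloyd iteration, which alternates between assigning each observation to its nearest center and recomputing each center as its cluster mean. This is not the method proposed in the paper; the function names `lloyd_kmeans` and `within_cluster_ss`, the random initialization, and the toy data are all assumptions made for this example.

```python
import numpy as np


def within_cluster_ss(X, labels, centers):
    """Sum of squared distances from each observation to its assigned center."""
    return float(np.sum((X - centers[labels]) ** 2))


def lloyd_kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd iterations (hypothetical helper); reaches a local optimum only."""
    rng = np.random.default_rng(seed)
    # Assumption: initialize centers at k distinct observations chosen at random.
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        # Assignment step: nearest center in squared Euclidean distance.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Update step: each center becomes the mean of its members.
        new_centers = centers.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # keep the old center if a cluster is empty
                new_centers[j] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers


# Toy data: two well-separated groups of three points each.
X = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
labels, centers = lloyd_kmeans(X, k=2)
wss = within_cluster_ss(X, labels, centers)
```

On this toy data the two groups end up in separate clusters. In general the result depends on the starting centers, which is the local-optima issue several of the references below discuss.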

## References

### A fast and recursive algorithm for clustering large datasets with k-medians

- Computer Science
- Comput. Stat. Data Anal.
- 2012

### An Entropy Weighting k-Means Algorithm for Subspace Clustering of High-Dimensional Sparse Data

- Computer Science
- IEEE Transactions on Knowledge and Data Engineering
- 2007

This paper presents a new k-means type algorithm for clustering high-dimensional objects in sub-spaces that can generate better clustering results than other subspace clustering algorithms and is also scalable to large data sets.

### Generalized k-means in GLMs with applications to the outbreak of COVID-19 in the United States

- Computer Science
- Computational Statistics & Data Analysis
- 2021

### Data clustering: 50 years beyond K-means

- Computer Science
- Pattern Recognit. Lett.
- 2010

A brief overview of clustering is provided, well known clustering methods are summarized, the major challenges and key issues in designing clustering algorithms are discussed, and some of the emerging and useful research directions are pointed out.

### Model-Based Clustering, Discriminant Analysis, and Density Estimation

- Computer Science
- 2002

This work reviews a general methodology for model-based clustering that provides a principled statistical approach to important practical questions that arise in cluster analysis, such as how many clusters are there, which clustering method should be used, and how should outliers be handled.

### Some methods for classification and analysis of multivariate observations

- Mathematics
- 1967

The main purpose of this paper is to describe a process for partitioning an N-dimensional population into k sets on the basis of a sample. The process, which is called 'k-means,' appears to give…

### Density-based Clustering

- Computer Science, Business
- Encyclopedia of Database Systems
- 2009

The clustering methods like K-means or Expectation-Maximization are suitable for finding ellipsoid-shaped clusters, but for non-convex clusters, these methods have trouble finding the true clusters, since two points from different clusters may be closer than two points in the same cluster.

### Asymptotic properties of univariate sample k-means clusters

- Mathematics
- 1984

A random sample of size N is divided into k clusters that minimize the within clusters sum of squares locally. Some large sample properties of this k-means clustering method (as k approaches ∞ with N)…

### Local optima in K-means clustering: what you don't know may hurt you.

- Computer Science
- Psychological Methods
- 2003

The results suggest the need for some strategy to study the local optima problem for a specific data set or to identify methods for finding "good" starting values that might lead to the best solutions possible.

### K-means clustering: a half-century synthesis.

- Computer Science
- The British Journal of Mathematical and Statistical Psychology
- 2006

This paper synthesizes the results, methodology, and research conducted concerning the K-means clustering method over the last fifty years, leading to a unifying treatment of K-Means and some of its extensions.