Scalable K-Means++

@article{Bahmani2012ScalableK,
  title={Scalable K-Means++},
  author={Bahman Bahmani and Benjamin Moseley and Andrea Vattani and Ravi Kumar and Sergei Vassilvitskii},
  journal={PVLDB},
  year={2012},
  volume={5},
  pages={622-633}
}
Over half a century old and showing no signs of aging, k-means remains one of the most popular data processing algorithms. As is well known, a proper initialization of k-means is crucial for obtaining a good final solution. The recently proposed k-means++ initialization algorithm achieves this, obtaining an initial set of centers that is provably close to the optimum solution. A major downside of k-means++ is its inherent sequential nature, which limits its applicability to massive data…
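To make the sequential bottleneck mentioned in the abstract concrete, here is a minimal sketch of the standard k-means++ seeding step (D^2 sampling). It is an illustration only: the names `squared_distance` and `kmeanspp_seed` are hypothetical, points are assumed to be tuples of floats, and it does not attempt to reproduce whatever scalable variant the paper goes on to propose.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeanspp_seed(points, k, rng=None):
    """Pick k initial centers by D^2 sampling (standard k-means++ seeding).

    Hypothetical helper for illustration; not taken from the paper.
    """
    rng = rng or random.Random()
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")

    # First center: chosen uniformly at random.
    centers = [rng.choice(points)]
    # d2[i] is the squared distance from points[i] to its nearest center so far.
    d2 = [squared_distance(p, centers[0]) for p in points]

    # Each new center depends on all earlier ones, so this loop makes
    # k - 1 sequential passes over the data.
    while len(centers) < k:
        if sum(d2) == 0:
            # Every point coincides with a chosen center; fall back to uniform.
            new_center = rng.choice(points)
        else:
            # Sample a point with probability proportional to d2.
            new_center = rng.choices(points, weights=d2, k=1)[0]
        centers.append(new_center)
        d2 = [min(old, squared_distance(p, new_center))
              for p, old in zip(points, d2)]
    return centers
```

Because every draw needs the distances updated by the previous draw, the passes cannot simply be run side by side, which is the limitation the abstract points to for massive data.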
Highly Influential
This paper has highly influenced 20 other papers.
Highly Cited
This paper has 368 citations.
Related Discussions
This paper has been referenced on Twitter 15 times.

Citations

Publications citing this paper.

Single-channel speech separation based on deep clustering with local optimization

2017 3rd International Conference on Frontiers of Signal Processing (ICFSP) • 2017
Highly Influenced

Parallelizing K-Means-Based Clustering on Spark

2016 International Conference on Advanced Cloud and Big Data (CBD) • 2016
Highly Influenced

Citations per Year (chart omitted; year axis runs from '13 to '19)
Semantic Scholar estimates that this publication has 368 citations based on the available data.
