Corpus ID: 220713890

Scalable Initialization Methods for Large-Scale Clustering

@article{Hmlinen2020ScalableIM,
  title={Scalable Initialization Methods for Large-Scale Clustering},
  author={Joonas H{\"a}m{\"a}l{\"a}inen and Tommi K{\"a}rkk{\"a}inen and Tuomo Rossi},
  journal={ArXiv},
  year={2020},
  volume={abs/2007.11937}
}
  • Joonas Hämäläinen, Tommi Kärkkäinen, Tuomo Rossi
  • Published 2020
  • Computer Science, Mathematics
  • ArXiv
  • In this work, two new initialization methods for K-means clustering are proposed. Both proposals apply a divide-and-conquer approach to the K-means‖ type of initialization strategy. The second proposal also utilizes multiple lower-dimensional subspaces produced by the random projection method for the initialization. The proposed methods are scalable and can be run in parallel, which makes them suitable for initializing large-scale problems. In the experiments, comparison of…
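
The divide-and-conquer idea in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes plain k-means++ seeding as a stand-in for K-means‖, omits the random projection step of the second proposal, and the function names `kmeanspp_seed` and `divide_and_conquer_seed` as well as the `n_parts` parameter are hypothetical. It also assumes every part is non-empty.

```python
import numpy as np

def kmeanspp_seed(X, k, rng):
    """Plain k-means++ seeding: pick k rows of X by D^2 sampling."""
    n = X.shape[0]
    centers = [X[rng.integers(n)]]
    # Squared distance from every point to its nearest chosen center.
    d2 = np.sum((X - centers[0]) ** 2, axis=1)
    for _ in range(k - 1):
        total = d2.sum()
        # Fall back to uniform sampling if all points coincide with a center.
        probs = d2 / total if total > 0 else np.full(n, 1.0 / n)
        idx = rng.choice(n, p=probs)
        centers.append(X[idx])
        d2 = np.minimum(d2, np.sum((X - X[idx]) ** 2, axis=1))
    return np.array(centers)

def divide_and_conquer_seed(X, k, n_parts=4, seed=0):
    """Seed each random part separately, then reduce the pooled candidates."""
    rng = np.random.default_rng(seed)
    parts = np.array_split(rng.permutation(X.shape[0]), n_parts)
    # Each part could be handled by a separate worker; here it is a plain loop.
    candidates = np.vstack([kmeanspp_seed(X[p], k, rng) for p in parts])
    # Reduce the n_parts * k pooled candidates to k initial centers.
    return kmeanspp_seed(candidates, k, rng)
```

The returned array has shape `(k, d)` and could be handed to any K-means implementation as its initial centers. Because each part is seeded independently, the per-part step is the piece that would run in parallel.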
