
K-means Clustering

Edo Liberty, Algorithms in Data Mining
  • Published 2012
  • Mathematics
The sets S_j are the sets of points for which μ_j is the closest center. Each step of the algorithm reduces the potential function. Let's examine why. First, if the set of centers μ_j is fixed, the best assignment is clearly the one that assigns each data point to its closest center. Second, suppose μ is the center of a set of points S. Then moving μ to (1/|S|) Σ_{i∈S} x_i can only reduce the potential, because (1/|S|) Σ_{i∈S} x_i is the best possible value for μ (can easily be…
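The two potential-reducing steps described above (assign each point to its closest center, then move each center to the mean of its assigned set) are exactly one iteration of Lloyd's algorithm. A minimal NumPy sketch, with the function name, random initialization, and stopping rule chosen here for illustration rather than taken from the notes:

```python
import numpy as np

def lloyd_kmeans(X, k, n_iters=100, seed=0):
    """Lloyd's algorithm: alternate the two potential-reducing steps
    (assign points to nearest center, then move each center to the
    mean of its assigned set) until the centers stop moving."""
    rng = np.random.default_rng(seed)
    # Illustrative initialization: k distinct data points chosen at random.
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iters):
        # Assignment step: S_j = points whose closest center is mu_j.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: mu_j <- (1/|S_j|) sum_{i in S_j} x_i, the best
        # possible value for mu_j given the fixed assignment.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels
```

Since each of the two steps can only decrease the potential, the potential is non-increasing across iterations and the algorithm converges (though possibly to a local optimum).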

Spectral Relaxation for K-means Clustering

It is shown that a relaxed version of the trace maximization problem possesses global optimal solutions which can be obtained by computing a partial eigendecomposition of the Gram matrix, and the cluster assignment for each data vector can be found by computing a pivoted QR decomposition of the eigenvector matrix.

Smaller Coresets for k-Median and k-Means Clustering

In this paper we show that there exists a (k, ε)-coreset for k-median and k-means clustering of n points in R^d which is of size independent of n. In particular, we construct a…

Streaming k-means approximation

A clustering algorithm that approximately optimizes the k-means objective in the one-pass streaming setting, applicable to unsupervised learning on massive data sets or resource-constrained devices.

k-means++: the advantages of careful seeding

By augmenting k-means with a very simple, randomized seeding technique, this work obtains an algorithm that is Θ(log k)-competitive with the optimal clustering.
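The seeding technique in question is D²-weighting: the first center is drawn uniformly at random, and each subsequent center is drawn with probability proportional to its squared distance from the nearest center already chosen. A minimal sketch of that seeding step (the function name is ours, not from the paper):

```python
import numpy as np

def kmeans_pp_seed(X, k, rng=None):
    """k-means++ seeding: pick the first center uniformly at random,
    then pick each subsequent center with probability proportional to
    its squared distance to the nearest center chosen so far."""
    rng = rng or np.random.default_rng()
    centers = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        # D^2(x): squared distance from x to its nearest chosen center.
        d2 = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        probs = d2 / d2.sum()
        centers.append(X[rng.choice(len(X), p=probs)])
    return np.array(centers)
```

Points far from every existing center are far more likely to be picked, which is what makes the seeding competitive with the optimal clustering; already-chosen points have D² = 0 and can never be picked again.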

K-means clustering via principal component analysis

It is proved that principal components are the continuous solutions to the discrete cluster membership indicators for K-means clustering, which indicates that unsupervised dimension reduction is closely related to unsupervised learning.

Clustering Data Streams: Theory and Practice

This work describes a streaming algorithm that effectively clusters large data streams and provides empirical evidence of the algorithm's performance on synthetic and real data streams.

Least squares quantization in PCM

The corresponding result for any finite number of quanta is derived; that is, necessary conditions are found that the quanta and associated quantization intervals of an optimum finite quantization scheme must satisfy.

Chris Ding and Xiaofeng He. K-means clustering via principal component analysis

  • ICML
  • 2004

S. P. Lloyd. Least squares quantization in PCM

  • IEEE Transactions on Information Theory
  • 1982