Marcel R. Ackermann

We study a generalization of the <i>k</i>-median problem with respect to an arbitrary dissimilarity measure D. Given a finite set <i>P</i> of size <i>n</i>, our goal is to find a set <i>C</i> of size <i>k</i> such that the sum of errors D(<i>P,C</i>) &equals; &sum;<sub><i>p</i> &in; <i>P</i></sub> min<sub><i>c</i> &in; <i>C</i></sub> {D(<i>p,c</i>)} is …
We develop a new k-means clustering algorithm for data streams, which we call StreamKM++. Our algorithm computes a small weighted sample of the data stream and solves the problem on the sample using the k-means++ algorithm [1]. To compute the small sample, we propose two new techniques. First, we use a non-uniform sampling approach similar to the k-means++ …
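The non-uniform sampling mentioned above can be illustrated with the standard k-means++ seeding rule, in which each new center is drawn with probability proportional to its squared distance from the nearest center chosen so far. The following is a minimal sketch of that seeding step only, not of the StreamKM++ sample construction itself; the function names `sq_dist` and `kmeanspp_seeding` are illustrative assumptions.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeanspp_seeding(points, k, rng=None):
    """Pick k initial centers by D^2 sampling (k-means++ seeding).

    Illustrative sketch: the names and the interface are assumptions,
    not taken from the StreamKM++ paper.
    """
    rng = rng or random.Random()
    # First center: chosen uniformly at random.
    centers = [rng.choice(points)]
    # d2[i] = squared distance from points[i] to its nearest chosen center.
    d2 = [sq_dist(p, centers[0]) for p in points]
    while len(centers) < k:
        if sum(d2) == 0:
            # Every point coincides with a chosen center; fall back to uniform.
            new_center = rng.choice(points)
        else:
            # Draw an index with probability proportional to d2[i].
            idx = rng.choices(range(len(points)), weights=d2, k=1)[0]
            new_center = points[idx]
        centers.append(new_center)
        # Update nearest-center distances with the new center.
        d2 = [min(old, sq_dist(p, new_center)) for old, p in zip(d2, points)]
    return centers


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0), (5.0, 5.0)]
    print(kmeanspp_seeding(data, 3, random.Random(0)))
```

Points far from all current centers are more likely to be picked, which is why the seeding tends to spread the centers across the data.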
We study the generalized k-median problem with respect to a Bregman divergence D_φ. Given a finite set P ⊆ ℝ^d of size n, our goal is to find a set C of size k such that the sum of errors cost(P, C) = ∑_{p∈P} min_{c∈C} D_φ(p, c) is minimized. The Bregman k-median problem plays an important role in many applications, e.g. information theory, statistics, text …
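To make the objective concrete, the sketch below evaluates cost(P, C) for a Bregman divergence, using the standard definition D_φ(p, c) = φ(p) − φ(c) − ⟨∇φ(c), p − c⟩ for a strictly convex, differentiable φ. This definition is not spelled out in the abstract and is supplied here as background; the helper names (`bregman`, `cost`, and the two example generators) are illustrative assumptions.

```python
import math


def bregman(phi, grad_phi, p, c):
    """D_phi(p, c) = phi(p) - phi(c) - <grad phi(c), p - c>."""
    g = grad_phi(c)
    inner = sum(gi * (pi - ci) for gi, pi, ci in zip(g, p, c))
    return phi(p) - phi(c) - inner


def cost(points, centers, divergence):
    """Sum over all points of the divergence to the closest center."""
    return sum(min(divergence(p, c) for c in centers) for p in points)


# phi(x) = ||x||^2 yields the squared Euclidean distance.
def sq_norm(x):
    return sum(v * v for v in x)


def sq_norm_grad(x):
    return [2.0 * v for v in x]


# phi(x) = sum_i x_i log x_i yields the generalized Kullback-Leibler divergence
# (defined for strictly positive coordinates).
def neg_entropy(x):
    return sum(v * math.log(v) for v in x)


def neg_entropy_grad(x):
    return [math.log(v) + 1.0 for v in x]


def sq_euclidean(p, c):
    return bregman(sq_norm, sq_norm_grad, p, c)


def gen_kl(p, c):
    return bregman(neg_entropy, neg_entropy_grad, p, c)


if __name__ == "__main__":
    P = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0)]
    C = [(0.0, 0.0), (10.0, 0.0)]
    print(cost(P, C, sq_euclidean))
```

Note that a Bregman divergence is in general asymmetric, so the argument order (point first, center second) matters.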
We develop a new <i>k</i>-means clustering algorithm for data streams of points from a Euclidean space. We call this algorithm StreamKM++. Our algorithm computes a small weighted sample of the data stream and solves the problem on the sample using the <i>k</i>-means++ algorithm of Arthur and Vassilvitskii (SODA '07). To compute …
The diameter k-clustering problem is the problem of partitioning a finite subset of ℝ^d into k subsets called clusters such that the maximum diameter of the clusters is minimized. One early clustering algorithm that computes a hierarchy of approximate solutions to this problem (for all values of k) is the agglomerative clustering algorithm with the complete …
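As an illustration of how agglomerative clustering produces one clustering for every k, here is a naive sketch that assumes the complete-linkage merge rule (merge the two clusters whose farthest pair of points is closest) and Euclidean distance. It is a straightforward cubic-time version written for clarity, not the algorithm analyzed in the paper; the names `complete_linkage` and `max_diameter` are illustrative assumptions.

```python
import math
from itertools import combinations


def complete_linkage(points):
    """Return a dict mapping k to a clustering with k clusters, for k = n..1.

    Naive illustrative sketch: repeatedly merge the two clusters with the
    smallest complete-linkage distance (the largest pairwise distance
    between their members).
    """
    clusters = [[p] for p in points]
    hierarchy = {len(clusters): [list(c) for c in clusters]}

    def linkage(a, b):
        return max(math.dist(p, q) for p in a for q in b)

    while len(clusters) > 1:
        # Find the pair of clusters with minimal complete-linkage distance.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
        hierarchy[len(clusters)] = [list(c) for c in clusters]
    return hierarchy


def max_diameter(clustering):
    """Objective of the diameter k-clustering problem: the largest cluster diameter."""
    return max(
        (math.dist(p, q) for c in clustering for p in c for q in c),
        default=0.0,
    )


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
    levels = complete_linkage(pts)
    for k in sorted(levels, reverse=True):
        print(k, max_diameter(levels[k]))
```

Because each level is obtained from the previous one by a single merge, the clusterings are nested, which is what makes the output a hierarchy rather than n independent solutions.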
We prove the computational hardness of three k-clustering problems using an (almost) arbitrary Bregman divergence as dissimilarity measure: (a) The Bregman k-center problem, where the objective is to find a set of centers that minimizes the maximum dissimilarity of any input point towards its closest center, and (b) the Bregman k-diameter problem, where the …
We develop a new k-means clustering algorithm for data streams, which we call StreamKM++. Our algorithm computes a small weighted sample of the data stream and solves the problem on the sample using the k-means++ algorithm [?]. To compute the small sample, we use a variant of the k-means++ seeding procedure. We compare our algorithm experimentally with two …