We study a generalization of the <i>k</i>-median problem with respect to an arbitrary dissimilarity measure D. Given a finite set <i>P</i> of size <i>n</i>, our goal is to find a set <i>C</i> of size <i>k</i> such that the sum of errors D(<i>P,C</i>) = ∑<sub><i>p</i> ∈ <i>P</i></sub> min<sub><i>c</i> ∈ <i>C</i></sub> {D(<i>p,c</i>)} is…
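As a minimal sketch of the objective stated in this abstract, the sum of errors can be evaluated for any dissimilarity function passed in as a callable. The names `clustering_cost` and `squared_euclidean` are hypothetical helpers chosen for illustration, and squared Euclidean distance is only one possible choice of D; none of this is taken from the paper itself.

```python
from typing import Callable, Sequence, Tuple

Point = Tuple[float, ...]


def clustering_cost(
    points: Sequence[Point],
    centers: Sequence[Point],
    dissimilarity: Callable[[Point, Point], float],
) -> float:
    """Return the sum over p in P of the minimum over c in C of D(p, c)."""
    if not centers:
        raise ValueError("at least one center is required")
    return sum(min(dissimilarity(p, c) for c in centers) for p in points)


def squared_euclidean(p: Point, c: Point) -> float:
    """One example dissimilarity measure (an assumption for this sketch)."""
    return sum((pi - ci) ** 2 for pi, ci in zip(p, c))


# Three points on a line, two candidate centers.
P = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0)]
C = [(0.0, 0.0), (10.0, 0.0)]
total = clustering_cost(P, C, squared_euclidean)
```

Because the dissimilarity is a parameter, the same function evaluates the objective for asymmetric measures as well, as long as the argument order (point first, center second) is kept consistent.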

We develop a new k-means clustering algorithm for data streams, which we call StreamKM++. Our algorithm computes a small weighted sample of the data stream and solves the problem on the sample using the k-means++ algorithm [1]. To compute the small sample, we propose two new techniques. First, we use a non-uniform sampling approach similar to the k-means++…
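The non-uniform sampling mentioned in this abstract is modeled on k-means++ seeding, which can be sketched as follows. This is an illustration of the standard seeding rule (each new center is drawn with probability proportional to its squared distance to the nearest center chosen so far), not of StreamKM++ itself; the function names and the uniform fallback for degenerate inputs are assumptions.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def squared_distance(p: Point, q: Point) -> float:
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeanspp_seeding(points: Sequence[Point], k: int, rng: random.Random) -> List[Point]:
    """Pick k initial centers by squared-distance-proportional sampling."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    # The first center is chosen uniformly at random.
    centers = [rng.choice(points)]
    # d2[i] holds the squared distance from points[i] to its nearest center.
    d2 = [squared_distance(p, centers[0]) for p in points]
    while len(centers) < k:
        if sum(d2) == 0.0:
            # Every point coincides with a center; fall back to a uniform draw.
            nxt = rng.choice(points)
        else:
            nxt = rng.choices(points, weights=d2, k=1)[0]
        centers.append(nxt)
        d2 = [min(old, squared_distance(p, nxt)) for old, p in zip(d2, points)]
    return centers


points = [(0.0, 0.0), (0.0, 1.0), (100.0, 100.0), (100.0, 101.0)]
seeds = kmeanspp_seeding(points, 2, random.Random(0))
```

Points far from every chosen center get a large weight, so the seeds tend to spread across well-separated groups rather than concentrating in one of them.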

We study the generalized k-median problem with respect to a Bregman divergence D<sub>φ</sub>. Given a finite set P ⊆ ℝ<sup>d</sup> of size n, our goal is to find a set C of size k such that the sum of errors cost(P, C) = ∑<sub>p ∈ P</sub> min<sub>c ∈ C</sub> D<sub>φ</sub>(p, c) is minimized. The Bregman k-median problem plays an important role in many applications, e.g. information theory, statistics, text…
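For readers unfamiliar with the term, a Bregman divergence is generated by a strictly convex, differentiable function φ through the standard definition D<sub>φ</sub>(p, c) = φ(p) − φ(c) − ⟨∇φ(c), p − c⟩. The sketch below evaluates that definition for two common generators; the helper names are hypothetical, and the gradients are supplied by hand rather than computed automatically.

```python
import math
from typing import Callable, Sequence

Vector = Sequence[float]


def bregman_divergence(
    phi: Callable[[Vector], float],
    grad_phi: Callable[[Vector], Sequence[float]],
    p: Vector,
    c: Vector,
) -> float:
    """D_phi(p, c) = phi(p) - phi(c) - <grad phi(c), p - c>."""
    g = grad_phi(c)
    inner = sum(gi * (pi - ci) for gi, pi, ci in zip(g, p, c))
    return phi(p) - phi(c) - inner


# Generator phi(x) = ||x||^2 yields the squared Euclidean distance.
def sq_norm(x: Vector) -> float:
    return sum(xi * xi for xi in x)


def sq_norm_grad(x: Vector) -> Sequence[float]:
    return [2.0 * xi for xi in x]


# Generator phi(x) = sum x_i log x_i (positive entries only) yields the
# generalized Kullback-Leibler divergence.
def neg_entropy(x: Vector) -> float:
    return sum(xi * math.log(xi) for xi in x)


def neg_entropy_grad(x: Vector) -> Sequence[float]:
    return [math.log(xi) + 1.0 for xi in x]


d_euclid = bregman_divergence(sq_norm, sq_norm_grad, (1.0, 2.0), (3.0, 1.0))
d_kl = bregman_divergence(neg_entropy, neg_entropy_grad, (0.5, 0.5), (0.25, 0.75))
```

Note that a Bregman divergence is in general not symmetric, so the order of the point and the center matters when evaluating the cost.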

We develop a new <i>k</i>-means clustering algorithm for data streams of points from a Euclidean space. We call this algorithm StreamKM++. Our algorithm computes a small weighted sample of the data stream and solves the problem on the sample using the <i>k</i>-means++ algorithm of Arthur and Vassilvitskii (SODA '07). To compute…

The Bregman k-median problem is defined as follows. Given a Bregman divergence D<sub>φ</sub> and a finite set P ⊆ ℝ<sup>d</sup> of size n, our goal is to find a set C of size k such that the sum of errors cost(P, C) = ∑<sub>p ∈ P</sub> min<sub>c ∈ C</sub> D<sub>φ</sub>(p, c) is minimized. The Bregman k-median problem plays an important role in many applications, e.g., information theory, statistics, text…

The diameter k-clustering problem is the problem of partitioning a finite subset of ℝ<sup>d</sup> into k subsets called clusters such that the maximum diameter of the clusters is minimized. One early clustering algorithm that computes a hierarchy of approximate solutions to this problem (for all values of k) is the agglomerative clustering algorithm with the complete…
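To make the setting concrete, here is a naive sketch of agglomerative clustering with the textbook complete-linkage rule: start from singletons and repeatedly merge the two clusters whose largest cross-cluster distance is smallest. It assumes Euclidean distance via `math.dist` (Python 3.8+), stops at a single requested k instead of recording the whole hierarchy, and is not the analysis or implementation from the paper.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def linkage_distance(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Complete linkage: the largest distance between a point of a and a point of b."""
    return max(math.dist(p, q) for p in a for q in b)


def diameter(cluster: Sequence[Point]) -> float:
    """Largest pairwise distance inside one cluster (0.0 for a singleton)."""
    return max((math.dist(p, q) for p in cluster for q in cluster), default=0.0)


def complete_linkage(points: Sequence[Point], k: int) -> List[List[Point]]:
    """Merge clusters greedily until exactly k remain."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    clusters = [[p] for p in points]
    while len(clusters) > k:
        best = None  # (distance, i, j) of the closest pair found so far
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = linkage_distance(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
result = complete_linkage(pts, 2)
max_diameter = max(diameter(c) for c in result)
```

The greedy merges give no guarantee of an optimal maximum diameter in general; the quantity `max_diameter` is simply the objective value of whatever partition the procedure returns.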

We prove the computational hardness of three k-clustering problems using an (almost) arbitrary Bregman divergence as dissimilarity measure: (a) The Bregman k-center problem, where the objective is to find a set of centers that minimizes the maximum dissimilarity of any input point towards its closest center, and (b) the Bregman k-diameter problem, where the…

- Jun Li, Arun Ayyagari, Craig F. Battles, Brian J. Smith, Stephen A. Uczekaj, Bhavna Ambudkar +24 others
- 2011

We develop a new k-means clustering algorithm for data streams, which we call StreamKM++. Our algorithm computes a small weighted sample of the data stream and solves the problem on the sample using the k-means++ algorithm. To compute the small sample, we use a variant of the k-means++ seeding procedure. We compare our algorithm experimentally with two…