We develop a new k-means clustering algorithm for data streams, which we call StreamKM++. Our algorithm computes a small weighted sample of the data stream and solves the problem on the sample using the k-means++ algorithm [1]. To compute the small sample, we propose two new techniques. First, we use a non-uniform sampling approach similar to the k-means++… (More)
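The non-uniform sampling that k-means++ uses for seeding can be sketched as follows. This is a minimal illustration of the seeding step of Arthur and Vassilvitskii only, not the StreamKM++ sample construction, whose details are truncated in the abstract above. The function names and the fixed default seed are assumptions made for this example.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeanspp_seeding(points, k, rng=None):
    """Pick k initial centers by D^2 sampling (k-means++ seeding).

    The first center is chosen uniformly at random. Each later center is
    chosen with probability proportional to its squared distance to the
    nearest center picked so far.
    """
    rng = rng or random.Random(0)  # fixed seed is an assumption, for repeatability
    centers = [rng.choice(points)]
    # d2[i] = squared distance from points[i] to its nearest chosen center
    d2 = [sq_dist(p, centers[0]) for p in points]
    while len(centers) < k:
        if sum(d2) == 0:
            # Every point coincides with a chosen center; fall back to uniform.
            new_center = rng.choice(points)
        else:
            new_center = rng.choices(points, weights=d2, k=1)[0]
        centers.append(new_center)
        d2 = [min(old, sq_dist(p, new_center)) for p, old in zip(points, d2)]
    return centers


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
    print(kmeanspp_seeding(data, 2))
```

Points that are far from every chosen center are more likely to be picked next, which is what makes the sample non-uniform.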

We study the generalized k-median problem with respect to a Bregman divergence D<sub>φ</sub>. Given a finite set P ⊆ ℝ<sup>d</sup> of size n, our goal is to find a set C of size k such that the sum of errors cost(P, C) = Σ<sub>p∈P</sub> min<sub>c∈C</sub> D<sub>φ</sub>(p, c) is minimized. The Bregman k-median problem plays an important role in many applications, e.g., information theory, statistics, text… (More)
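To make the objective concrete, the sketch below evaluates the sum of errors for a given set of centers. It assumes the standard definition of a Bregman divergence, D<sub>φ</sub>(p, q) = φ(p) − φ(q) − ⟨∇φ(q), p − q⟩, which the abstract does not spell out. The two generating functions shown (squared Euclidean norm and negative entropy) are illustrative choices, and all function names are hypothetical. The code only evaluates the cost; it does not implement any algorithm from the paper.

```python
import math


def bregman(phi, grad_phi, p, q):
    """D_phi(p, q) = phi(p) - phi(q) - <grad phi(q), p - q> (standard definition)."""
    inner = sum(g * (a - b) for g, a, b in zip(grad_phi(q), p, q))
    return phi(p) - phi(q) - inner


# phi(x) = ||x||^2 yields the squared Euclidean distance.
def sq_norm(x):
    return sum(v * v for v in x)


def sq_norm_grad(x):
    return [2 * v for v in x]


# phi(x) = sum x_i log x_i yields the generalized Kullback-Leibler divergence
# (requires strictly positive coordinates).
def neg_entropy(x):
    return sum(v * math.log(v) for v in x)


def neg_entropy_grad(x):
    return [math.log(v) + 1 for v in x]


def cost(points, centers, divergence):
    """Sum over all points of the divergence to the closest center."""
    return sum(min(divergence(p, c) for c in centers) for p in points)


def sq_euclidean(p, q):
    return bregman(sq_norm, sq_norm_grad, p, q)


def gen_kl(p, q):
    return bregman(neg_entropy, neg_entropy_grad, p, q)


if __name__ == "__main__":
    P = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0)]
    C = [(0.0, 0.0), (10.0, 0.0)]
    print(cost(P, C, sq_euclidean))
```

Note that a Bregman divergence is in general not symmetric, so the argument order (point first, center second) matters.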

We study a generalization of the <i>k</i>-median problem with respect to an arbitrary dissimilarity measure D. Given a finite set <i>P</i>, our goal is to find a set <i>C</i> of size <i>k</i> such that the sum of errors D(<i>P, C</i>) = Σ<i><sub>p∈P</sub></i> min<i><sub>c∈C</sub></i>{D(<i>p, c</i>)} is minimized. The main result in this… (More)

We study a generalization of the <i>k</i>-median problem with respect to an arbitrary dissimilarity measure D. Given a finite set <i>P</i> of size <i>n</i>, our goal is to find a set <i>C</i> of size <i>k</i> such that the sum of errors D(<i>P,C</i>) = ∑<sub><i>p</i> ∈ <i>P</i></sub> min<sub><i>c</i> ∈ <i>C</i></sub> {D(<i>p,c</i>)} is… (More)

We develop a new <i>k</i>-means clustering algorithm for data streams of points from a Euclidean space. We call this algorithm StreamKM++. Our algorithm computes a small weighted sample of the data stream and solves the problem on the sample using the <i>k</i>-means++ algorithm of Arthur and Vassilvitskii (SODA '07). To compute… (More)

The Bregman k-median problem is defined as follows. Given a Bregman divergence D<sub>φ</sub> and a finite set P ⊆ ℝ<sup>d</sup> of size n, our goal is to find a set C of size k such that the sum of errors cost(P, C) = Σ<sub>p∈P</sub> min<sub>c∈C</sub> D<sub>φ</sub>(p, c) is minimized. The Bregman k-median problem plays an important role in many applications, e.g., information theory, statistics, text… (More)

The diameter k-clustering problem is the problem of partitioning a finite subset of ℝ<sup>d</sup> into k subsets called clusters such that the maximum diameter of the clusters is minimized. One early clustering algorithm that computes a hierarchy of approximate solutions to this problem (for all values of k) is the agglomerative clustering algorithm with the complete… (More)
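As an illustration of how such a hierarchy arises, here is a naive sketch of agglomerative clustering with the textbook complete-linkage rule, assuming Euclidean distance. It is not the analysis or any optimized procedure from the paper; the function names are hypothetical and the cubic-time search is chosen for readability only.

```python
import math


def linkage(a, b):
    """Complete linkage: the largest distance between a point of a and a point of b."""
    return max(math.dist(p, q) for p in a for q in b)


def diameter(cluster):
    """Largest pairwise distance inside one cluster (0.0 for a single point)."""
    return max(math.dist(p, q) for p in cluster for q in cluster)


def complete_linkage(points):
    """Return a dict mapping each k to the k-clustering found by the greedy merges."""
    clusters = [[p] for p in points]
    hierarchy = {len(clusters): [list(c) for c in clusters]}
    while len(clusters) > 1:
        # Find the pair of clusters with the smallest complete-linkage distance.
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = linkage(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
        hierarchy[len(clusters)] = [list(c) for c in clusters]
    return hierarchy


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (10, 0), (10, 1)]
    levels = complete_linkage(pts)
    for k in sorted(levels):
        print(k, max(diameter(c) for c in levels[k]))
```

A single run yields one clustering for every k from n down to 1, and the printed value is the diameter objective of each level.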

We prove the computational hardness of three k-clustering problems using an (almost) arbitrary Bregman divergence as dissimilarity measure: (a) The Bregman k-center problem, where the objective is to find a set of centers that minimizes the maximum dissimilarity of any input point towards its closest center, and (b) the Bregman k-diameter problem, where the… (More)

Digital storage demand is growing with the increasing use of digital artifacts from media files to business documents. Regulatory frameworks ask for unaltered, durable storage of business communications. In this paper we consider the problem of getting reliable evidence of the integrity and existence of some data from a storage service even if the data is… (More)
