We investigate variants of Lloyd's heuristic for clustering high-dimensional data in an attempt to explain its popularity (a half century after its introduction) among practitioners, and in order to suggest improvements in its application. We propose and justify a clusterability criterion for data sets. We present variants of Lloyd's heuristic that …
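For orientation, the baseline Lloyd iteration that these variants start from can be sketched as follows. This is a minimal illustration of the textbook procedure, not the variants proposed in the paper; the function name `lloyd`, the random seeding of initial centers, and the parameters `iters` and `seed` are assumptions made for this example.

```python
import random

def lloyd(points, k, iters=100, seed=0):
    """Textbook Lloyd iteration on a list of equal-length numeric tuples.

    Assumption: initial centers are k distinct input points chosen at random;
    the paper's variants may seed differently.
    """
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center
        # (squared Euclidean distance, ties broken by lowest index).
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])),
            )
            clusters[j].append(p)
        # Update step: move each center to the mean of its cluster;
        # an empty cluster keeps its previous center.
        new_centers = [
            tuple(sum(coord) / len(cl) for coord in zip(*cl)) if cl else centers[j]
            for j, cl in enumerate(clusters)
        ]
        if new_centers == centers:
            break  # fixed point reached
        centers = new_centers
    return centers, clusters

if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    found_centers, found_clusters = lloyd(data, k=2)
    print(sorted(found_centers))
```

On well-separated data such as the toy input above, the iteration settles on one center per group within a few rounds; on less clusterable data the result depends heavily on the initial centers.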
We prove that if a linear error-correcting code $C : \{0,1\}^n \rightarrow \{0,1\}^m$ is such that a bit of the message can be probabilistically reconstructed by looking at two entries of a corrupted codeword, then $m = 2^{\Omega(n)}$. We also present several extensions of this result. We show a reduction from the complexity of one-round, information-theoretic Private Information …
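To make the two-query reconstruction property concrete, here is a small sketch using the Hadamard code, a standard textbook linear code with codeword length $2^n$. It is an illustration chosen for this example, not a construction taken from the abstract; the function names `hadamard_encode` and `decode_bit` are hypothetical.

```python
import random

def hadamard_encode(msg):
    """Encode n message bits as 2**n bits.

    Entry a of the codeword is the inner product <msg, a> mod 2,
    where a is read as an n-bit vector.
    """
    n = len(msg)
    x = sum(bit << i for i, bit in enumerate(msg))
    return [bin(x & a).count("1") % 2 for a in range(2 ** n)]

def decode_bit(word, n, i, rng):
    """Recover message bit i from two entries of a (possibly corrupted) word.

    Queries a random position a and the position a XOR e_i. On an
    uncorrupted codeword the XOR of the two entries equals
    <msg, a> + <msg, a XOR e_i> = msg[i] (mod 2).
    """
    a = rng.randrange(2 ** n)
    return word[a] ^ word[a ^ (1 << i)]

if __name__ == "__main__":
    message = [1, 0, 1, 1]
    codeword = hadamard_encode(message)
    rng = random.Random(0)
    print([decode_bit(codeword, len(message), i, rng) for i in range(len(message))])
```

Each of the two queried positions is uniformly distributed on its own, so if a fraction $\delta$ of the entries is corrupted, a single call returns the wrong bit with probability at most $2\delta$.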
We present a fairly general method for finding deterministic constructions obeying what we call k-restrictions; this yields structures of size not much larger than the probabilistic bound. The structures constructed by our method include (n, k)-universal sets (a collection of binary vectors of length n such that for any subset of size k of the indices, all 2 …
In this paper we obtain improved upper and lower bounds for the best approximation factor for Sparsest Cut achievable in the cut-matching game framework proposed in Khandekar et al. [9]. We show that this simple framework can be used to design combinatorial algorithms that achieve an O(log n) approximation factor and whose running time is dominated by a …
We consider the following taxonomy labeling problem. Each node of an n-node tree has to be labeled with the values of k attributes. A partial labeling is given as part of the input. The goal is to complete this labeling, minimizing the maximum variation in labeling along an edge. A special case of this problem (which we call the …
Sampling is an important primitive in probabilistic and quantum algorithms. In the spirit of communication complexity, given a function $f: X \times Y \rightarrow \{0,1\}$ and a probability distribution $D$ over $X \times Y$, we define the sampling complexity of $(f,D)$ as the minimum number of bits Alice and Bob must communicate for Alice to pick $x \in X$ …