An oblivious subspace embedding (OSE) given some parameters $\varepsilon$, $d$ is a distribution $D$ over matrices $\Pi \in \mathbb{R}^{m \times n}$ such that for any linear subspace $W \subseteq \mathbb{R}^n$ with $\dim(W) = d$, $\Pr_{\Pi \sim D}(\forall x \in W:\ \|\Pi x\|_2 \in (1 \pm \varepsilon)\|x\|_2) > 2/3$. We show ...
We present a new data structure for the $c$-approximate near neighbor problem (ANN) in the Euclidean space. For $n$ points in $\mathbb{R}^d$, our algorithm achieves $O_c(n^{\rho} + d \log n)$ query time and $O_c(n^{1+\rho} + d \log n)$ space, where $\rho \le 7/(8c^2) + O(1/c^3) + o_c(1)$. This is the first improvement over the result by Andoni and Indyk (FOCS 2006) and the first data ...
In the turnstile model of data streams, an underlying vector $x \in \{-m, -m+1, \ldots, m-1, m\}^n$ is presented as a long sequence of positive and negative integer updates to its coordinates. A randomized algorithm seeks to approximate a function $f(x)$ with constant probability while only making ...
Sketching is a prominent algorithmic tool for processing large data. In this paper, we study the problem of sketching matrix norms. We consider two sketching models. The first is bilinear sketching, in which there is a distribution over pairs of r × n matrices S and n × s matrices T such that for any fixed n × n matrix A, from S · A · T one can approximate ...
We explore the connection between dimensionality and communication cost in distributed learning problems. Specifically, we study the problem of estimating the mean $\vec{\theta}$ of an unknown $d$-dimensional Gaussian distribution in the distributed setting. In this problem, the samples from the unknown distribution are distributed among $m$ different machines. The goal is ...
A wide variety of problems in machine learning, including exemplar clustering, document summarization, and sensor placement, can be cast as constrained submodular maximization problems. Considerable recent effort has been devoted to developing distributed algorithms for these problems. However, these results suffer from a high number of rounds, suboptimal ...
An oblivious subspace embedding (OSE) for some $\varepsilon, \delta \in (0, 1/3)$ and $d \le m \le n$ is a distribution $D$ over $\mathbb{R}^{m \times n}$ such that for any linear subspace $W \subset \mathbb{R}^n$ of dimension $d$, $\Pr_{\Pi \sim D}(\forall x \in W,\ (1 - \varepsilon)\|x\|_2 \le \|\Pi x\|_2 \le (1 + \varepsilon)\|x\|_2) \ge 1 - \delta$. We prove that any OSE with $\delta < 1/3$ must have $m = \Omega((d + \log(1/\delta))/\varepsilon^2)$, which is optimal. Furthermore, if every $\Pi$ in the support of $D$ is ...
We consider the problem of approximate nearest neighbors in high dimensions, when the queries are lines. In this problem, given $n$ points in $\mathbb{R}^d$, we want to construct a data structure that efficiently supports the following queries: given a line $L$, report the point $p$ closest to $L$. This problem generalizes the more familiar nearest neighbor problem. From a ...
We study the problem of estimating the Earth Mover's Distance (EMD) between probability distributions when given access only to samples. We give closeness testers and additive-error estimators over domains in $[0, \Delta]^d$, with sample complexities independent of domain size, permitting the testability even of continuous distributions over infinite domains.