An oblivious subspace embedding (OSE) given some parameters $\varepsilon, d$ is a distribution $\mathcal{D}$ over matrices $\Pi \in \mathbb{R}^{m \times n}$ such that for any linear subspace $W \subseteq \mathbb{R}^n$ with $\dim(W) = d$ it holds that $\mathbb{P}_{\Pi \sim \mathcal{D}}(\forall x \in W,\ \|\Pi x\|_2 \in (1 \pm \varepsilon)\|x\|_2) > 2/3$. We show an OSE exists with $m = O(d^2/\varepsilon^2)$ and where every $\Pi$ in the support of $\mathcal{D}$ has exactly $s = 1$ non-zero entries per column. This […]
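To make the "one non-zero entry per column" structure concrete, here is a minimal sketch assuming the common CountSketch-style construction: each column gets a uniformly random row and a random sign. The abstract does not spell out the distribution, so the sampling rule, the function names, and the value of m in the demo are illustrative assumptions, not the paper's construction.

```python
import random


def make_sparse_embedding(m, n, seed=0):
    """Sample an m x n matrix with exactly one nonzero (+1 or -1) per column.

    The matrix is stored sparsely: rows[j] is the row index of the nonzero
    in column j and signs[j] is its value. (Assumed construction.)
    """
    rng = random.Random(seed)
    rows = [rng.randrange(m) for _ in range(n)]
    signs = [rng.choice((-1.0, 1.0)) for _ in range(n)]
    return rows, signs


def apply_embedding(rows, signs, m, x):
    """Compute Pi @ x in O(n) time using the sparse representation."""
    y = [0.0] * m
    for j, xj in enumerate(x):
        y[rows[j]] += signs[j] * xj
    return y


def sq_norm(v):
    """Squared Euclidean norm."""
    return sum(t * t for t in v)


if __name__ == "__main__":
    n, m = 1000, 200  # m is chosen arbitrarily for the demo
    rows, signs = make_sparse_embedding(m, n, seed=1)
    rng = random.Random(2)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ratio = sq_norm(apply_embedding(rows, signs, m, x)) / sq_norm(x)
    print("squared-norm ratio:", ratio)
```

Because each column holds a single entry, applying the matrix to a vector touches every input coordinate exactly once, which is what makes such embeddings attractive for sparse inputs.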
We present a new data structure for the c-approximate near neighbor problem (ANN) in the Euclidean space. For $n$ points in $\mathbb{R}^d$, our algorithm achieves $O_c(n^{\rho} + d \log n)$ query time and $O_c(n^{1+\rho} + d \log n)$ space, where $\rho \le 7/(8c^2) + O(1/c^3) + o_c(1)$. This is the first improvement over the result by Andoni and Indyk (FOCS 2006) and the first data […]
In the turnstile model of data streams, an underlying vector $x \in \{-m, -m+1, \ldots, m-1, m\}^n$ is presented as a long sequence of positive and negative integer updates to its coordinates. A randomized algorithm seeks to approximate a function $f(x)$ with constant probability while only making […]
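As an illustration of the turnstile model, the sketch below processes a stream of signed updates (i, delta) while storing only a small table. It assumes a single-row CountSketch-style linear sketch, which is one common way to handle such streams; the class name, parameters, and the choice of sketch are hypothetical and are not taken from the paper.

```python
import random


class TurnstileSketch:
    """Single-row CountSketch-style linear sketch (illustrative assumption)."""

    def __init__(self, n, k, seed=0):
        rng = random.Random(seed)
        # Each coordinate i is hashed to one bucket with a random sign.
        self.bucket = [rng.randrange(k) for _ in range(n)]
        self.sign = [rng.choice((-1, 1)) for _ in range(n)]
        self.table = [0] * k

    def update(self, i, delta):
        """Process one turnstile update: x[i] += delta (delta may be negative)."""
        self.table[self.bucket[i]] += self.sign[i] * delta

    def estimate(self, i):
        """Estimate x[i]; exact when no other nonzero coordinate shares its bucket."""
        return self.sign[i] * self.table[self.bucket[i]]


if __name__ == "__main__":
    sk = TurnstileSketch(n=100, k=16, seed=3)
    for i, delta in [(5, 4), (17, -2), (5, -1), (42, 7), (17, 2)]:
        sk.update(i, delta)
    print("estimate of x[5]:", sk.estimate(5))
```

Since the table is a linear function of x, insertions and deletions of the same amount cancel exactly and the final state does not depend on the order of the updates.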
Sketching is a prominent algorithmic tool for processing large data. In this paper, we study the problem of sketching matrix norms. We consider two sketching models. The first is bilinear sketching, in which there is a distribution over pairs of $r \times n$ matrices $S$ and $n \times s$ matrices $T$ such that for any fixed $n \times n$ matrix $A$, from $S \cdot A \cdot T$ one can approximate […]
We consider the problem of approximate nearest neighbors in high dimensions, when the queries are lines. In this problem, given $n$ points in $\mathbb{R}^d$, we want to construct a data structure that efficiently supports the following queries: given a line $L$, report the point $p$ closest to $L$. This problem generalizes the more familiar nearest neighbor problem. From a […]
We study the problem of estimating the Earth Mover's Distance (EMD) between probability distributions when given access only to samples. We give closeness testers and additive-error estimators over domains in $[0, \Delta]^d$, with sample complexities independent of domain size, permitting the testability even of continuous distributions over infinite domains. […]
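An oblivious subspace embedding (OSE) for some $\varepsilon, \delta \in (0, 1/3)$ and $d \le m \le n$ is a distribution $\mathcal{D}$ over $\mathbb{R}^{m \times n}$ such that for any linear subspace $W \subset \mathbb{R}^n$ of dimension $d$, $\mathbb{P}_{\Pi \sim \mathcal{D}}(\forall x \in W,\ (1 - \varepsilon)\|x\|_2 \le \|\Pi x\|_2 \le (1 + \varepsilon)\|x\|_2) \ge 1 - \delta$. We prove that any OSE with $\delta < 1/3$ must have $m = \Omega((d + \log(1/\delta))/\varepsilon^2)$, which is optimal. Furthermore, if every $\Pi$ in the support of $\mathcal{D}$ is […]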
We explore the connection between dimensionality and communication cost in distributed learning problems. Specifically, we study the problem of estimating the mean $\vec{\theta}$ of an unknown $d$-dimensional Gaussian distribution in the distributed setting. In this problem, the samples from the unknown distribution are distributed among $m$ different machines. The goal is […]
A wide variety of problems in machine learning, including exemplar clustering, document summarization, and sensor placement, can be cast as constrained submodular maximization problems. Unfortunately, the resulting submodular optimization problems are often too large to be solved on a single machine. We develop a simple distributed algorithm that is […]