An oblivious subspace embedding (OSE) given some parameters $\varepsilon, d$ is a distribution $D$ over matrices $\Pi \in \mathbb{R}^{m \times n}$ such that for any linear subspace $W \subseteq \mathbb{R}^n$ with $\dim(W) = d$ it holds that $\mathbb{P}_{\Pi \sim D}(\forall x \in W,\ \|\Pi x\|_2 \in (1 \pm \varepsilon)\|x\|_2) > 2/3$. We show an OSE exists with $m = O(d^2/\varepsilon^2)$ and where every $\Pi$ in the support of $D$ has exactly $s = 1$ non-zero entries per column. This …
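As a rough illustration of the construction described above, the following sketch samples a matrix with exactly one non-zero entry per column and spot-checks how well it preserves norms on a random subspace. The fully random choice of row and sign, the constant 1 in $m = d^2/\varepsilon^2$, and all function names are assumptions made for this example; the abstract does not specify them.

```python
import math
import random


def sample_sparse_embedding(m, n, rng):
    """Sample Pi implicitly: column j has one non-zero, signs[j], in row rows[j]."""
    rows = [rng.randrange(m) for _ in range(n)]
    signs = [rng.choice((-1.0, 1.0)) for _ in range(n)]
    return rows, signs


def apply_embedding(rows, signs, m, x):
    """Compute Pi @ x in O(n) time by adding each signed coordinate to its row."""
    y = [0.0] * m
    for j, xj in enumerate(x):
        y[rows[j]] += signs[j] * xj
    return y


def norm(v):
    """Euclidean norm of a list of floats."""
    return math.sqrt(sum(t * t for t in v))


def sampled_distortion(n, d, m, trials, rng):
    """Largest | ||Pi x|| / ||x|| - 1 | over random x in a random d-dim subspace.

    Sampling only spot-checks the guarantee; it does not certify it for all x in W.
    """
    basis = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(d)]
    rows, signs = sample_sparse_embedding(m, n, rng)
    worst = 0.0
    for _ in range(trials):
        coeffs = [rng.gauss(0.0, 1.0) for _ in range(d)]
        x = [sum(c * b[j] for c, b in zip(coeffs, basis)) for j in range(n)]
        ratio = norm(apply_embedding(rows, signs, m, x)) / norm(x)
        worst = max(worst, abs(ratio - 1.0))
    return worst


if __name__ == "__main__":
    rng = random.Random(0)
    d, eps = 3, 0.5
    # Assumed constant of 1 in m = O(d^2 / eps^2); the true constant may differ.
    m = math.ceil(d * d / (eps * eps))
    print(sampled_distortion(n=500, d=d, m=m, trials=100, rng=rng))
```

Storing only the row index and sign of each column is what makes $s = 1$ attractive: applying $\Pi$ costs time proportional to the number of non-zeros of $x$ rather than $m \cdot n$.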
In the turnstile model of data streams, an underlying vector $x \in \{-m, -m+1, \ldots, m-1, m\}^n$ is presented as a long sequence of positive and negative integer updates to its coordinates. A randomized algorithm seeks to approximate a function $f(x)$ with constant probability while only making …
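To make the turnstile model above concrete, here is a minimal sketch in which a stream of `(coordinate, delta)` updates defines the vector $x$. It also maintains a small linear sketch $Sx$ under the same updates, which is one standard way to process such streams and is an assumption of this example, not something the truncated abstract states. The random sign matrix and all function names are likewise hypothetical.

```python
import random


def final_vector(n, updates):
    """Replay turnstile updates (i, delta) to recover the underlying vector x."""
    x = [0] * n
    for i, delta in updates:
        x[i] += delta
    return x


def random_sign_matrix(k, n, rng):
    """A k x n matrix with independent +1/-1 entries (illustrative choice of S)."""
    return [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(k)]


def sketch_stream(S, updates):
    """Maintain y = S x using only k counters: each update adds delta * S[:, i]."""
    y = [0] * len(S)
    for i, delta in updates:
        for r, row in enumerate(S):
            y[r] += delta * row[i]
    return y


def sketch_vector(S, x):
    """Compute S x directly, for comparison with the streamed sketch."""
    return [sum(s * xj for s, xj in zip(row, x)) for row in S]


if __name__ == "__main__":
    rng = random.Random(1)
    updates = [(0, 2), (3, -1), (0, -2), (1, 5)]
    S = random_sign_matrix(2, 4, rng)
    x = final_vector(4, updates)
    # By linearity, the streamed sketch equals the sketch of the final vector.
    print(x, sketch_stream(S, updates) == sketch_vector(S, x))
```

Because the sketch is linear, positive and negative updates are handled identically and their order does not matter.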
We present a new data structure for the $c$-approximate near neighbor problem (ANN) in the Euclidean space. For $n$ points in $\mathbb{R}^d$, our algorithm achieves $O_c(n^{\rho} + d \log n)$ query time and $O_c(n^{1+\rho} + d \log n)$ space, where $\rho \le 7/(8c^2) + O(1/c^3) + o_c(1)$. This is the first improvement over the result by Andoni and Indyk (FOCS 2006) and the first data …
Sketching is a prominent algorithmic tool for processing large data. In this paper, we study the problem of sketching matrix norms. We consider two sketching models. The first is bilinear sketching, in which there is a distribution over pairs of $r \times n$ matrices $S$ and $n \times s$ matrices $T$ such that for any fixed $n \times n$ matrix $A$, from $S \cdot A \cdot T$ one can approximate …
We consider the problem of approximate nearest neighbors in high dimensions, when the queries are lines. In this problem, given $n$ points in $\mathbb{R}^d$, we want to construct a data structure that efficiently supports the following queries: given a line $L$, report the point $p$ closest to $L$. This problem generalizes the more familiar nearest neighbor problem. From a …
A wide variety of problems in machine learning, including exemplar clustering, document summarization, and sensor placement, can be cast as constrained submodular maximization problems. A lot of recent effort has been devoted to developing distributed algorithms for these problems. However, these results suffer from a high number of rounds, suboptimal …
An oblivious subspace embedding (OSE) for some $\varepsilon, \delta \in (0, 1/3)$ and $d \le m \le n$ is a distribution $D$ over $\mathbb{R}^{m \times n}$ such that for any linear subspace $W \subset \mathbb{R}^n$ of dimension $d$, $\mathbb{P}_{\Pi \sim D}(\forall x \in W,\ (1 - \varepsilon)\|x\|_2 \le \|\Pi x\|_2 \le (1 + \varepsilon)\|x\|_2) \ge 1 - \delta$. We prove that any OSE with $\delta < 1/3$ must have $m = \Omega((d + \log(1/\delta))/\varepsilon^2)$, which is optimal. Furthermore, if every $\Pi$ in the support of $D$ is …
Sketching is a powerful dimensionality reduction tool for accelerating statistical learning algorithms. However, its applicability has been limited to a certain extent since the crucial ingredient, the so-called oblivious subspace embedding, can only be applied to data spaces with an explicit representation as the column span or row span of a matrix, while …
We study the problem of estimating the Earth Mover's Distance (EMD) between probability distributions when given access only to samples. We give closeness testers and additive-error estimators over domains in $[0, \Delta]^d$, with sample complexities independent of domain size, permitting the testability even of continuous distributions over infinite domains. …