A unified framework for approximating and clustering data
Presents a unified framework for constructing coresets and approximate clusterings for general sets of functions, and shows how to generalize the framework's results to squared distances, distances to the qth power, and deterministic constructions.
Turning big data into tiny data: Constant-size coresets for k-means, PCA and projective clustering
The authors' coresets, combined with the merge-and-reduce approach, yield embarrassingly parallel streaming algorithms for problems such as k-means, PCA, and projective clustering, via a simple recursive coreset construction that produces coresets of constant size.
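The merge-and-reduce approach mentioned above maintains a binary tree of coresets over a stream: whenever two coresets exist at the same level, they are merged and reduced, so only O(log n) coresets are held in memory. A minimal sketch, with uniform sampling standing in for the papers' actual (sensitivity-based) coreset constructions; all names are illustrative:

```python
import random

def reduce_coreset(points, size):
    """Placeholder reduction step: uniform sampling stands in for a real
    coreset construction (a real one would use importance sampling)."""
    if len(points) <= size:
        return list(points)
    return random.sample(list(points), size)

def merge_and_reduce(stream, coreset_size):
    """Maintain a binary-counter-style tree of coresets over a stream of
    point batches; at most one coreset is kept per tree level."""
    levels = []  # levels[i] holds at most one coreset at tree height i
    for batch in stream:
        current = reduce_coreset(batch, coreset_size)
        i = 0
        # Carry upward like binary addition: merge equal-level coresets.
        while i < len(levels) and levels[i] is not None:
            current = reduce_coreset(levels[i] + current, coreset_size)
            levels[i] = None
            i += 1
        if i == len(levels):
            levels.append(None)
        levels[i] = current
    # Final coreset: merge and reduce whatever remains across levels.
    remaining = [p for lvl in levels if lvl is not None for p in lvl]
    return reduce_coreset(remaining, coreset_size)
```

Because each level stores one coreset of fixed size, the memory footprint grows logarithmically in the stream length, and disjoint parts of the stream can be processed in parallel and merged afterwards.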
A PTAS for k-means clustering based on weak coresets
Every unweighted point set P has a weak coreset of size poly(k, 1/ε) for the k-means clustering problem; i.e., its size is independent of both the cardinality |P| of the point set and the dimension d of the Euclidean space R^d.
New Frameworks for Offline and Streaming Coreset Constructions
This work introduces a new technique for converting an offline coreset construction to the streaming setting, and provides the first generalizations of such coresets for handling outliers.
Scalable Training of Mixture Models via Coresets
It is proved that a weighted set of O(dk³/ε²) data points suffices for computing a (1 + ε)-approximation of the optimal model on the original n data points, which guarantees that models fitting the coreset will also provide a good fit for the original data set.
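The (1 + ε) guarantee above means that, for every candidate model, the weighted cost on the small set is close to the cost on the full data, so the coreset can be optimized in place of the original points. A minimal sketch of that contract for the k-means cost, again with uniform sampling as an illustrative stand-in for the paper's construction:

```python
import random

def kmeans_cost(points, centers, weights=None):
    """Weighted sum of squared distances from each point (here 1-d)
    to its nearest center."""
    if weights is None:
        weights = [1.0] * len(points)
    return sum(w * min((p - c) ** 2 for c in centers)
               for p, w in zip(points, weights))

def uniform_coreset(points, m):
    """Illustrative stand-in: sample m points uniformly and reweight each
    by n/m, so the weighted cost is an unbiased estimate of the full cost.
    (The paper's construction samples by sensitivity instead, which is
    what gives a guarantee for *every* candidate model.)"""
    sample = random.sample(points, m)
    weights = [len(points) / m] * m
    return sample, weights
```

Fitting a model to the reweighted sample then costs time depending on m rather than n, which is the point of the size bound being independent of the data set size.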
Data-Dependent Coresets for Compressing Neural Networks with Applications to Generalization Bounds
We present an efficient coreset-based neural network compression algorithm that sparsifies the parameters of a trained fully connected neural network in a manner that provably approximates the original network's output.
Private coresets
Forges a link between coresets and differentially private sanitizations that can answer any number of queries without compromising privacy, and proves that private coresets must have an additive error.
Training Gaussian Mixture Models at Scale via Coresets
This work shows that Gaussian mixtures admit coresets of size polynomial in dimension and the number of mixture components, while being independent of the data set size, which means that one can harness computationally intensive algorithms to compute a good approximation on a significantly smaller data set.
Coresets for Weighted Facilities and Their Applications
This work develops efficient (1 + ε)-approximation algorithms for generalized facility location problems, and introduces coresets for weighted (point) facilities that prove useful for such generalized facility location problems.
Dimensionality Reduction of Massive Sparse Datasets Using Coresets
Presents a new framework for deterministic coreset constructions based on a reduction to the problem of counting items in a stream, and uses coresets to compute a non-trivial approximation to the PCA of very large but sparse databases such as the Wikipedia document-term matrix.