Publications
A unified framework for approximating and clustering data
TLDR: A unified framework for constructing coresets and approximate clustering for general sets of functions over a ground set X.
Citations: 275 · Influence: 31
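For context, one common way to formalize a coreset guarantee for a set F of functions over a ground set X (a generic formulation, not necessarily the paper's exact definition) is that a weighted subset S ⊆ F with weight function w satisfies

\[
\Bigl|\sum_{f \in F} f(x) \;-\; \sum_{f \in S} w(f)\, f(x)\Bigr| \;\le\; \varepsilon \sum_{f \in F} f(x)
\quad \text{for every query } x \in X .
\]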
Turning big data into tiny data: Constant-size coresets for k-means, PCA and projective clustering
TLDR: A (j, k)-coreset for projective clustering is a small set of points that yields a (1 + ε)-approximation to the sum of squared distances from the rows of A to any set of k affine subspaces, each of dimension at most j.
Citations: 346 · Influence: 22
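Spelled out, the guarantee in the TLDR above reads as follows (in a common weighted formulation; the weights w are part of this sketch rather than of the TLDR): for every set L of k affine subspaces of dimension at most j,

\[
\Bigl|\sum_{i} \operatorname{dist}^2(a_i, L) \;-\; \sum_{q \in C} w(q)\, \operatorname{dist}^2(q, L)\Bigr|
\;\le\; \varepsilon \sum_{i} \operatorname{dist}^2(a_i, L),
\]

where the a_i are the rows of A, C is the coreset, and dist(p, L) is the distance from p to the nearest subspace in L.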
A PTAS for k-means clustering based on weak coresets
TLDR: We show that every unweighted point set P has a weak (ε, k)-coreset of size Poly(k, 1/ε) for the k-means clustering problem, i.e. its size is independent of the cardinality |P| of the point set and the dimension d of the Euclidean space R^d.
Citations: 175 · Influence: 12
New Frameworks for Offline and Streaming Coreset Constructions
TLDR: We construct coresets for the k-means clustering of n input points, both in an arbitrary metric space and in Euclidean space.
Citations: 84 · Influence: 10
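To give a flavor of what such a construction looks like in practice, here is a minimal importance-sampling sketch for a weighted k-means coreset; the sensitivity proxy (squared distance to the global mean plus a uniform term) is a deliberately crude assumption, and this is not the paper's algorithm:

```python
import numpy as np

def kmeans_coreset(X, m, rng=None):
    """Return (points, weights): an m-point weighted sample of the rows of X."""
    rng = np.random.default_rng(rng)
    n = X.shape[0]
    # Crude sensitivity proxy: squared distance to the global mean plus a uniform term.
    d2 = ((X - X.mean(axis=0)) ** 2).sum(axis=1)
    s = d2 / max(d2.sum(), 1e-12) + 1.0 / n
    p = s / s.sum()
    idx = rng.choice(n, size=m, replace=True, p=p)
    w = 1.0 / (m * p[idx])  # inverse-probability weights keep the cost estimate unbiased
    return X[idx], w
```

The returned weighted points can be fed to any k-means solver that accepts sample weights; the weighted clustering cost on the sample is an unbiased estimate of the cost on the full data.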
Scalable Training of Mixture Models via Coresets
TLDR: We show that, perhaps surprisingly, Gaussian mixtures admit coresets of size independent of the size of the data set.
Citations: 106 · Influence: 7
Data-Dependent Coresets for Compressing Neural Networks with Applications to Generalization Bounds
TLDR: We present an efficient coreset-based neural network compression algorithm that sparsifies the parameters of a trained fully-connected neural network in a manner that provably approximates the network's output.
Citations: 31 · Influence: 5
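The core idea, sampling parameters with non-uniform probabilities and reweighting to stay unbiased, can be sketched in a few lines; the magnitude-proportional probabilities below are a simplification and not the paper's data-dependent scheme:

```python
import numpy as np

def sparsify_layer(W, m, rng=0):
    """Keep m sampled entries of W (with replacement), reweighted to stay unbiased."""
    rng = np.random.default_rng(rng)
    W = np.asarray(W, dtype=float)
    p = np.abs(W).ravel()
    p = p / p.sum()
    idx = rng.choice(W.size, size=m, replace=True, p=p)
    out = np.zeros(W.size)
    np.add.at(out, idx, W.ravel()[idx] / (m * p[idx]))  # E[out] equals W entrywise
    return out.reshape(W.shape)
```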
Private coresets
TLDR: We define the notion of private coresets, which are simultaneously both coresets and differentially private, and show how they may be constructed.
Citations: 87 · Influence: 4
Coresets for Weighted Facilities and Their Applications
TLDR: We develop efficient (1 + ε)-approximation algorithms for generalized facility location problems, and provide efficient constructions of the underlying coresets.
Citations: 44 · Influence: 4
Training Gaussian Mixture Models at Scale via Coresets
TLDR: We show that Gaussian mixture models admit coresets of size polynomial in dimension and the number of mixture components, while being independent of the data set size.
Citations: 33 · Influence: 4
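Once a weighted coreset (C, w) is in hand, the mixture can be fit by running EM in which every sufficient statistic is weighted by w. Below is a minimal diagonal-covariance sketch, assuming hypothetical inputs C (coreset points) and w (coreset weights); it illustrates the weighted-EM idea rather than the papers' specific algorithm:

```python
import numpy as np

def weighted_gmm_em(C, w, k, iters=50, eps=1e-6, seed=0):
    """Fit a k-component diagonal-covariance GMM to weighted points (C, w)."""
    rng = np.random.default_rng(seed)
    n, d = C.shape
    mu = C[rng.choice(n, k, replace=False)]        # initialize means from the coreset
    var = np.ones((k, d)) * (C.var(axis=0) + eps)  # initialize variances
    pi_k = np.full(k, 1.0 / k)                     # mixture weights
    for _ in range(iters):
        # E-step: responsibilities under diagonal Gaussians, computed in log space.
        log_p = (-0.5 * (((C[:, None, :] - mu) ** 2) / var
                         + np.log(2 * np.pi * var)).sum(axis=2)
                 + np.log(pi_k))
        log_p -= log_p.max(axis=1, keepdims=True)
        r = np.exp(log_p)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: every sufficient statistic is weighted by the coreset weights.
        wr = r * w[:, None]
        Nk = wr.sum(axis=0) + eps
        pi_k = Nk / Nk.sum()
        mu = (wr.T @ C) / Nk[:, None]
        var = (wr.T @ (C ** 2)) / Nk[:, None] - mu ** 2 + eps
    return pi_k, mu, var
```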
Dimensionality Reduction of Massive Sparse Datasets Using Coresets
TLDR: In this paper we present a practical solution with performance guarantees to the problem of dimensionality reduction for very large-scale sparse matrices.
Citations: 42 · Influence: 3
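As a point of comparison for the task (not the paper's coreset-based method), a standard baseline for reducing a huge sparse matrix is a truncated SVD, for example with SciPy:

```python
from scipy.sparse import random as sparse_random
from scipy.sparse.linalg import svds

# A hypothetical large sparse matrix; replace with real data.
A = sparse_random(100_000, 5_000, density=1e-4, format="csr", random_state=0)
U, s, Vt = svds(A, k=20)      # rank-20 truncated SVD
A_reduced = U * s             # each row of A projected onto 20 dimensions
print(A_reduced.shape)        # (100000, 20)
```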