Learning mixtures of Gaussians
  • S. Dasgupta
  • Mathematics, Computer Science
  • 40th Annual Symposium on Foundations of Computer Science (FOCS)
  • 17 October 1999
TLDR
This work presents the first provably correct algorithm for learning a mixture of Gaussians; with high probability, it returns the true centers of the Gaussians to within the precision specified by the user.
A Generalization of Principal Components Analysis to the Exponential Family
TLDR
This paper draws on ideas from exponential families, generalized linear models, and Bregman distances to generalize PCA to loss functions argued to be better suited to non-Gaussian data types.
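As a rough reference for the construction (notation mine, simplified from the paper): standard PCA minimizes squared reconstruction error over a low-rank parameter matrix, and the generalization swaps the squared loss for an exponential-family log loss.

```latex
% Generalized PCA objective (sketch): G is the exponential-family
% log-partition function and the parameter matrix Theta is constrained
% to rank k. The Gaussian choice G(theta) = theta^2 / 2 recovers
% ordinary PCA up to constants.
\[
  \min_{\operatorname{rank}(\Theta)\,\le\,k}\;
  \sum_{i,j}\Bigl(G(\theta_{ij}) - x_{ij}\,\theta_{ij}\Bigr)
\]
```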
An elementary proof of a theorem of Johnson and Lindenstrauss
A result of Johnson and Lindenstrauss [13] shows that a set of n points in high-dimensional Euclidean space can be mapped into an O(log n/ε²)-dimensional Euclidean space such that the distance between any two points changes by only a factor of (1 ± ε).
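A minimal sketch of the kind of map the lemma guarantees, assuming a Gaussian random projection (one standard construction; the dimensions and ε below are illustrative, not the paper's):

```python
import numpy as np
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)
n, d, eps = 100, 10_000, 0.5
k = int(np.ceil(4 * np.log(n) / eps**2))   # a common JL target dimension

X = rng.normal(size=(n, d))                # n points in R^d
R = rng.normal(size=(d, k)) / np.sqrt(k)   # scaled Gaussian projection matrix
Y = X @ R                                  # images of the points in R^k

ratios = pdist(Y) / pdist(X)               # per-pair distance distortion
print(ratios.min(), ratios.max())          # should lie near [1 - eps, 1 + eps]
```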
Analysis of a greedy active learning strategy
  • S. Dasgupta
  • Computer Science, Mathematics
  • NIPS
  • 1 December 2004
TLDR
This work abstracts out the core search problem of active learning schemes and proves that a popular greedy active learning rule is approximately as good as any other strategy at minimizing the number of labels required.
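A hedged illustration of the greedy rule analyzed here, for a finite hypothesis class represented as a 0/1 prediction matrix: query the point whose answer most evenly splits the current version space. The names and the matrix representation are illustrative, not from the paper.

```python
import numpy as np

def greedy_queries(H, oracle, budget):
    """H: (num_hypotheses, num_points) 0/1 prediction matrix.
    oracle(i) returns the true label of point i."""
    alive = np.ones(H.shape[0], dtype=bool)       # current version space
    for _ in range(budget):
        votes = H[alive].mean(axis=0)             # fraction predicting 1, per point
        i = int(np.argmin(np.abs(votes - 0.5)))   # most balanced split
        y = oracle(i)
        alive &= (H[:, i] == y)                   # drop inconsistent hypotheses
        if alive.sum() <= 1:
            break
    return np.flatnonzero(alive)                  # surviving hypothesis indices
```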
Random projection trees and low dimensional manifolds
We present a simple variant of the k-d tree that automatically adapts to intrinsic low-dimensional structure in data without having to explicitly learn this structure.
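A minimal sketch of a single random-projection split in the spirit of the paper: project the node's points onto a random unit direction and split near the median, with a small perturbation. The jitter term below is a placeholder, not the paper's exact rule.

```python
import numpy as np

def rp_split(X, rng):
    """Split the points X (n x d) into two cells at one RP-tree node."""
    d = X.shape[1]
    u = rng.normal(size=d)
    u /= np.linalg.norm(u)                # random unit direction
    proj = X @ u                          # 1-d projections of the points
    spread = proj.max() - proj.min()      # crude proxy for the cell diameter
    jitter = rng.uniform(-1, 1) * 6 * spread / np.sqrt(d)
    t = np.median(proj) + jitter          # perturbed median split point
    mask = proj <= t
    return X[mask], X[~mask]              # recurse on each half to build the tree
```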
An elementary proof of the Johnson-Lindenstrauss Lemma
The Johnson-Lindenstrauss lemma shows that a set of n points in high-dimensional Euclidean space can be mapped down into an O(log n/ε²)-dimensional Euclidean space such that the distance between any two points changes by only a factor of (1 ± ε).
Importance weighted active learning
TLDR
This work presents a practical and statistically consistent scheme for actively learning binary classifiers under general loss functions; it uses importance weighting to correct sampling bias and gives rigorous label complexity bounds for the learning process.
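A toy illustration of the importance-weighting idea (not the paper's full algorithm): query each example with some probability p and weight queried examples by 1/p, which keeps the weighted empirical loss an unbiased estimate of the true loss. The query probabilities below are a placeholder; the paper derives them from hypothesis disagreement.

```python
import numpy as np

rng = np.random.default_rng(1)

def iw_sample(stream, query_prob):
    """stream: iterable of (x, y) pairs; query_prob(x) returns p in (0, 1]."""
    labeled = []
    for x, y in stream:
        p = query_prob(x)
        if rng.random() < p:                 # coin flip: ask for this label
            labeled.append((x, y, 1.0 / p))  # importance weight 1/p
    return labeled    # train any weighted-loss learner on these triples
```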
A cost function for similarity-based hierarchical clustering
  • S. Dasgupta
  • Computer Science, Mathematics
  • STOC
  • 16 October 2015
TLDR
This work introduces a simple cost function on hierarchies over a set of points, given pairwise similarities between those points, and shows that the criterion behaves sensibly on canonical instances and admits a top-down construction procedure with a provably good approximation ratio.
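For reference, the cost function has the following form, as I understand it (notation mine): similar points should be merged low in the tree, where the subtree at their least common ancestor has few leaves.

```latex
% Cost of a hierarchy T over points {1, ..., n} with pairwise
% similarities w_{ij}; T[i \vee j] denotes the subtree rooted at the
% least common ancestor of leaves i and j.
\[
  \mathrm{cost}(T) \;=\; \sum_{i < j} w_{ij}\,
  \bigl|\mathrm{leaves}\bigl(T[i \vee j]\bigr)\bigr|
\]
```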
A General Agnostic Active Learning Algorithm
TLDR
This work presents an agnostic active learning algorithm for any hypothesis class of bounded VC dimension under arbitrary data distributions. It uses reductions to supervised learning that harness generalization bounds in a simple but subtle manner, and it provides a fall-back guarantee bounding the algorithm's label complexity by the agnostic PAC sample complexity.
Experiments with Random Projection
  • S. Dasgupta
  • Computer Science, Mathematics
  • UAI
  • 30 June 2000
TLDR
A wide variety of experiments on synthetic and real data demonstrate that random projection is a promising dimensionality reduction technique for learning mixtures of Gaussians.
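A hedged sketch of the pipeline those experiments evaluate: project the data down with a random Gaussian matrix, then fit a mixture in the reduced space. scikit-learn's GaussianMixture stands in for the mixture fit, and the dimensions below are illustrative.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 1000))             # stand-in for high-dimensional data
k = 50                                       # reduced dimension
R = rng.normal(size=(X.shape[1], k)) / np.sqrt(k)  # random projection matrix
gm = GaussianMixture(n_components=3, covariance_type="full").fit(X @ R)
print(gm.means_.shape)                       # (3, 50): centers in reduced space
```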