- Sanjoy Dasgupta
- FOCS
- 1999

Mixtures of Gaussians are among the most fundamental and widely used statistical models. Current techniques for learning such mixtures from data are local search heuristics with weak performance guarantees. We present the first provably correct algorithm for learning a mixture of Gaussians. This algorithm is very simple and returns the true centers of the…

- Sanjoy Dasgupta
- NIPS
- 2004

We abstract out the core search problem of active learning schemes, to better understand the extent to which adaptive labeling can improve sample complexity. We give various upper and lower bounds on the number of labels which need to be queried, and we prove that a popular greedy active learning rule is approximately as good as any other strategy for…

- Sanjoy Dasgupta, Anupam Gupta
- Random Struct. Algorithms
- 2003

A result of Johnson and Lindenstrauss [13] shows that a set of n points in high-dimensional Euclidean space can be mapped into an O(log n/ε²)-dimensional Euclidean space such that the distance between any two points changes by only a factor of (1 ± ε). In this note, we prove this theorem using elementary probabilistic techniques.…
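
The lemma above can be illustrated with a minimal numpy sketch (the constant in the target dimension and all sizes here are illustrative choices, not values from the paper): a scaled Gaussian random matrix maps n points into roughly O(log n/ε²) dimensions while approximately preserving all pairwise distances.

```python
import numpy as np

rng = np.random.default_rng(0)

n, d, eps = 50, 1000, 0.3
# Target dimension k = O(log n / eps^2); the constant 8 is an illustrative choice.
k = int(np.ceil(8 * np.log(n) / eps**2))

X = rng.standard_normal((n, d))               # n points in high-dimensional space
P = rng.standard_normal((k, d)) / np.sqrt(k)  # scaled Gaussian random projection
Y = X @ P.T                                   # mapped points in R^k

def pairwise_dists(A):
    # Euclidean distances between all pairs of rows of A.
    diff = A[:, None, :] - A[None, :, :]
    return np.sqrt((diff ** 2).sum(-1))

D_hi, D_lo = pairwise_dists(X), pairwise_dists(Y)
iu = np.triu_indices(n, k=1)                  # each unordered pair once
distortion = np.abs(D_lo[iu] / D_hi[iu] - 1.0)
print(f"max relative distortion: {distortion.max():.3f}")
```

With these settings the worst pairwise distance distortion is typically well inside the (1 ± ε) band, which is what the lemma guarantees with high probability.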


- Sanjoy Dasgupta
- NIPS
- 2005

We characterize the sample complexity of active learning problems in terms of a parameter which takes into account the distribution over the input space, the specific target hypothesis, and the desired accuracy.

- Alina Beygelzimer, Sanjoy Dasgupta, John Langford
- ICML
- 2009

We present a practical and statistically consistent scheme for actively learning binary classifiers under general loss functions. Our algorithm uses importance weighting to correct sampling bias, and by controlling the variance, we are able to give rigorous label complexity bounds for the learning process.
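
The importance-weighting idea behind this bias correction can be sketched in isolation (an illustration of the principle only, not the paper's full algorithm): if each example's label is queried with a known probability p_i and queried losses are reweighted by 1/p_i, the resulting error estimate is unbiased for the full-data error.

```python
import numpy as np

rng = np.random.default_rng(1)

n = 200_000
losses = rng.random(n)          # per-example losses in [0, 1] (synthetic data)
true_mean = losses.mean()       # error we would compute with every label

# Each example is queried with its own probability p_i in [0.1, 1].
p = rng.uniform(0.1, 1.0, size=n)
queried = rng.random(n) < p

# Importance-weighted estimate: each queried loss gets weight 1/p_i.
iw_estimate = np.sum(losses[queried] / p[queried]) / n

print(f"true error {true_mean:.4f}, importance-weighted estimate {iw_estimate:.4f}")
```

Because E[1{queried}/p_i] = 1 for every example, the weighted sum has the same expectation as the full-label error; controlling the variance of these 1/p_i weights is what the label complexity bounds in the abstract refer to.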


- Sanjoy Dasgupta
- UAI
- 2000

Recent theoretical work has identified random projection as a promising dimensionality reduction technique for learning mixtures of Gaussians. Here we summarize these results and illustrate them by a wide variety of experiments on synthetic and real data.
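
A minimal sketch of the approach these experiments summarize (all parameters are illustrative, and plain 2-means stands in for the clustering step rather than EM): randomly project a well-separated two-component Gaussian mixture into a low-dimensional space, then recover the components there.

```python
import numpy as np

rng = np.random.default_rng(2)

d, k_low, n_per = 200, 20, 300
# Two well-separated Gaussian components in R^d (separation chosen generously).
mu1 = np.zeros(d); mu1[0] = 20.0
X = np.vstack([rng.standard_normal((n_per, d)),
               mu1 + rng.standard_normal((n_per, d))])
labels = np.repeat([0, 1], n_per)

# Random projection down to k_low dimensions.
P = rng.standard_normal((k_low, d)) / np.sqrt(k_low)
Y = X @ P.T

# Initialize 2-means at the farthest pair of projected points.
sq = (Y ** 2).sum(1)
D = sq[:, None] + sq[None, :] - 2 * Y @ Y.T
i, j = np.unravel_index(np.argmax(D), D.shape)
centers = Y[[i, j]]

# A few iterations of Lloyd's algorithm in the projected space.
for _ in range(10):
    assign = np.argmin(((Y[:, None] - centers[None]) ** 2).sum(-1), axis=1)
    centers = np.array([Y[assign == c].mean(0) for c in (0, 1)])

# Cluster recovery accuracy, up to relabeling of the two clusters.
acc = max((assign == labels).mean(), (assign != labels).mean())
print(f"cluster recovery accuracy: {acc:.2f}")
```

Because the projection approximately preserves the distance between the component means while the per-component noise stays bounded, the clusters remain separated after projection and the low-dimensional clustering recovers them.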

- Sanjoy Dasgupta, Daniel J. Hsu, Claire Monteleoni
- ISAIM
- 2007

We present a simple, agnostic active learning algorithm that works for any hypothesis class of bounded VC dimension, and any data distribution. Our algorithm extends a scheme of Cohn, Atlas, and Ladner to the agnostic setting, by (1) reformulating it using a reduction to supervised learning and (2) showing how to apply generalization bounds even for the…

- Michael Collins, Sanjoy Dasgupta, Robert E. Schapire
- NIPS
- 2001

Principal component analysis (PCA) is a commonly applied technique for dimensionality reduction. PCA implicitly minimizes a squared loss function, which may be inappropriate for data that is not real-valued, such as binary-valued data. This paper draws on ideas from the exponential family, generalized linear models, and Bregman distances, to give a…
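
The squared-loss view of standard PCA mentioned in this abstract can be made concrete with an SVD sketch (plain PCA only, not the paper's exponential-family generalization): the rank-k reconstruction minimizes squared loss, and by the Eckart–Young theorem that loss equals the sum of the discarded squared singular values.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.standard_normal((40, 10))
Xc = X - X.mean(0)                  # center the data, as PCA assumes

U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
k = 3
Xk = U[:, :k] * s[:k] @ Vt[:k]      # best rank-k reconstruction under squared loss

sq_loss = ((Xc - Xk) ** 2).sum()
# Eckart-Young: squared loss = sum of discarded squared singular values.
print(sq_loss, (s[k:] ** 2).sum())
```

Replacing this squared loss with a Bregman divergence matched to an exponential-family distribution (e.g. logistic loss for binary data) is the generalization the paper develops.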