
- Anima Anandkumar, Rong Ge, Daniel J. Hsu, Sham M. Kakade, Matus Telgarsky
- Journal of Machine Learning Research
- 2014

This work considers a computationally and statistically efficient parameter estimation method for a wide class of latent variable models—including Gaussian mixture models, hidden Markov models, and latent Dirichlet allocation—which exploits a certain tensor structure in their low-order observable moments (typically, of second- and third-order). Specifically,…

Many applications require optimizing an unknown, noisy function that is expensive to evaluate. We formalize this task as a multi-armed bandit problem, where the payoff function is either sampled from a Gaussian process (GP) or has low RKHS norm. We resolve the important open problem of deriving regret bounds for this setting, which imply novel convergence…
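As an illustrative aside, the bandit formulation above rests on an "optimism" selection rule: pull the arm with the highest mean estimate plus a confidence bonus. The sketch below is the classical UCB1 rule for finitely many arms, not the paper's GP-UCB method; the `arm_means` payoff model and Gaussian noise level are hypothetical choices for the demo.

```python
import math
import random

def ucb1(arm_means, horizon, seed=0):
    """Minimal UCB1 sketch: try each arm once, then repeatedly pull the
    arm maximizing (empirical mean + confidence bonus)."""
    rng = random.Random(seed)
    k = len(arm_means)
    counts = [0] * k       # pulls per arm
    totals = [0.0] * k     # cumulative reward per arm
    for t in range(horizon):
        if t < k:
            arm = t        # initialization: pull every arm once
        else:
            arm = max(range(k),
                      key=lambda i: totals[i] / counts[i]
                      + math.sqrt(2 * math.log(t + 1) / counts[i]))
        reward = arm_means[arm] + rng.gauss(0, 0.1)  # noisy payoff
        counts[arm] += 1
        totals[arm] += reward
    return counts
```

Run on two arms with means 0.1 and 0.9, the rule quickly concentrates its pulls on the better arm while still occasionally exploring the worse one.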

- Daniel J. Hsu, Sham M. Kakade, Tong Zhang
- COLT
- 2009

Hidden Markov Models (HMMs) are one of the most fundamental and widely used statistical tools for modeling discrete time series. Typically, they are learned using search heuristics (such as the Baum-Welch / EM algorithm), which suffer from the usual local optima issues. While in general these models are known to be hard to learn with samples from the…

This thesis is a detailed investigation into the following question: how much data must an agent collect in order to perform “reinforcement learning” successfully? This question is analogous to the classical issue of the sample complexity in supervised learning, but is harder because of the increased realism of the reinforcement learning setting. This…

- Alina Beygelzimer, Sham M. Kakade, John Langford
- ICML
- 2006

We present a tree data structure for fast nearest neighbor operations in general *n*-point metric spaces (where the data set consists of *n* points). The data structure requires *O*(*n*) space *regardless* of the metric's structure yet maintains all performance properties of a navigating net (Krauthgamer & Lee, 2004b). If the…
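For context, the baseline a cover tree is designed to beat is the brute-force linear scan, which evaluates the metric against all *n* points per query. A minimal sketch of that baseline, assuming only that `dist` is a metric on the points:

```python
def nearest_neighbor(points, query, dist):
    """Brute-force nearest neighbor: O(n) distance evaluations per
    query, valid in any metric space."""
    return min(points, key=lambda p: dist(p, query))

def euclidean(p, q):
    """Euclidean metric on coordinate tuples, as one example metric."""
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5
```

For example, `nearest_neighbor([(0, 0), (3, 4), (1, 1)], (0.9, 1.2), euclidean)` returns `(1, 1)`. The tree structure in the paper answers the same query with far fewer distance evaluations under bounded expansion.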

- Sham M. Kakade
- NIPS
- 2001

We provide a natural gradient method that represents the steepest descent direction based on the underlying structure of the parameter space. Although gradient methods cannot make large changes in the values of the parameters, we show that the natural gradient is moving toward choosing a greedy optimal action rather than just a better action. These greedy…
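The core computation behind a natural gradient step is premultiplying the ordinary gradient by the inverse of the Fisher information matrix. A minimal sketch for a hypothetical 2-parameter model, solving the 2x2 system by Cramer's rule (the paper's policy-gradient setting and Fisher estimator are much richer than this):

```python
def natural_gradient(grad, fisher):
    """Natural gradient direction F^{-1} g for a 2-parameter model.
    grad = [g0, g1]; fisher = [[a, b], [c, d]] is the (assumed
    invertible) Fisher information matrix."""
    (a, b), (c, d) = fisher
    det = a * d - b * c
    g0, g1 = grad
    # Cramer's rule for the 2x2 solve F x = g.
    return [(d * g0 - b * g1) / det, (a * g1 - c * g0) / det]
```

With the identity Fisher matrix this reduces to the ordinary gradient; with an anisotropic Fisher matrix the step is rescaled according to the parameter-space geometry, which is the point of the method.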

- Daniel J. Hsu, Sham M. Kakade, John Langford, Tong Zhang
- NIPS
- 2009

We consider multi-label prediction problems with large output spaces under the assumption of output sparsity – that the target (label) vectors have small support. We develop a general theory for a variant of the popular error correcting output code scheme, using ideas from compressed sensing for exploiting this sparsity. The method can be regarded as a…

- Anima Anandkumar, Daniel J. Hsu, Sham M. Kakade
- COLT
- 2012

Mixture models are a fundamental tool in applied statistics and machine learning for treating data taken from multiple subpopulations. The current practice for estimating the parameters of such models relies on local search heuristics (e.g., the EM algorithm) which are prone to failure, and existing consistent methods are unfavorable due to their high…
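The moment-matching idea behind such consistent estimators can be seen in the simplest possible case: an equal-weight mixture of two point masses on the line is exactly determined by its first two moments, with no local search at all. This toy sketch is far simpler than the paper's multivariate Gaussian setting and is offered only as an illustration of the principle:

```python
import math

def two_point_mixture_from_moments(m1, m2):
    """Recover the support points a <= b of an equal-weight mixture of
    two point masses on the line from its moments
        m1 = (a + b) / 2,   m2 = (a^2 + b^2) / 2.
    a and b are the roots of x^2 - (a + b) x + a b = 0."""
    s = 2 * m1               # a + b
    p = 2 * m1 ** 2 - m2     # a b, since (a+b)^2 - (a^2+b^2) = 2ab
    disc = math.sqrt(s * s - 4 * p)
    return ((s - disc) / 2, (s + disc) / 2)
```

For instance, moments m1 = 2 and m2 = 5 pin down the mixture on {1, 3}. Replacing exact moments with empirical ones gives a consistent estimator, which is the pattern the paper extends to high dimensions.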

- Varsha Dani, Thomas P. Hayes, Sham M. Kakade
- COLT
- 2008

In the classical stochastic k-armed bandit problem, in each of a sequence of rounds, a decision maker chooses one of k arms and incurs a cost chosen from an unknown distribution associated with that arm. In the linear optimization analog of this problem, rather than finitely many arms, the decision set is a compact subset of R^d and the cost of each decision…

Clustering data in high dimensions is believed to be a hard problem in general. A number of efficient clustering algorithms developed in recent years address this problem by projecting the data into a lower-dimensional subspace, e.g. via Principal Components Analysis (PCA) or random projections, before clustering. Here, we consider constructing such…
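The projection step these algorithms share can be sketched in miniature: estimate the top principal component and project onto it before any clustering happens. The sketch below handles only 2-D points and uses plain power iteration on the 2x2 covariance matrix; it is an illustrative assumption-laden toy, not the paper's construction, and the clustering step itself is omitted.

```python
def top_pc_projection(data, iters=100):
    """Project 2-D points onto their top principal component,
    estimated by power iteration on the 2x2 sample covariance."""
    n = len(data)
    mx = sum(x for x, _ in data) / n
    my = sum(y for _, y in data) / n
    centered = [(x - mx, y - my) for x, y in data]
    # Entries of the 2x2 covariance matrix.
    cxx = sum(x * x for x, _ in centered) / n
    cxy = sum(x * y for x, y in centered) / n
    cyy = sum(y * y for _, y in centered) / n
    v = (1.0, 0.0)  # power iteration starting vector
    for _ in range(iters):
        w = (cxx * v[0] + cxy * v[1], cxy * v[0] + cyy * v[1])
        norm = (w[0] ** 2 + w[1] ** 2) ** 0.5
        v = (w[0] / norm, w[1] / norm)
    # 1-D coordinates along the leading direction.
    return [x * v[0] + y * v[1] for x, y in centered]
```

Clustering the resulting 1-D (or, in general, low-dimensional) coordinates is then far easier than clustering in the original ambient space, which is the premise the abstract refines.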