Publications
Gaussian Process Optimization in the Bandit Setting: No Regret and Experimental Design
TLDR
We analyze GP-UCB, an intuitive upper-confidence-based algorithm, and bound its cumulative regret in terms of maximal information gain, establishing a novel connection between GP optimization and experimental design.
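As a quick illustration of the selection rule this paper analyzes, here is a minimal GP-UCB sketch over a discrete candidate set; the RBF kernel, noise level, and beta schedule are illustrative assumptions, not the paper's exact choices.

```python
import numpy as np

def rbf(A, B, length_scale=0.2):
    """Squared-exponential kernel matrix between row-vector sets A and B."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * sq / length_scale ** 2)

def gp_ucb(f, candidates, rounds=30, noise=0.1, seed=0):
    """Pick the candidate maximizing posterior mean + sqrt(beta) * std."""
    rng = np.random.default_rng(seed)
    X, y = [], []
    for t in range(1, rounds + 1):
        if not X:
            idx = int(rng.integers(len(candidates)))
        else:
            Xa, ya = np.array(X), np.array(y)
            K = rbf(Xa, Xa) + noise ** 2 * np.eye(len(Xa))
            K_inv = np.linalg.inv(K)
            ks = rbf(candidates, Xa)                 # cross-covariances
            mu = ks @ K_inv @ ya                     # posterior mean
            var = 1.0 - np.einsum('ij,jk,ik->i', ks, K_inv, ks)
            beta = 2.0 * np.log(len(candidates) * t ** 2)  # illustrative schedule
            idx = int(np.argmax(mu + np.sqrt(beta * np.clip(var, 0, None))))
        X.append(candidates[idx])
        y.append(f(candidates[idx]) + noise * rng.standard_normal())
    return X[int(np.argmax(y))]                      # best observed point
```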
Tensor decompositions for learning latent variable models
TLDR
This work considers a computationally and statistically efficient parameter estimation method for a wide class of latent variable models--including Gaussian mixture models, hidden Markov models, and latent Dirichlet allocation--which exploits a certain tensor structure in their low-order observable moments.
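For intuition, a minimal sketch of the tensor power iteration at the core of such method-of-moments estimators: it extracts one component of a symmetric, orthogonally decomposable third-order tensor. The moment construction and whitening steps from the paper are omitted here.

```python
import numpy as np

def tensor_power_iteration(T, iters=100, seed=0):
    """Recover one component of a symmetric, odeco 3rd-order tensor T."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(T.shape[0])
    v /= np.linalg.norm(v)
    for _ in range(iters):
        # contract T along two modes: u_i = sum_{j,k} T[i,j,k] v_j v_k
        u = np.einsum('ijk,j,k->i', T, v, v)
        v = u / np.linalg.norm(u)
    eigval = np.einsum('ijk,i,j,k->', T, v, v, v)  # associated "eigenvalue"
    return eigval, v

# Example: the rank-1 tensor 2 * (e1 (x) e1 (x) e1) is recovered exactly.
e1 = np.eye(3)[0]
T = 2.0 * np.einsum('i,j,k->ijk', e1, e1, e1)
lam, v = tensor_power_iteration(T)  # lam ~ 2, v ~ e1
```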
A Natural Policy Gradient
TLDR
We provide a natural gradient method that represents the steepest descent direction based on the underlying structure of the parameter space.
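A minimal sketch of a natural-gradient update, assuming an empirical Fisher matrix built from per-sample score vectors; the damping term and learning rate are illustrative additions, not the paper's prescription.

```python
import numpy as np

def natural_gradient_step(theta, grad, scores, lr=0.1, damping=1e-3):
    """scores: (n, d) array of per-sample score vectors grad_theta log pi."""
    F = scores.T @ scores / len(scores)   # empirical Fisher information
    F += damping * np.eye(len(theta))     # damping keeps F invertible
    nat_grad = np.linalg.solve(F, grad)   # steepest direction in the
    return theta + lr * nat_grad          # metric induced by F
```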
Stochastic Linear Optimization under Bandit Feedback
TLDR
In the classical stochastic k-armed bandit problem, in each of a sequence of T rounds, a decision maker chooses one of k arms and incurs a cost drawn from an unknown distribution associated with that arm; this paper studies the linear generalization, in which the expected cost is an unknown linear function of the chosen decision.
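For this linear setting, a minimal sketch of optimistic (upper-confidence) arm selection with a ridge-regression estimate of the unknown cost vector; the feature matrix, reward oracle `pull`, and confidence width `alpha` are illustrative assumptions.

```python
import numpy as np

def lin_ucb(arms, pull, rounds=500, alpha=1.0, lam=1.0):
    """arms: (k, d) feature matrix; pull(i) returns a noisy reward."""
    d = arms.shape[1]
    A = lam * np.eye(d)                  # regularized design matrix
    b = np.zeros(d)
    for _ in range(rounds):
        theta_hat = np.linalg.solve(A, b)
        A_inv = np.linalg.inv(A)
        # optimistic estimate: mean plus a confidence-ellipsoid bonus
        ucb = arms @ theta_hat + alpha * np.sqrt(
            np.einsum('ij,jk,ik->i', arms, A_inv, arms))
        i = int(np.argmax(ucb))
        r = pull(i)
        A += np.outer(arms[i], arms[i])
        b += r * arms[i]
    return np.linalg.solve(A, b)         # final estimate of the reward vector
```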
Approximately Optimal Approximate Reinforcement Learning
Information-Theoretic Regret Bounds for Gaussian Process Optimization in the Bandit Setting
TLDR
We analyze an intuitive Gaussian process upper confidence bound algorithm, and bound its cumulative regret in terms of maximal information gain, establishing a novel connection between GP optimization and experimental design.
Cover trees for nearest neighbor
TLDR
We present a tree data structure for fast nearest neighbor operations in general n-point metric spaces (where the data set consists of n points).
On the sample complexity of reinforcement learning.
TLDR
This thesis summarizes recent sample complexity results in the reinforcement learning literature and builds on these results to provide novel algorithms with strong performance guarantees.
How to Escape Saddle Points Efficiently
TLDR
This paper shows that a perturbed form of gradient descent converges to a second-order stationary point in a number of iterations that depends only poly-logarithmically on dimension.
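A minimal sketch of the perturbation idea: take ordinary gradient steps, and when the gradient is small (a candidate saddle point), add noise drawn uniformly from a small ball. The step size, threshold, and radius below are illustrative; the paper sets them from smoothness constants and perturbs at most once per fixed window rather than at every small-gradient step.

```python
import numpy as np

def perturbed_gd(grad, x, steps=1000, eta=0.01, g_thresh=1e-3, radius=0.1):
    """grad: user-supplied callable returning the gradient at x."""
    rng = np.random.default_rng(0)
    for _ in range(steps):
        g = grad(x)
        if np.linalg.norm(g) <= g_thresh:
            # small gradient: perturb uniformly from a ball of given radius
            xi = rng.standard_normal(x.shape)
            xi *= radius * rng.random() ** (1 / x.size) / np.linalg.norm(xi)
            x = x + xi
        else:
            x = x - eta * g                  # ordinary gradient step
    return x
```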
Multi-view clustering via canonical correlation analysis
TLDR
Clustering data in high dimensions is believed to be a hard problem in general.
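A minimal sketch of the two-view recipe, under assumed details: canonical directions from an SVD of the whitened cross-covariance, then k-means on the CCA projection of one view (scikit-learn supplies the clustering step).

```python
import numpy as np
from sklearn.cluster import KMeans

def cca_cluster(X1, X2, k, dim=None, reg=1e-6):
    """Cluster via CCA of two views X1 (n, d1) and X2 (n, d2)."""
    X1 = X1 - X1.mean(0); X2 = X2 - X2.mean(0)
    n = len(X1)
    C11 = X1.T @ X1 / n + reg * np.eye(X1.shape[1])
    C22 = X2.T @ X2 / n + reg * np.eye(X2.shape[1])
    C12 = X1.T @ X2 / n
    def inv_sqrt(C):
        w, V = np.linalg.eigh(C)
        return V @ np.diag(1.0 / np.sqrt(w)) @ V.T
    # SVD of the whitened cross-covariance yields canonical directions
    U, s, Vt = np.linalg.svd(inv_sqrt(C11) @ C12 @ inv_sqrt(C22))
    dim = dim or (k - 1)
    proj = X1 @ (inv_sqrt(C11) @ U[:, :dim])   # project view 1
    return KMeans(n_clusters=k, n_init=10).fit_predict(proj)
```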