Gaussian Process Optimization in the Bandit Setting: No Regret and Experimental Design
TLDR
This work analyzes GP-UCB, an intuitive upper-confidence-based algorithm, and bounds its cumulative regret in terms of maximal information gain, establishing a novel connection between GP optimization and experimental design and obtaining explicit sublinear regret bounds for many commonly used covariance functions.
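As a rough illustration of the rule the analysis covers, here is a minimal GP-UCB sketch on a discretized 1-D domain. The RBF kernel, its length-scale, and the fixed exploration weight beta are illustrative assumptions; the paper instead sets beta_t by a schedule tied to the information gain.

```python
import numpy as np

def rbf(a, b, ls=0.2):
    # squared-exponential kernel matrix between 1-D point arrays a and b
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ls ** 2)

def gp_ucb(f, domain, T=30, beta=4.0, noise=1e-2, seed=0):
    rng = np.random.default_rng(seed)
    X, y = [], []
    for _ in range(T):
        if not X:
            x = domain[rng.integers(len(domain))]
        else:
            Xa, ya = np.array(X), np.array(y)
            K = rbf(Xa, Xa) + noise * np.eye(len(Xa))
            Ks = rbf(domain, Xa)                    # (|domain|, t)
            mu = Ks @ np.linalg.solve(K, ya)        # posterior mean
            v = np.linalg.solve(K, Ks.T)
            var = np.maximum(1.0 - np.einsum('ij,ji->i', Ks, v), 0.0)
            # optimistic rule: sample where mean + scaled std-dev is largest
            x = domain[np.argmax(mu + np.sqrt(beta * var))]
        X.append(x)
        y.append(f(x) + noise * rng.standard_normal())
    return np.array(X), np.array(y)

xs, ys = gp_ucb(lambda x: -np.sin(3 * x) - x ** 2 + 0.7 * x,
                domain=np.linspace(-1.0, 2.0, 200))
```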
Tensor decompositions for learning latent variable models
TLDR
A detailed analysis of a robust tensor power method is provided, establishing an analogue of Wedin's perturbation theorem for the singular vectors of matrices and implying a robust and computationally tractable estimation approach for several popular latent variable models.
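A minimal sketch of the (non-robust) symmetric tensor power iteration with deflation, assuming an orthogonally decomposable order-3 tensor; the paper's robust variant adds random restarts and the perturbation analysis.

```python
import numpy as np

def tensor_power_method(T, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    d = T.shape[0]
    weights, vectors = [], []
    for _ in range(k):
        u = rng.standard_normal(d)
        u /= np.linalg.norm(u)
        for _ in range(n_iter):
            # power update: u <- T(I, u, u) / ||T(I, u, u)||
            u = np.einsum('ijk,j,k->i', T, u, u)
            u /= np.linalg.norm(u)
        lam = np.einsum('ijk,i,j,k->', T, u, u, u)   # recovered eigenvalue
        weights.append(lam)
        vectors.append(u)
        # deflate: subtract the recovered rank-one component
        T = T - lam * np.einsum('i,j,k->ijk', u, u, u)
    return np.array(weights), np.array(vectors)

# usage: build a tensor with two orthogonal components and recover them
v = np.eye(3)[:2]
T = (2.0 * np.einsum('i,j,k->ijk', *([v[0]] * 3))
     + 1.0 * np.einsum('i,j,k->ijk', *([v[1]] * 3)))
w, V = tensor_power_method(T, k=2)
```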
A Natural Policy Gradient
TLDR
This work provides a natural gradient method that represents the steepest descent direction based on the underlying structure of the parameter space and shows drastic performance improvements in simple MDPs and in the more challenging MDP of Tetris.
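A minimal sketch of a natural gradient step for a softmax policy on a single-state problem, where the Fisher matrix has the closed form diag(pi) - pi pi^T; the rewards, damping term, and step size are illustrative assumptions, not values from the paper.

```python
import numpy as np

rewards = np.array([1.0, 0.5, 0.2])   # assumed per-action rewards
theta = np.zeros(3)                   # softmax policy parameters

for step in range(200):
    pi = np.exp(theta) / np.exp(theta).sum()
    # vanilla policy gradient of expected reward under the softmax policy
    grad = pi * (rewards - pi @ rewards)
    # Fisher information of the softmax family: diag(pi) - pi pi^T
    F = np.diag(pi) - np.outer(pi, pi)
    # natural gradient step; damping makes the (singular) Fisher invertible
    nat_grad = np.linalg.solve(F + 1e-4 * np.eye(3), grad)
    theta += 0.1 * nat_grad

print(np.exp(theta) / np.exp(theta).sum())  # concentrates on the best arm
```

The point of the rescaling is visible here: the vanilla gradient shrinks as the policy concentrates, while the natural direction keeps making steady progress in parameter space.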
Stochastic Linear Optimization under Bandit Feedback
TLDR
A nearly complete characterization of the classical stochastic k-armed bandit problem in terms of both upper and lower bounds for the regret is given, and two variants of an algorithm based on the idea of “upper confidence bounds” are presented.
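A sketch in the spirit of the paper's confidence-ellipsoid approach, assuming a fixed finite action set and a constant confidence width beta; the paper derives the width from a concentration argument and also proves matching lower bounds.

```python
import numpy as np

def linear_ucb(actions, theta_star, T=500, lam=1.0, beta=2.0, noise=0.1, seed=0):
    rng = np.random.default_rng(seed)
    d = actions.shape[1]
    A = lam * np.eye(d)          # regularized design matrix
    b = np.zeros(d)
    total = 0.0
    for _ in range(T):
        A_inv = np.linalg.inv(A)
        theta_hat = A_inv @ b    # ridge estimate of the unknown parameter
        # optimistic value: estimated mean plus ellipsoidal confidence bonus
        bonus = np.sqrt(np.einsum('ij,jk,ik->i', actions, A_inv, actions))
        x = actions[np.argmax(actions @ theta_hat + beta * bonus)]
        r = x @ theta_star + noise * rng.standard_normal()
        A += np.outer(x, x)
        b += r * x
        total += r
    return total

acts = np.random.default_rng(1).standard_normal((20, 5))
print(linear_ucb(acts, theta_star=np.ones(5) / np.sqrt(5)))
```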
Information-Theoretic Regret Bounds for Gaussian Process Optimization in the Bandit Setting
TLDR
This work analyzes an intuitive Gaussian process upper confidence bound algorithm, and bounds its cumulative regret in terms of maximal information gain, establishing a novel connection between GP optimization and experimental design and obtaining explicit sublinear regret bounds for many commonly used covariance functions.
Cover trees for nearest neighbor
TLDR
A tree data structure is presented for fast nearest neighbor operations in general n-point metric spaces (where the data set consists of n points), showing speedups over brute-force search of between one and several orders of magnitude on natural machine learning datasets.
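A toy sketch of the cover-tree idea: nodes live at integer scales, each child lies within 2^scale of its parent, and a query prunes any branch whose subtree cannot beat the best candidate distance. The simplified insert below does not maintain all of the paper's invariants, so the query may be approximate; it is meant only to show where the pruning-based speedups come from.

```python
import numpy as np

class Node:
    def __init__(self, point, scale):
        self.point, self.scale, self.children = point, scale, []

def insert(root, p):
    node, scale = root, root.scale
    while True:
        dists = [np.linalg.norm(c.point - p) for c in node.children]
        near = [c for c, d in zip(node.children, dists) if d <= 2 ** scale]
        if not near:
            node.children.append(Node(p, scale - 1))
            return
        node = min(near, key=lambda c: np.linalg.norm(c.point - p))
        if np.linalg.norm(node.point - p) == 0.0:
            return            # duplicate point; nothing to insert
        scale -= 1

def nearest(root, q):
    best, cover = root.point, [root]
    while cover:
        for n in cover:
            if np.linalg.norm(n.point - q) < np.linalg.norm(best - q):
                best = n.point
        children = [c for n in cover for c in n.children]
        if not children:
            break
        d_min = min(np.linalg.norm(c.point - q) for c in children)
        scale = max(c.scale for c in children)
        # prune children whose whole subtree lies farther than the best bound
        cover = [c for c in children
                 if np.linalg.norm(c.point - q) <= d_min + 2 ** (scale + 1)]
    return best

rng = np.random.default_rng(0)
pts = rng.standard_normal((500, 3))
root = Node(pts[0], scale=4)
for p in pts[1:]:
    insert(root, p)
q = np.zeros(3)
print(nearest(root, q))
print(pts[np.argmin(np.linalg.norm(pts - q, axis=1))])   # brute-force check
```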
How to Escape Saddle Points Efficiently
TLDR
This paper shows that a perturbed form of gradient descent converges to a second-order stationary point in a number of iterations that depends only poly-logarithmically on dimension, showing that perturbed gradient descent can escape saddle points almost for free.
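A minimal sketch of the perturbed-gradient-descent idea: take plain gradient steps, and inject an isotropic random perturbation whenever the gradient is small (a candidate saddle). The threshold and perturbation radius here are illustrative constants, not the paper's carefully tuned, dimension-poly-logarithmic choices.

```python
import numpy as np

def perturbed_gd(grad, x0, eta=0.05, g_thresh=1e-3, radius=0.05, T=2000, seed=0):
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    for _ in range(T):
        g = grad(x)
        if np.linalg.norm(g) < g_thresh:
            # candidate saddle: the gradient gives no signal, so perturb
            x = x + radius * rng.standard_normal(x.shape)
        else:
            x = x - eta * g
    return x

# f(x, y) = x**4/4 - x**2/2 + y**2 has a strict saddle at the origin and
# minima at (+/-1, 0); plain GD started exactly at the origin never moves.
grad = lambda v: np.array([v[0] ** 3 - v[0], 2.0 * v[1]])
print(perturbed_gd(grad, [0.0, 0.0]))   # lands near (+/-1, 0)
```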
On the sample complexity of reinforcement learning.
TLDR
Novel algorithms with more restricted guarantees are suggested whose sample complexities are again independent of the size of the state space and depend linearly on the complexity of the policy class, but have only a polynomial dependence on the horizon time.
Global Convergence of Policy Gradient Methods for the Linear Quadratic Regulator
TLDR
This work bridges the gap, showing that (model-free) policy gradient methods globally converge to the optimal solution and are efficient (polynomially so in relevant problem-dependent quantities) with regard to their sample and computational complexities.
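A minimal sketch of model-free policy gradient on a scalar LQR instance with a linear state-feedback policy u = -K x, estimating the gradient purely from rollout costs by a zeroth-order finite difference; the constants and the scalar setting are illustrative simplifications of the setting the paper analyzes.

```python
import numpy as np

A, B, Q, R = 1.0, 1.0, 1.0, 1.0          # scalar dynamics x' = A x + B u

def rollout_cost(K, horizon=50, x0=1.0):
    """Finite-horizon LQR cost of the linear policy u = -K x, via simulation."""
    x, c = x0, 0.0
    for _ in range(horizon):
        u = -K * x
        c += Q * x ** 2 + R * u ** 2
        x = A * x + B * u
    return c

K, r = 0.1, 0.05                          # initial gain, smoothing radius
for _ in range(500):
    # zeroth-order (simulation-only) gradient estimate: no model of A, B used
    g = (rollout_cost(K + r) - rollout_cost(K - r)) / (2 * r)
    K -= 5e-3 * g
print(K)  # approaches the Riccati-optimal gain, about 0.618 for these values
```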
...