Gaussian Process Optimization in the Bandit Setting: No Regret and Experimental Design
This work analyzes GP-UCB, an intuitive upper-confidence based algorithm, and bounds its cumulative regret in terms of maximal information gain, establishing a novel connection between GP optimization and experimental design and obtaining explicit sublinear regret bounds for many commonly used covariance functions.
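The GP-UCB rule itself is compact: at each round, query the point maximizing the GP posterior mean plus a scaled posterior standard deviation. A minimal numpy sketch on a 1-D grid (the RBF kernel, the fixed beta, and all parameter values are illustrative assumptions, not the paper's setup):

```python
import numpy as np

def rbf(a, b, ls=0.3):
    # Squared-exponential kernel between two 1-D point sets.
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / ls) ** 2)

def gp_posterior(X, y, Xs, noise=1e-2):
    # Standard GP regression: posterior mean and variance at test points Xs.
    K = rbf(X, X) + noise * np.eye(len(X))
    Ks = rbf(X, Xs)
    Kinv = np.linalg.inv(K)
    mu = Ks.T @ Kinv @ y
    var = 1.0 - np.einsum('ij,ji->i', Ks.T @ Kinv, Ks)
    return mu, np.maximum(var, 1e-12)

def gp_ucb(f, grid, T=30, beta=4.0, noise=1e-2, seed=0):
    # GP-UCB: query the argmax of posterior mean + sqrt(beta) * posterior std.
    rng = np.random.default_rng(seed)
    X = [grid[len(grid) // 2]]
    y = [f(X[0])]
    for _ in range(T):
        mu, var = gp_posterior(np.array(X), np.array(y), grid, noise)
        x = grid[int(np.argmax(mu + np.sqrt(beta * var)))]
        X.append(x)
        y.append(f(x) + noise * rng.standard_normal())
    return X[int(np.argmax(y))]
```

On a smooth objective such as f(x) = -(x - 0.7)^2 over [0, 1], the rule alternates exploration (high posterior variance) with exploitation (high posterior mean) and homes in on the maximizer.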
Tensor decompositions for learning latent variable models
- Anima Anandkumar, Rong Ge, Daniel J. Hsu, S. Kakade, Matus Telgarsky
- Computer Science, Mathematics · J. Mach. Learn. Res.
- 28 October 2012
A detailed analysis of a robust tensor power method is provided, establishing an analogue of Wedin's perturbation theorem for the singular vectors of matrices, and implies a robust and computationally tractable estimation approach for several popular latent variable models.
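The core power update is a two-mode contraction of the tensor with the current vector, followed by normalization. A bare-bones numpy sketch of that update, without the robustness modifications or deflation the paper analyzes (all names here are illustrative):

```python
import numpy as np

def tensor_power_iteration(T, n_iters=100, seed=0):
    # Tensor power update: u <- T(I, u, u) / ||T(I, u, u)||.
    rng = np.random.default_rng(seed)
    u = rng.standard_normal(T.shape[0])
    u /= np.linalg.norm(u)
    for _ in range(n_iters):
        v = np.einsum('ijk,j,k->i', T, u, u)  # contract two modes with u
        u = v / np.linalg.norm(v)
    lam = np.einsum('ijk,i,j,k->', T, u, u, u)  # eigenvalue T(u, u, u)
    return lam, u
```

For an orthogonally decomposable symmetric tensor T = sum_i lambda_i a_i ⊗ a_i ⊗ a_i, the iteration converges quadratically to one of the components (lambda_i, a_i), which is what makes the decomposition learnable.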
A Natural Policy Gradient
- S. Kakade
- Computer Science · NIPS
- 3 January 2001
This work provides a natural gradient method that represents the steepest descent direction based on the underlying structure of the parameter space and shows drastic performance improvements in simple MDPs and in the more challenging MDP of Tetris.
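The idea in one line: precondition the vanilla policy gradient by the (pseudo-)inverse Fisher information of the policy. A toy sketch for a softmax policy on a two-armed bandit (not the paper's MDP or Tetris experiments; the step size and setup are illustrative assumptions):

```python
import numpy as np

def softmax(theta):
    z = np.exp(theta - theta.max())
    return z / z.sum()

def natural_pg(rewards, lr=0.5, steps=100):
    # Natural policy gradient: theta <- theta + lr * F^+ grad J(theta),
    # where F is the Fisher information matrix of the softmax policy.
    theta = np.zeros_like(rewards)
    for _ in range(steps):
        pi = softmax(theta)
        scores = np.eye(len(pi)) - pi[None, :]            # grad log pi(a) = e_a - pi
        grad = (pi * rewards) @ scores                    # vanilla policy gradient
        F = np.einsum('a,ai,aj->ij', pi, scores, scores)  # Fisher information
        theta = theta + lr * np.linalg.pinv(F) @ grad     # natural gradient step
    return softmax(theta)
```

The pseudo-inverse handles the rank deficiency of F (softmax parameters are shift-invariant). In this toy example the natural step stays roughly constant while the vanilla gradient decays as the policy becomes near-deterministic, which illustrates the source of the reported speedups.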
Approximately Optimal Approximate Reinforcement Learning
Stochastic Linear Optimization under Bandit Feedback
A nearly complete characterization of the classical stochastic k-armed bandit problem in terms of both upper and lower bounds for the regret is given, and two variants of an algorithm based on the idea of “upper confidence bounds” are presented.
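The upper-confidence idea in the abstract above: pull the arm maximizing its empirical mean plus a confidence width that shrinks as the arm is sampled. A minimal UCB1-style sketch with illustrative constants (not the paper's two variants):

```python
import numpy as np

def ucb_run(means, T=5000, seed=0):
    # Play T rounds of a k-armed Gaussian bandit with a UCB1-style rule.
    rng = np.random.default_rng(seed)
    k = len(means)
    counts = np.zeros(k)
    sums = np.zeros(k)
    for t in range(T):
        if t < k:
            a = t  # pull each arm once to initialize
        else:
            bonus = np.sqrt(2.0 * np.log(t) / counts)  # confidence width
            a = int(np.argmax(sums / counts + bonus))
        counts[a] += 1
        sums[a] += rng.normal(means[a], 0.1)
    return counts
```

Suboptimal arms end up pulled only O(log T) times, giving logarithmic regret, which matches the upper/lower bound characterization the abstract refers to.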
Information-Theoretic Regret Bounds for Gaussian Process Optimization in the Bandit Setting
- Niranjan Srinivas, Andreas Krause, S. Kakade, M. Seeger
- Computer Science · IEEE Transactions on Information Theory
- 1 May 2012
This work analyzes an intuitive Gaussian process upper confidence bound algorithm and bounds its cumulative regret in terms of maximal information gain, establishing a novel connection between GP optimization and experimental design and obtaining explicit sublinear regret bounds for many commonly used covariance functions.
Cover trees for nearest neighbor
A tree data structure for fast nearest neighbor operations in general n-point metric spaces, which shows speedups over brute-force search of between one and several orders of magnitude on natural machine learning datasets.
How to Escape Saddle Points Efficiently
- Chi Jin, Rong Ge, Praneeth Netrapalli, S. Kakade, Michael I. Jordan
- Computer Science, Mathematics · ICML
- 2 March 2017
This paper shows that a perturbed form of gradient descent converges to a second-order stationary point in a number of iterations that depends only poly-logarithmically on dimension, showing that perturbed gradient descent can escape saddle points almost for free.
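The mechanism can be seen on a toy strict-saddle function such as f(x, y) = (x^2 - 1)^2 / 4 + y^2 / 2, which has a saddle at the origin and minima at (±1, 0). A sketch assuming a simple "perturb when the gradient is small" trigger (thresholds and radii are illustrative, not the paper's tuned values):

```python
import numpy as np

def perturbed_gd(grad, x0, lr=0.1, eps=1e-3, radius=1e-2, steps=500, seed=0):
    # Gradient descent, plus a small random kick whenever the gradient
    # norm is tiny (i.e. near a stationary point, possibly a saddle).
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    for _ in range(steps):
        g = grad(x)
        if np.linalg.norm(g) < eps:
            x = x + radius * rng.standard_normal(x.shape)
            g = grad(x)
        x = x - lr * g
    return x

# Gradient of f(x, y) = (x^2 - 1)^2 / 4 + y^2 / 2 (strict saddle at the origin).
dwell_grad = lambda p: np.array([p[0] ** 3 - p[0], p[1]])
```

Started at (0, 0.5), plain gradient descent (radius = 0) converges to the saddle and stalls there, while the perturbed variant slides off along the negative-curvature direction to one of the minima.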
On the sample complexity of reinforcement learning.
- S. Kakade
- Computer Science
Novel algorithms with more restricted guarantees are suggested whose sample complexities are again independent of the size of the state space and depend linearly on the complexity of the policy class, but have only a polynomial dependence on the horizon time.
Global Convergence of Policy Gradient Methods for the Linear Quadratic Regulator
This work bridges the gap by showing that (model-free) policy gradient methods globally converge to the optimal solution and are efficient (polynomially so in relevant problem-dependent quantities) with regard to their sample and computational complexities.
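The setting is easy to reproduce in miniature: parameterize a linear policy u = -k x, evaluate the quadratic cost by rollout, and run gradient descent directly on the gain k. A scalar toy sketch (the finite-difference gradient and all constants are illustrative assumptions; the paper treats the multivariate, model-free case):

```python
def lqr_cost(k, a=0.9, b=1.0, q=1.0, r=0.1, x0=1.0, horizon=100):
    # Finite-horizon cost of the linear policy u = -k x on x' = a x + b u.
    x, cost = x0, 0.0
    for _ in range(horizon):
        u = -k * x
        cost += q * x * x + r * u * u
        x = a * x + b * u
    return cost

def policy_gradient_lqr(k0=0.0, lr=0.002, steps=2000, h=1e-4):
    # Gradient descent on the policy gain k via central finite differences.
    k = k0
    for _ in range(steps):
        grad = (lqr_cost(k + h) - lqr_cost(k - h)) / (2.0 * h)
        k -= lr * grad
    return k
```

For these parameters the discrete-time Riccati solution gives an optimal gain of roughly k* ≈ 0.823, and gradient descent on k starting from k = 0 recovers it.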