• Publications
The kernel recursive least-squares algorithm
TLDR
We present a nonlinear version of the recursive least squares (RLS) algorithm that can be used to recursively construct minimum mean-squared-error solutions to nonlinear least-squares problems that are frequently encountered in signal processing applications.
A Tutorial on the Cross-Entropy Method
TLDR
The cross-entropy method is a new generic approach to combinatorial and multi-extremal optimization and rare event simulation.
PAC Bounds for Multi-armed Bandit and Markov Decision Processes
TLDR
We show how, given an algorithm for the PAC model Multi-armed Bandit problem, one can derive a batch learning algorithm for Markov Decision Processes.
Action Elimination and Stopping Conditions for the Multi-Armed Bandit and Reinforcement Learning Problems
TLDR
We incorporate statistical confidence intervals in both the multi-armed bandit and the reinforcement learning problems.
Reinforcement learning with Gaussian processes
TLDR
We present a new generative model for the value function, deduced from its relation with the discounted return.
Robustness and generalization
TLDR
We derive generalization bounds for learning algorithms based on their robustness: the property that if a testing sample is “similar” to a training sample, then the testing error is close to the training error.
The Sample Complexity of Exploration in the Multi-Armed Bandit Problem
TLDR
We consider the Multi-armed bandit problem under the PAC (“probably approximately correct”) model, and derive a matching lower bound that holds for any sampling policy.
Efficiency loss in a network resource allocation game: the case of elastic supply
TLDR
We consider a resource allocation problem where individual users wish to send data across a network to maximize their utility, and a cost is incurred at each link that depends on the total rate sent through the link.
Robustness and Regularization of Support Vector Machines
TLDR
We consider regularized support vector machines (SVMs) and show that they are precisely equivalent to a new robust optimization formulation.
Percentile Optimization for Markov Decision Processes with Parameter Uncertainty
TLDR
In this paper, we present a set of percentile criteria that are conceptually natural and representative of the trade-off between optimistic and pessimistic views of the question.