The kernel recursive least-squares algorithm
TLDR
A nonlinear version of the recursive least squares (RLS) algorithm that uses a sequential sparsification process that admits into the kernel representation a new input sample only if its feature space image cannot be sufficiently well approximated by combining the images of previously admitted samples.
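The admission rule described above can be sketched as an approximate-linear-dependence check. A sample x is admitted only if the squared residual δ = k(x, x) - kᵀK⁻¹k exceeds a threshold ν, where δ measures how far x's feature-space image lies from the span of the admitted images. This is a minimal sketch, not the paper's implementation. The Gaussian kernel, the threshold value and the names `rbf`, `solve` and `ald_dictionary` are hypothetical choices made for illustration. For readability, the sketch re-solves K a = k from scratch at every step, whereas a recursive algorithm would update K⁻¹ incrementally.

```python
import math

def rbf(x, y, gamma=1.0):
    """Gaussian (RBF) kernel; the kernel choice is an assumption for illustration."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting (A is small and SPD here)."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (M[r][n] - sum(M[r][c] * z[c] for c in range(r + 1, n))) / M[r][r]
    return z

def ald_dictionary(samples, nu=1e-2, kernel=rbf):
    """Hypothetical helper: admit a sample only if its feature-space image is not
    approximated (squared residual <= nu) by a combination of admitted images."""
    dictionary = []
    for x in samples:
        if not dictionary:
            dictionary.append(x)  # the first sample is always admitted
            continue
        K = [[kernel(a, b) for b in dictionary] for a in dictionary]
        k = [kernel(a, x) for a in dictionary]
        coeffs = solve(K, k)  # best approximation coefficients a = K^{-1} k
        residual = kernel(x, x) - sum(c * ki for c, ki in zip(coeffs, k))  # delta
        if residual > nu:
            dictionary.append(x)
    return dictionary
```

For example, `ald_dictionary([(0.0,), (0.0,), (0.001,), (3.0,)])` keeps only `(0.0,)` and `(3.0,)`. The duplicate and the nearby point are almost perfectly represented by the first sample, so neither is admitted.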
A Tutorial on the Cross-Entropy Method
TLDR
This tutorial presents the CE methodology, the basic algorithm and its modifications, and discusses applications in combinatorial optimization and machine learning.
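As a rough illustration of the basic CE loop, the sketch below maximizes a one-dimensional function. Each iteration samples candidates from a parametric distribution, keeps an elite fraction, and refits the distribution to that elite. The Gaussian sampling family, the function name `cross_entropy_maximize` and all parameter values are assumptions. Refinements such as smoothed parameter updates are omitted.

```python
import random
import statistics

def cross_entropy_maximize(f, mu=0.0, sigma=5.0, n_samples=100,
                           elite_frac=0.1, iters=50, seed=0):
    """Hypothetical CE sketch: maximize f over the reals with a Gaussian sampler."""
    rng = random.Random(seed)
    n_elite = max(2, int(n_samples * elite_frac))
    for _ in range(iters):
        # 1. Sample candidates from the current parametric distribution.
        xs = [rng.gauss(mu, sigma) for _ in range(n_samples)]
        # 2. Keep the elite samples with the highest objective values.
        elite = sorted(xs, key=f, reverse=True)[:n_elite]
        # 3. Refit the distribution to the elite (maximum-likelihood Gaussian fit).
        mu = statistics.fmean(elite)
        sigma = statistics.pstdev(elite) + 1e-6  # small floor keeps sampling well defined
    return mu
```

For instance, `cross_entropy_maximize(lambda x: -(x - 2.0) ** 2)` returns a value close to 2. The sampling distribution concentrates around the maximizer within a few iterations.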
Action Elimination and Stopping Conditions for the Multi-Armed Bandit and Reinforcement Learning Problems
TLDR
A framework based on learning the confidence interval around the value function or the Q-function, and on eliminating actions that are (with high probability) not optimal, is described, and model-based and model-free variants of the elimination method are provided.
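The elimination idea can be sketched for the multi-armed bandit case. Every surviving arm is sampled once per round and a Hoeffding-style confidence radius is computed. An arm is dropped once its upper confidence bound falls below the best lower confidence bound. This is a minimal sketch assuming rewards in [0, 1]. The radius constant, the name `successive_elimination` and the toy arms are assumptions, not the paper's exact algorithm.

```python
import math
import random

def successive_elimination(arms, delta=0.01, max_rounds=100_000, seed=0):
    """Return the indices of arms that survive elimination (hypothetical helper).

    arms: callables taking a random.Random and returning a reward in [0, 1].
    """
    rng = random.Random(seed)
    n = len(arms)
    active = list(range(n))
    sums = [0.0] * n
    t = 0
    while len(active) > 1 and t < max_rounds:
        t += 1
        for i in active:
            sums[i] += arms[i](rng)
        # Hoeffding radius with a union bound over arms and rounds (assumed constants):
        # P(|mean_i - mu_i| > r) <= 2 exp(-2 t r^2) = delta / (2 n t^2).
        radius = math.sqrt(math.log(4 * n * t * t / delta) / (2 * t))
        means = {i: sums[i] / t for i in active}
        best_lower = max(means.values()) - radius
        # Keep an arm only if its upper confidence bound reaches the best lower bound.
        active = [i for i in active if means[i] + radius >= best_lower]
    return active
```

Suppose the Bernoulli arms have success probabilities 0.9, 0.2 and 0.1. The sketch then keeps only arm 0, well before `max_rounds` is reached. The empirically best arm can never eliminate itself, so at least one arm always survives.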
PAC Bounds for Multi-armed Bandit and Markov Decision Processes
TLDR
The bandit problem is revisited and considered under the PAC model, and it is shown that given n arms, it suffices to pull the arms O((n/ε²) log(1/δ)) times to find an ε-optimal arm with probability at least 1 - δ.
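A bound of the form O((n/ε²) log(1/δ)) can be illustrated with a median-elimination-style procedure. Each round samples every surviving arm and discards those whose empirical means fall below the median. It then tightens the accuracy and confidence parameters and repeats until one arm remains. This is a minimal sketch assuming rewards in [0, 1]. The per-round constants (ε/4, δ/2, the factor 3/4 and the sample size 4/ε_l² · ln(3/δ_l)) follow a common presentation and are assumptions here, as is the name `median_elimination`.

```python
import math
import random

def median_elimination(arms, epsilon, delta, seed=0):
    """Hypothetical sketch: return (index of chosen arm, total number of pulls).

    arms: callables taking a random.Random and returning a reward in [0, 1].
    """
    rng = random.Random(seed)
    active = list(range(len(arms)))
    eps_l, delta_l = epsilon / 4, delta / 2  # assumed initial per-round parameters
    total_pulls = 0
    while len(active) > 1:
        # Samples per arm so each empirical mean is eps_l/2-accurate w.p. >= 1 - delta_l/3.
        pulls = math.ceil(4 / eps_l ** 2 * math.log(3 / delta_l))
        means = {i: sum(arms[i](rng) for _ in range(pulls)) / pulls for i in active}
        total_pulls += pulls * len(active)
        # Keep the top half (arms at or above the empirical median).
        ranked = sorted(active, key=lambda i: means[i], reverse=True)
        active = ranked[: math.ceil(len(ranked) / 2)]
        eps_l, delta_l = 0.75 * eps_l, delta_l / 2
    return active[0], total_pulls
```

Each round halves the number of arms, while the per-arm sample size grows only by about (4/3)² = 16/9 plus a slowly growing logarithmic term. The total pull count is therefore a geometric series dominated by the first round, which is where the (n/ε²) log(1/δ) scaling comes from.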
Reinforcement learning with Gaussian processes
TLDR
A SARSA-based extension of GPTD, termed GPSARSA, is presented; it allows the selection of actions and the gradual improvement of policies without requiring a world model.
Robustness and generalization
We derive generalization bounds for learning algorithms based on their robustness: the property that if a testing sample is “similar” to a training sample, then the testing error is close to the training error.
The Sample Complexity of Exploration in the Multi-Armed Bandit Problem
TLDR
This work considers the multi-armed bandit problem under the PAC (“probably approximately correct”) model, derives a lower bound on the number of trials needed, and generalizes this lower bound to a Bayesian setting and to the case where the statistics of the arms are known but their identities are not.
Reward Constrained Policy Optimization
TLDR
This work presents a novel multi-timescale approach for constrained policy optimization, called “Reward Constrained Policy Optimization” (RCPO), which uses an alternative penalty signal to guide the policy toward a constraint-satisfying one.
Robustness and Regularization of Support Vector Machines
TLDR
This work considers regularized support vector machines and shows that they are precisely equivalent to a new robust optimization formulation, thereby establishing robustness as the reason regularized SVMs generalize well; it also gives a new proof of consistency of (kernelized) SVMs.
Percentile Optimization for Markov Decision Processes with Parameter Uncertainty
TLDR
A set of percentile criteria is presented that are conceptually natural and represent the trade-off between optimistic and pessimistic views of the problem, and the use of these criteria under different forms of uncertainty in both the rewards and the transitions is studied.
...