• Publications
Best Arm Identification: A Unified Approach to Fixed Budget and Fixed Confidence
TLDR
We study the problem of identifying the best arm(s) in the stochastic multi-armed bandit setting.
  • 167
  • 23
Linear Thompson Sampling Revisited
TLDR
We derive an alternative proof for the regret of Thompson sampling (TS) in the stochastic linear bandit setting.
  • 85
  • 18
Risk-Aversion in Multi-armed Bandits
TLDR
We introduce a novel setting based on the principle of risk-aversion where the objective is to compete against the arm with the best risk-return trade-off.
  • 77
  • 16
Finite-sample analysis of least-squares policy iteration
TLDR
In this paper, we report a performance bound for the widely used least-squares policy iteration (LSPI) algorithm.
  • 87
  • 13
Online Stochastic Optimization under Correlated Bandit Feedback
TLDR
We introduce the high-confidence tree (HCT) algorithm, a novel anytime 𝒳-armed bandit algorithm, and derive regret bounds matching the performance of state-of-the-art algorithms in terms of the dependency on the number of steps and the near-optimality dimension.
  • 36
  • 13
Best-Arm Identification in Linear Bandits
TLDR
We study the best-arm identification problem in linear bandits, where the rewards of the arms depend linearly on an unknown parameter θ* and the objective is to return the arm with the largest reward.
  • 63
  • 12
Analysis of a Classification-based Policy Iteration Algorithm
TLDR
We present a classification-based policy iteration algorithm, called Direct Policy Iteration, and provide its finite-sample analysis.
  • 86
  • 11
Upper-Confidence-Bound Algorithms for Active Learning in Multi-Armed Bandits
TLDR
In this paper, we study the problem of estimating the mean values of all the arms uniformly well in the multi-armed bandit setting.
  • 49
  • 10
LSTD with Random Projections
TLDR
We study the least-squares temporal difference (LSTD) learning algorithm when a space of low dimension is generated with a random projection from a high-dimensional space.
  • 49
  • 10
Transfer in Reinforcement Learning: A Framework and a Survey
  • A. Lazaric
  • Computer Science
  • Reinforcement Learning
  • 2012
TLDR
We provide a formalization of the general transfer problem, we identify the main settings which have been investigated so far, and we review the most important approaches to transfer in reinforcement learning.
  • 150
  • 9