Publications
Best Arm Identification: A Unified Approach to Fixed Budget and Fixed Confidence
TLDR: We study the problem of identifying the best arm(s) in the stochastic multi-armed bandit setting.
Linear Thompson Sampling Revisited
TLDR: We derive an alternative proof for the regret of Thompson sampling in the stochastic linear bandit setting.
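The per-round decision rule analyzed in that paper can be sketched in a few lines; this is a minimal illustration assuming a Gaussian perturbation of the regularized least-squares estimate, with the function name `lin_ts_step` and the scale parameter `v` chosen for illustration, not taken from the paper.

```python
import numpy as np

def lin_ts_step(arms, V, b, rng, v=1.0):
    """One round of linear Thompson sampling (Gaussian-perturbation sketch).

    arms : (K, d) feature vectors, one row per arm.
    V    : (d, d) regularized design matrix, sum_t x_t x_t^T + I.
    b    : (d,)   reward-weighted feature sum, sum_t r_t x_t.
    """
    V_inv = np.linalg.inv(V)
    theta_hat = V_inv @ b                          # regularized least-squares estimate
    # sample a perturbed parameter around the estimate, scaled by the design inverse
    theta_tilde = rng.multivariate_normal(theta_hat, v ** 2 * V_inv)
    return int(np.argmax(arms @ theta_tilde))      # play the arm maximizing the sampled reward

# after observing reward r for played features x:
#   V += np.outer(x, x); b += r * x
```

The key point the proof exploits is that the perturbation scale `v` controls how often the sampled parameter is "optimistic", which drives the regret bound.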
Risk-Aversion in Multi-armed Bandits
TLDR
We introduce a novel setting based on the principle of risk-aversion where the objective is to compete against the arm with the best risk-return trade-off. Expand
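A common way to formalize such a risk-return trade-off is the empirical mean-variance criterion, MV = σ² − ρμ, where lower is better and ρ trades return against risk; the sketch below assumes that form, and the function names are illustrative rather than taken from the paper.

```python
import numpy as np

def empirical_mean_variance(samples, rho=1.0):
    """Empirical mean-variance of one arm's rewards: sigma^2 - rho * mu.

    Lower is better for a risk-averse learner; rho weights return against risk.
    """
    return np.var(samples) - rho * np.mean(samples)

def best_risk_averse_arm(all_samples, rho=1.0):
    """Index of the arm with the best (lowest) empirical mean-variance."""
    return int(np.argmin([empirical_mean_variance(s, rho) for s in all_samples]))
```

For example, a deterministic arm paying 0.5 beats an arm alternating between 0 and 1 under this criterion, even though both have the same mean: the variance penalty tips the comparison.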
Finite-sample analysis of least-squares policy iteration
TLDR: In this paper, we report a performance bound for the widely used least-squares policy iteration (LSPI) algorithm.
Online Stochastic Optimization under Correlated Bandit Feedback
TLDR: We introduce the high-confidence tree (HCT) algorithm, a novel anytime 𝒳-armed bandit algorithm, and derive regret bounds matching the performance of state-of-the-art algorithms in terms of the dependency on the number of steps and the near-optimality dimension.
Analysis of a Classification-based Policy Iteration Algorithm
TLDR: We present a classification-based policy iteration algorithm, called Direct Policy Iteration, and provide its finite-sample analysis.
Best-Arm Identification in Linear Bandits
TLDR: We study the best-arm identification problem in the linear bandit setting, where the rewards of the arms depend linearly on an unknown parameter θ* and the objective is to return the arm with the largest reward.
LSTD with Random Projections
TLDR: We study the least-squares temporal difference (LSTD) learning algorithm when a low-dimensional space is generated by a random projection from a high-dimensional space.
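The idea can be sketched as projecting the high-dimensional features with a Gaussian random matrix and running standard LSTD in the compressed space; this is a minimal illustration under those assumptions, with the function name, the regularization term, and the Gaussian scaling chosen for illustration.

```python
import numpy as np

def lstd_random_projection(Phi, Phi_next, rewards, d, gamma=0.95, reg=1e-3, seed=0):
    """LSTD on features compressed by a Gaussian random projection.

    Phi, Phi_next : (n, D) high-dimensional features at states s_t and s_{t+1}.
    d             : target dimension, with d << D.
    Returns the estimated values of the n visited states.
    """
    rng = np.random.default_rng(seed)
    # random projection matrix, scaled so projected norms are preserved in expectation
    P = rng.normal(0.0, 1.0 / np.sqrt(d), size=(Phi.shape[1], d))
    Psi, Psi_next = Phi @ P, Phi_next @ P                   # low-dimensional features
    # regularized LSTD system: (Psi^T (Psi - gamma Psi') + reg I) w = Psi^T r
    A = Psi.T @ (Psi - gamma * Psi_next) + reg * np.eye(d)
    b = Psi.T @ rewards
    w = np.linalg.solve(A, b)                               # LSTD weights in projected space
    return Psi @ w
```

The appeal of the construction is that the projection step is data-independent, so only the d-dimensional linear system has to be solved regardless of how large D is.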
Transfer of samples in batch reinforcement learning
TLDR: We introduce a novel algorithm that transfers samples (i.e., tuples ⟨s, a, s′, r⟩) from the source tasks most similar to the target task, and uses them as input for batch reinforcement-learning algorithms.
Transfer in Reinforcement Learning: A Framework and a Survey
A. Lazaric · Reinforcement Learning, 2012
TLDR: We provide a formalization of the general transfer problem, identify the main settings that have been investigated so far, and review the most important approaches to transfer in reinforcement learning.