Toward Minimax Off-policy Value Estimation

Lihong Li, Rémi Munos, Csaba Szepesvári
This paper studies the off-policy evaluation problem, where one aims to estimate the value of a target policy based on a sample of observations collected by another policy. We first consider the single-state, or multi-armed bandit, case, establish a finite-time minimax risk lower bound, and analyze the risk of three standard estimators. For the so-called regression estimator, we show that while it is asymptotically optimal, for small sample sizes it may perform suboptimally compared to an ideal…
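To make the bandit setting concrete, here is a minimal Python sketch of off-policy value estimation with two common textbook estimators: an importance-weighted (likelihood ratio) average and a regression estimator that plugs per-arm sample means into the target policy. The abstract names only the regression estimator, so the exact forms below, the function names, the Bernoulli reward model, and the convention of treating unobserved arms as having mean 0 are all assumptions made for illustration, not definitions taken from the paper.

```python
import random


def likelihood_ratio_estimate(samples, target, behavior):
    """Average of rewards reweighted by target[a] / behavior[a].

    samples: list of (arm, reward) pairs drawn under the behavior policy.
    target, behavior: lists of arm probabilities (behavior[a] > 0 assumed).
    """
    return sum(target[a] / behavior[a] * r for a, r in samples) / len(samples)


def regression_estimate(samples, target):
    """Plug per-arm sample mean rewards into the target policy's expectation."""
    totals = [0.0] * len(target)
    counts = [0] * len(target)
    for a, r in samples:
        totals[a] += r
        counts[a] += 1
    # Assumption for this sketch: arms never observed contribute a mean of 0.
    means = [t / c if c else 0.0 for t, c in zip(totals, counts)]
    return sum(p * m for p, m in zip(target, means))


def simulate(n, behavior, mean_rewards, seed=0):
    """Draw n (arm, reward) pairs: arms from the behavior policy,
    rewards Bernoulli with the given per-arm means (hypothetical model)."""
    rng = random.Random(seed)
    arms = list(range(len(behavior)))
    samples = []
    for _ in range(n):
        a = rng.choices(arms, weights=behavior)[0]
        r = 1.0 if rng.random() < mean_rewards[a] else 0.0
        samples.append((a, r))
    return samples


if __name__ == "__main__":
    behavior = [0.8, 0.2]       # data-collecting policy
    target = [0.2, 0.8]         # policy whose value we want
    mean_rewards = [0.3, 0.7]   # unknown to the estimators
    data = simulate(200, behavior, mean_rewards)
    true_value = sum(p * m for p, m in zip(target, mean_rewards))
    print("true value:", true_value)
    print("likelihood ratio:", likelihood_ratio_estimate(data, target, behavior))
    print("regression:", regression_estimate(data, target))
```

Running the script with different values of n and different seeds gives a rough feel for how the two estimates spread around the true value at small sample sizes, which is the regime the abstract singles out.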