The Self-Normalized Estimator for Counterfactual Learning


This paper identifies a severe problem with the counterfactual risk estimator typically used in batch learning from logged bandit feedback (BLBF) and proposes an alternative estimator that avoids this problem. In the BLBF setting, the learner does not receive full-information feedback as in supervised learning, but observes feedback only for the…
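The abstract as shown here does not spell out either estimator, so the following is only a minimal sketch. It assumes the conventional estimator is the standard inverse-propensity-scoring (IPS) average and that the alternative is the self-normalized form named in the title, which divides by the sum of the importance weights instead of the sample size. The function names and the arguments `losses`, `new_probs`, and `logged_probs` are hypothetical, chosen for illustration.

```python
from typing import List, Sequence


def importance_weights(new_probs: Sequence[float],
                       logged_probs: Sequence[float]) -> List[float]:
    """Ratio of the new policy's probability of each logged action
    to the logging policy's probability of that action."""
    return [p / q for p, q in zip(new_probs, logged_probs)]


def ips_risk(losses: Sequence[float],
             new_probs: Sequence[float],
             logged_probs: Sequence[float]) -> float:
    """Assumed conventional estimator: weighted losses averaged over n samples."""
    w = importance_weights(new_probs, logged_probs)
    return sum(d * wi for d, wi in zip(losses, w)) / len(losses)


def self_normalized_risk(losses: Sequence[float],
                         new_probs: Sequence[float],
                         logged_probs: Sequence[float]) -> float:
    """Assumed self-normalized estimator: weighted losses divided by
    the sum of the importance weights rather than by n."""
    w = importance_weights(new_probs, logged_probs)
    return sum(d * wi for d, wi in zip(losses, w)) / sum(w)
```

One property that is easy to check with this sketch: adding a constant to every logged loss shifts the self-normalized estimate by exactly that constant, whereas the IPS estimate shifts by the constant times the mean importance weight, which generally differs from one on a finite sample.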




Citation Velocity: 15

Averaging 15 citations per year over the last 2 years.
