Doubly Robust Policy Evaluation and Optimization

We study sequential decision making in environments where rewards are only partially observed, but can be modeled as a function of observed contexts and the chosen action by the decision maker. This setting, known as contextual bandits, encompasses a wide variety of applications such as health care, content recommendation and Internet advertising. A central… (More)

6 Figures & Tables

Topics

Statistics

051015201620172018
Citations per Year

Citation Velocity: 7

Averaging 7 citations per year over the last 3 years.

Learn more about how we calculate this metric in our FAQ.