Safe Exploration for Optimizing Contextual Bandits

@article{Jagerman2020SafeEF,
  title={Safe Exploration for Optimizing Contextual Bandits},
  author={Rolf Jagerman and Ilya Markov and Maarten de Rijke},
  journal={ACM Transactions on Information Systems (TOIS)},
  year={2020},
  volume={38},
  pages={1--23}
}

Contextual bandit problems are a natural fit for many information retrieval tasks, such as learning to rank, text classification, recommendation, and so on. However, existing learning methods for contextual bandit problems have one of two drawbacks: they either do not explore the space of all possible document rankings (i.e., actions) and, thus, may miss the optimal ranking, or they present suboptimal rankings to a user and, thus, may harm the user experience. We introduce a new learning method…
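
The abstract points at the standard safe-exploration pattern: keep serving a known-safe baseline policy, learn a candidate policy from the logged bandit feedback, and switch only once a high-confidence off-policy estimate says the candidate is at least as good as the baseline. Below is a minimal Python sketch of that general pattern, not the paper's actual algorithm (which the truncated abstract does not spell out): the synthetic linear-reward environment, the fixed baseline action, the 10% exploration rate, the clipping constant, and the two-standard-error switch rule are all illustrative assumptions.

```python
# Illustrative sketch of safe exploration for a contextual bandit.
# Everything here is an assumption for demonstration; see the lead-in above.
import numpy as np

rng = np.random.default_rng(0)
n_actions, dim, horizon = 5, 8, 5000

theta_true = rng.normal(size=(n_actions, dim))  # hidden reward model (synthetic)

def reward(context, action):
    """Bernoulli reward with a linear logit (synthetic environment, demo only)."""
    p = 1.0 / (1.0 + np.exp(-theta_true[action] @ context))
    return float(rng.random() < p)

BASELINE_ACTION = 0  # a fixed, known-safe policy: always play action 0 (illustrative)

theta_hat = np.zeros((n_actions, dim))  # learned linear reward model
log = []                                # logged feedback: (context, action, reward, propensity)
deployed = False                        # has the learned policy replaced the baseline?

for t in range(1, horizon + 1):
    x = rng.normal(size=dim)
    if deployed:
        a, prop = int(np.argmax(theta_hat @ x)), 1.0
    elif rng.random() < 0.1:
        a, prop = int(rng.integers(n_actions)), 0.1 / n_actions  # uniform exploration
    else:
        a, prop = BASELINE_ACTION, 0.9 + 0.1 / n_actions         # safe baseline action
    r = reward(x, a)
    log.append((x, a, r, prop))
    theta_hat[a] += 0.01 * (r - theta_hat[a] @ x) * x            # SGD on squared error

    if not deployed and t % 500 == 0:
        # Clipped inverse-propensity estimate of the learned (greedy) policy's
        # value, with a crude lower confidence bound: mean minus two standard errors.
        vals = np.array([r_i * min(1.0 / p_i, 10.0)
                         if int(np.argmax(theta_hat @ x_i)) == a_i else 0.0
                         for (x_i, a_i, r_i, p_i) in log])
        lcb = vals.mean() - 2.0 * vals.std() / np.sqrt(len(vals))
        baseline_value = np.mean([r_i for (_, a_i, r_i, _) in log
                                  if a_i == BASELINE_ACTION])
        if lcb >= baseline_value:  # switch only with high-confidence evidence
            deployed = True
```

The switch rule echoes the high-confidence off-policy evaluation and counterfactual risk minimization work cited in the references below; the point of the sketch is only that the cost of exploration is paid in the logs rather than by the live user, until the confidence bound clears the baseline.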

    References

    Publications referenced by this paper.
    • Conservative Contextual Linear Bandits (33 citations; Highly Influential; Open Access)
    • A contextual-bandit approach to personalized news article recommendation (1,428 citations; Highly Influential; Open Access)
    • Counterfactual Risk Minimization: Learning from Logged Bandit Feedback (167 citations; Highly Influential; Open Access)
    • Thompson Sampling for Contextual Bandits with Linear Payoffs (427 citations; Highly Influential; Open Access)
    • Unbiased Learning-to-Rank with Biased Feedback (114 citations; Highly Influential; Open Access)
    • NewsWeeder: Learning to Filter Netnews (1,897 citations; Highly Influential)
    • RCV1: A New Benchmark Collection for Text Categorization Research (2,421 citations; Highly Influential; Open Access)
    • High-Confidence Off-Policy Evaluation (116 citations; Highly Influential; Open Access)
    • A Database for Handwritten Text Recognition Research (1,395 citations; Highly Influential)