Corpus ID: 211252543

Linear Bandits with Stochastic Delayed Feedback

  • Claire Vernade, Alexandra Carpentier, Tor Lattimore, Giovanni Zappella, Beyza Ermis, M. Brückner
  • Published 2018
  • Mathematics, Computer Science
  • arXiv: Machine Learning
  • Stochastic linear bandits are a natural and well-studied model for structured exploration/exploitation problems and are widely used in applications such as online marketing and recommendation. One of the main challenges faced by practitioners hoping to apply existing algorithms is that usually the feedback is randomly delayed and delays are only partially observable. For example, while a purchase is usually observable some time after the display, the decision of not buying is never explicitly…
