# To update or not to update? Delayed nonparametric bandits with randomized allocation

@article{Arya2020ToUO,
title={To update or not to update? Delayed nonparametric bandits with randomized allocation},
author={Sakshi Arya and Yuhong Yang},
journal={Stat},
year={2020},
volume={10}
}
• Published 26 May 2020
• Computer Science
• Stat
The delayed-rewards problem in contextual bandits arises in various practical settings. We study randomized allocation strategies and characterize how the exploration–exploitation trade‐off is affected by delays in observing the rewards. In randomized strategies, the extent of exploration versus exploitation is controlled by a user‐determined exploration probability sequence. In the presence of delayed rewards, one may choose between using the original exploration sequence…
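To make the abstract's setup concrete, here is a minimal sketch of a randomized allocation strategy with delayed feedback: arms are pulled greedily except with a user-chosen exploration probability `exploration_prob(t)`, and reward estimates are updated only once delayed feedback arrives. This is a hypothetical illustration, not the paper's method; in particular it uses simple sample means per arm, whereas the paper's estimators are nonparametric in the covariates, and the class and function names are invented for this sketch.

```python
import random

def exploration_prob(t, c=1.0, gamma=0.5):
    """A user-chosen exploration sequence, e.g. pi_t = min(1, c * t^(-gamma))."""
    return min(1.0, c / (t ** gamma))

class DelayedRandomizedAllocation:
    """Randomized allocation with delayed reward feedback (toy sketch).

    Rewards fed at round t with delay d become visible to the learner
    only from round t + d onward.
    """

    def __init__(self, n_arms=2):
        self.n_arms = n_arms
        self.sums = [0.0] * n_arms    # observed reward totals per arm
        self.counts = [0] * n_arms    # observed pull counts per arm
        self.pending = []             # (arrival_round, arm, reward)

    def select(self, t):
        if random.random() < exploration_prob(t):
            return random.randrange(self.n_arms)  # explore uniformly
        # exploit: arm with the highest observed sample mean
        means = [self.sums[a] / self.counts[a] if self.counts[a] else 0.0
                 for a in range(self.n_arms)]
        return max(range(self.n_arms), key=lambda a: means[a])

    def feed(self, t, arm, reward, delay):
        self.pending.append((t + delay, arm, reward))

    def flush(self, t):
        # fold in only the rewards whose delay has elapsed by round t
        ready = [p for p in self.pending if p[0] <= t]
        self.pending = [p for p in self.pending if p[0] > t]
        for _, arm, r in ready:
            self.sums[arm] += r
            self.counts[arm] += 1

# Toy usage: arm 1 pays Bernoulli(0.7), arm 0 pays Bernoulli(0.3),
# and each reward arrives with a random delay of 1-5 rounds.
random.seed(0)
bandit = DelayedRandomizedAllocation(n_arms=2)
for t in range(1, 2001):
    bandit.flush(t)
    arm = bandit.select(t)
    reward = 1.0 if random.random() < (0.7 if arm == 1 else 0.3) else 0.0
    bandit.feed(t, arm, reward, delay=random.randrange(1, 6))
bandit.flush(10**9)  # drain any remaining delayed feedback
```

The design question the paper studies is precisely the one hidden in `exploration_prob`: when feedback is delayed, should the learner keep the original exploration sequence indexed by the round number, or update it to account for how much feedback has actually been observed?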

## References

Showing 1–10 of 32 references

### Kernel Estimation and Model Combination in A Bandit Problem with Covariates

• Computer Science
J. Mach. Learn. Res.
• 2016
This work considers a setting where the rewards of bandit machines are associated with covariates, and the accurate estimation of the corresponding mean reward functions plays an important role in the performance of allocation rules.

### Bandits with Delayed, Aggregated Anonymous Feedback

• Computer Science
ICML
• 2018
An algorithm is provided that matches the worst case regret of the non-anonymous problem exactly when the delays are bounded, and up to logarithmic factors or an additive variance term for unbounded delays.

### Bandits with Delayed Anonymous Feedback

• Computer Science, Mathematics
ArXiv
• 2017
It is demonstrated that it is still possible to achieve logarithmic regret, but with additional lower-order terms, and an algorithm is provided with regret $O(\log T + \sqrt{g(\tau)\log T} + g(\tau))$, where $g(\tau)$ is some function of the delay distribution.

### Randomized Allocation with Nonparametric Estimation for a Multi-Armed Bandit Problem with Covariates

• Mathematics
• 2002
We study a multi-armed bandit problem in a setting where covariates are available. We take a nonparametric approach to estimate the functional relationship between the response (reward) and the covariates.

### Learning in Generalized Linear Contextual Bandits with Stochastic Delays

• Computer Science
NeurIPS
• 2019
This paper designs a delay-adaptive algorithm, called Delayed UCB, for generalized linear contextual bandits using UCB-style exploration, and establishes regret bounds under various delay assumptions.

### Stochastic Bandits with Delay-Dependent Payoffs

• Computer Science
AISTATS
• 2020
A nonstationary stochastic bandit model is proposed in which the expected reward of an arm depends on the number of rounds that have passed since the arm was last pulled, together with an algorithm whose regret with respect to the best ranking policy is bounded by $\widetilde{\mathcal{O}}(\sqrt{kT})$.

### The Queue Method: Handling Delay, Heuristics, Prior Data, and Evaluation in Bandits

• Computer Science
AAAI
• 2015
The Stochastic Delayed Bandits (SDB) algorithm is presented, which takes black-box bandit algorithms (including heuristic approaches) as input while achieving good theoretical guarantees. Empirical results show that SDB outperforms state-of-the-art approaches to handling delay, heuristics, prior data, and evaluation.

### Stochastic Bandit Models for Delayed Conversions

• Computer Science
UAI
• 2017
This paper proposes and investigates a new stochastic multi-armed bandit model, in the framework proposed by Chapelle (2014) and based on empirical studies in the field of web advertising, in which each action may trigger a future reward that then occurs after a stochastic delay.

### Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems

• Economics, Computer Science
Found. Trends Mach. Learn.
• 2012
The focus is on two extreme cases in which the analysis of regret is particularly simple and elegant: independent and identically distributed payoffs and adversarial payoffs.