The multi-armed bandit problem with covariates

@article{Perchet2011TheMB,
  title={The multi-armed bandit problem with covariates},
  author={Vianney Perchet and Philippe Rigollet},
  journal={ArXiv},
  year={2011},
  volume={abs/1110.6084}
}
We consider a multi-armed bandit problem in a setting where each arm produces a noisy reward realization which depends on an observable random covariate. As opposed to the traditional static multi-armed bandit problem, this setting allows for dynamically changing rewards that better describe applications where side information is available. We adopt a nonparametric model where the expected rewards are smooth functions of the covariate and where the hardness of the problem is captured by a…
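The setting described above can be illustrated with a minimal covariate-bandit sketch: discretize the covariate space into bins and run an independent UCB index inside each bin. This is a simplification of the binned policies studied in this line of work; the class name, parameters, and the choice of UCB1 within each bin are illustrative assumptions, not the authors' procedure.

```python
import math

class BinnedUCB:
    """Illustrative sketch: partition the covariate space [0, 1] into
    n_bins equal intervals and run an independent UCB1 policy in each.
    All names and parameters here are hypothetical."""

    def __init__(self, n_arms, n_bins):
        self.n_arms = n_arms
        self.n_bins = n_bins
        self.counts = [[0] * n_arms for _ in range(n_bins)]
        self.sums = [[0.0] * n_arms for _ in range(n_bins)]

    def _bin(self, x):
        # Map a covariate x in [0, 1] to its bin index.
        return min(int(x * self.n_bins), self.n_bins - 1)

    def select(self, x):
        b = self._bin(x)
        t = sum(self.counts[b]) + 1
        # Play each arm once per bin, then follow the UCB1 index.
        for a in range(self.n_arms):
            if self.counts[b][a] == 0:
                return a
        return max(
            range(self.n_arms),
            key=lambda a: self.sums[b][a] / self.counts[b][a]
            + math.sqrt(2 * math.log(t) / self.counts[b][a]),
        )

    def update(self, x, arm, reward):
        b = self._bin(x)
        self.counts[b][arm] += 1
        self.sums[b][arm] += reward
```

With two arms whose mean rewards cross at the midpoint of the covariate range, the per-bin empirical means quickly separate, which is the effect the nonparametric analysis quantifies.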

Paper Mentions

Asymptotic Allocation Rules for a Class of Dynamic Multi-armed Bandit Problems
This paper presents a class of Dynamic Multi-Armed Bandit problems where the reward can be modeled as the noisy output of a time varying linear stochastic dynamic system that satisfies some…
A non-parametric solution to the multi-armed bandit problem with covariates
In recent years, the multi-armed bandit problem has regained popularity, especially in the case with covariates, since it has new applications in customized services such as personalized medicine.
Stochastic Multi-Armed Bandits with Control Variates
This paper studies a new variant of the stochastic multi-armed bandits problem, where the learner has access to auxiliary information about the arms, and develops an algorithm named UCB-CV that uses improved estimates and characterizes the regret bounds in terms of the correlation between the rewards and control variates.
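The control-variate idea summarized above, exploiting auxiliary information with a known mean to reduce the variance of reward estimates, can be sketched as follows. The function name and the plug-in coefficient estimate are illustrative assumptions; this is the underlying estimator, not the UCB-CV algorithm itself.

```python
def cv_estimate(rewards, controls, control_mean):
    """Control-variate estimate of a mean reward: subtract the
    correlated auxiliary variable (whose mean is known) scaled by
    an estimated coefficient. Names here are hypothetical."""
    n = len(rewards)
    rbar = sum(rewards) / n
    cbar = sum(controls) / n
    # Plug-in estimate of the variance-minimizing coefficient.
    cov = sum((r - rbar) * (c - cbar) for r, c in zip(rewards, controls)) / n
    var = sum((c - cbar) ** 2 for c in controls) / n
    beta = cov / var if var > 0 else 0.0
    return rbar - beta * (cbar - control_mean)
```

When the control is strongly correlated with the reward, this estimator has markedly lower variance than the plain sample mean, which is what tightens the per-arm confidence intervals.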
Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems
The focus is on two extreme cases in which the analysis of regret is particularly simple and elegant: independent and identically distributed payoffs and adversarial payoffs.
Be Greedy in Multi-Armed Bandits
It is subversively claimed that for many interesting problems and associated horizons, the best compromise between theoretical guarantees, practical performances and computational burden is definitely to follow the greedy heuristic.
Smoothness-Adaptive Stochastic Bandits
This work develops a procedure that infers a global smoothness parameter of the payoff functions based on collected observations, and establishes that this procedure achieves rate-optimal performance up to logarithmic factors.
Kernel Estimation and Model Combination in A Bandit Problem with Covariates
This work considers a setting where the rewards of bandit machines are associated with covariates, and the accurate estimation of the corresponding mean reward functions plays an important role in the performance of allocation rules.
Nonparametric Stochastic Contextual Bandits
The algorithm provides an extension to infinite-armed contextual bandits, recovers topological structures within the context space based on expected bandit performance, and gives intrinsic-dimension-dependent and ambient-dimension-independent regret bounds.
Batched Multi-armed Bandits Problem
The BaSE (batched successive elimination) policy is proposed to achieve the rate-optimal regrets (within logarithmic factors) for batched multi-armed bandits, with matching lower bounds even if the batch sizes are determined in an adaptive manner.
Regret Minimization in Isotonic, Heavy-Tailed Contextual Bandits via Adaptive Confidence Bands
  • Sabyasachi Chatterjee, Subhabrata Sen
  • Mathematics
  • 2021
In this paper we initiate a study of nonparametric contextual bandits under shape constraints on the mean reward function. Specifically, we study a setting where the context is one-dimensional, and…

References

SHOWING 1-10 OF 34 REFERENCES
Linear Response Two–Armed Bandits
We consider a two-armed bandit problem which involves sequential sampling from two non-homogeneous populations. The response in each is determined by a random covariate vector and a vector of…
Nonparametric Bandits with Covariates
This work derives general lower bounds on the performance of any admissible policy, and develops an algorithm whose performance achieves the order of said lower bound up to logarithmic terms.
Randomized Allocation with Nonparametric Estimation for a Multi-Armed Bandit Problem with Covariates
We study a multi-armed bandit problem in a setting where covariates are available. We take a nonparametric approach to estimate the functional relationship between the response (reward) and the…
A Note on Performance Limitations in Bandit Problems With Side Information
A minimax lower bound is derived that proves that in the absence of sufficient statistical "diversity" in the distribution of the covariate X, no policy can improve on the best achievable performance in the traditional bandit problem without side information.
Action Elimination and Stopping Conditions for the Multi-Armed Bandit and Reinforcement Learning Problems
A framework is described that is based on learning the confidence interval around the value function or the Q-function and eliminating actions that are not optimal (with high probability), and model-based and model-free variants of the elimination method are provided.
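The elimination idea summarized above can be sketched as follows: sample every surviving arm in rounds and discard any arm whose upper confidence bound falls below the best lower confidence bound. The function name, the reward oracle `pull`, and the exact confidence radius are illustrative assumptions, not the paper's procedure.

```python
import math

def successive_elimination(pull, n_arms, horizon, delta=0.05):
    """Sketch of action elimination with confidence intervals.
    pull(a) is an assumed reward oracle returning values in [0, 1];
    all names and the Hoeffding-style radius are illustrative."""
    active = list(range(n_arms))
    counts = [0] * n_arms
    sums = [0.0] * n_arms
    t = 0
    while t < horizon and len(active) > 1:
        # Sample every surviving arm once per round.
        for a in list(active):
            sums[a] += pull(a)
            counts[a] += 1
            t += 1
        # Confidence radius with a union bound over arms and rounds.
        def rad(a):
            return math.sqrt(
                math.log(4 * n_arms * counts[a] ** 2 / delta) / (2 * counts[a])
            )
        best_lcb = max(sums[a] / counts[a] - rad(a) for a in active)
        # Eliminate arms whose UCB falls below the best LCB.
        active = [a for a in active if sums[a] / counts[a] + rad(a) >= best_lcb]
    return active
```

With a clearly separated best arm, the suboptimal arms are eliminated after a number of rounds governed by the reward gaps, which is the source of the stopping-time guarantees discussed in the paper.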
Finite-time Analysis of the Multiarmed Bandit Problem
This work shows that the optimal logarithmic regret is also achievable uniformly over time, with simple and efficient policies, and for all reward distributions with bounded support.
UCB revisited: Improved regret bounds for the stochastic multi-armed bandit problem
For this modified UCB algorithm, an improved bound on the regret is given with respect to the optimal reward for K-armed bandits after T trials.
Bandit problems with side observations
An extension of the traditional two-armed bandit problem is considered, in which the decision maker has access to some side information before deciding which arm to pull, and how much the additional information helps is quantified.
Gap-free Bounds for Stochastic Multi-Armed Bandit
We consider the stochastic multi-armed bandit problem with unknown horizon. We present a randomized decision strategy which is based on updating a probability distribution through a stochastic mirror…
Woodroofe's One-Armed Bandit Problem Revisited
We consider the one-armed bandit problem of Woodroofe [J. Amer. Statist. Assoc. 74 (1979) 799-806], which involves sequential sampling from two populations: one whose characteristics are known, and…