Corpus ID: 1658296

An Asymptotically Optimal UCB Policy for Uniform Bandits of Unknown Support

@article{Cowan2015AnAO,
  title={An Asymptotically Optimal UCB Policy for Uniform Bandits of Unknown Support},
  author={Wesley Cowan and M. Katehakis},
  journal={ArXiv},
  year={2015},
  volume={abs/1505.01918}
}
Consider the problem of sampling sequentially from a finite number of $N \geq 2$ populations, specified by random variables $X^i_k$, $i = 1, \ldots, N$, and $k = 1, 2, \ldots$; where $X^i_k$ denotes the outcome from population $i$ the $k$-th time it is sampled. It is assumed that for each fixed $i$, $\{X^i_k\}_{k \geq 1}$ is a sequence of i.i.d. uniform random variables over some interval $[a_i, b_i]$, with the support (i.e., $a_i$, $b_i$) unknown to the controller. The objective is to have a policy $\pi$ for deciding from which of the $N$…
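The setting in the abstract can be made concrete with a small simulation. The sketch below is illustrative only: the names `simulate` and `range_bonus_policy` are hypothetical, regret is measured as the expected shortfall against always sampling the population with the largest mean $(a_i + b_i)/2$, and the index used in `range_bonus_policy` is a generic range-scaled exploration bonus chosen for illustration, not the policy proposed in the paper.

```python
import math
import random


def simulate(supports, horizon, policy, seed=0):
    """Sample `horizon` times from uniform populations under `policy`.

    supports: list of (a_i, b_i) pairs; hidden from the policy.
    policy:   callable (n, samples) -> index of the population to sample,
              where samples[i] holds the outcomes observed so far from i.
    Returns (total_reward, pseudo_regret, samples).
    """
    rng = random.Random(seed)
    samples = [[] for _ in supports]
    means = [(a + b) / 2 for a, b in supports]
    best_mean = max(means)
    total, regret = 0.0, 0.0
    for n in range(1, horizon + 1):
        i = policy(n, samples)
        a, b = supports[i]
        x = rng.uniform(a, b)
        samples[i].append(x)
        total += x
        # Expected loss from sampling i instead of the best population.
        regret += best_mean - means[i]
    return total, regret, samples


def range_bonus_policy(n, samples):
    """Hypothetical index policy (an assumption, not the paper's policy).

    Samples every population twice, then picks the largest value of
    estimated midpoint + estimated range * sqrt(2 ln n / t).
    """
    for i, s in enumerate(samples):
        if len(s) < 2:
            return i

    def index(s):
        lo, hi = min(s), max(s)
        return (lo + hi) / 2 + (hi - lo) * math.sqrt(2 * math.log(n) / len(s))

    return max(range(len(samples)), key=lambda i: index(samples[i]))
```

For example, `simulate([(0.0, 1.0), (0.0, 2.0)], 1000, range_bonus_policy)` returns the total reward, the accumulated regret, and the per-population samples for a two-population instance; the policy sees only the observed outcomes, never the endpoints.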
Normal Bandits of Unknown Means and Variances
ASYMPTOTICALLY OPTIMAL MULTI-ARMED BANDIT POLICIES UNDER A COST CONSTRAINT
Asymptotic Behavior of Minimal-Exploration Allocation Policies: Almost Sure, Arbitrarily Slow Growing Regret
EXPLORATION–EXPLOITATION POLICIES WITH ALMOST SURE, ARBITRARILY SLOW GROWING ASYMPTOTIC REGRET
OPTIMAL DATA UTILIZATION FOR GOAL-ORIENTED LEARNING
A Scale Free Algorithm for Stochastic Bandits with Bounded Kurtosis
Concentration of Measure
