Corpus ID: 88514856

Asymptotically Optimal Sequential Experimentation Under Generalized Ranking

@article{Cowan2015AsymptoticallyOS,
  title={Asymptotically Optimal Sequential Experimentation Under Generalized Ranking},
  author={Wesley Cowan and M. Katehakis},
  journal={arXiv: Machine Learning},
  year={2015}
}
We consider the classical problem of a controller activating (or sampling) sequentially from a finite number of $N \geq 2$ populations, specified by unknown distributions. Over some time horizon, at each time $n = 1, 2, \ldots$, the controller wishes to select a population to sample, with the goal of sampling from a population that optimizes some "score" function of its distribution, e.g., maximizing the expected sum of outcomes or minimizing variability. We define a class of \textit…
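To make the setting concrete, here is a minimal Python sketch of the sequential sampling problem described above. It is not the policy proposed in the paper. It assumes each population is a zero-argument sampler, that a population's score is estimated from its sample mean and sample variance (covering the two goals named in the abstract), and it uses a simple forced-exploration rule with a hypothetical threshold of max(2, ceil(log(n+1))) samples per population, otherwise sampling the population with the best empirical score. The names run_policy, mean_score and low_variance_score are illustrative, not from the paper.

```python
import math
import random
from typing import Callable, List

# Hypothetical score type: maps (sample mean, sample variance) of a
# population's observed outcomes to a number; larger is better.
Score = Callable[[float, float], float]


def mean_score(mean: float, var: float) -> float:
    return mean  # maximize expected outcome


def low_variance_score(mean: float, var: float) -> float:
    return -var  # minimize variability


def run_policy(samplers: List[Callable[[], float]], score: Score, horizon: int) -> List[int]:
    """Sample sequentially for `horizon` steps; return per-population sample counts.

    Forced exploration: if some population has fewer than
    max(2, ceil(log(n + 1))) samples at time n, sample the least-sampled one
    (lowest index on ties). Otherwise sample the population whose empirical
    score is highest (lowest index on ties).
    """
    k = len(samplers)
    count = [0] * k
    mean = [0.0] * k
    m2 = [0.0] * k  # Welford running sum of squared deviations from the mean
    for n in range(1, horizon + 1):
        threshold = max(2, math.ceil(math.log(n + 1)))
        least = min(range(k), key=lambda j: count[j])
        if count[least] < threshold:
            i = least  # forced exploration
        else:
            # Every count is >= 2 here, so the sample variance is defined.
            i = max(range(k), key=lambda j: score(mean[j], m2[j] / (count[j] - 1)))
        x = samplers[i]()
        # Welford's online update of the mean and squared deviations.
        count[i] += 1
        delta = x - mean[i]
        mean[i] += delta / count[i]
        m2[i] += delta * (x - mean[i])
    return count


if __name__ == "__main__":
    rng = random.Random(0)
    # Three populations with equal means but different spreads; under
    # low_variance_score the tightest one (index 2) is the target.
    samplers = [
        lambda: rng.gauss(0.0, 2.0),
        lambda: rng.gauss(0.0, 1.0),
        lambda: rng.gauss(0.0, 0.5),
    ]
    print(run_policy(samplers, low_variance_score, horizon=1000))
```

Swapping mean_score for low_variance_score changes which population counts as optimal without changing the sampling loop, which is the sense in which the policy is driven by a generic score of each distribution. The logarithmic forced-exploration rate keeps every population sampled occasionally, so an unlucky early estimate cannot permanently exclude the best one.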
Exploration–Exploitation Policies with Almost Sure, Arbitrarily Slow Growing Asymptotic Regret
Optimal Data Utilization for Goal-Oriented Learning
Reinforcement Learning: a Comparison of UCB Versus Alternative Adaptive Policies
