Dynamic allocation policies for the finite horizon one armed bandit problem

@inproceedings{Burnetas2011DynamicAP,
  title={Dynamic allocation policies for the finite horizon one armed bandit problem},
  author={Apostolos Burnetas and Michael N. Katehakis},
  year={2011}
}
The unknown performance of a new experiment is to be evaluated and compared with that of an existing one over a finite horizon. The explicit structure of an optimal sequential allocation policy is obtained under pertinent reward/loss functions, when the experiments are characterized by random variables with distributions from the one-parameter exponential family.

1. INTRODUCTION. We consider the following version of a classical problem of dynamic allocation of effort among different…
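The sketch below illustrates the kind of problem described above. It is not the authors' method and does not use their reward/loss functions. It solves a Bayesian finite-horizon one-armed bandit by backward induction under assumptions chosen only for illustration:

- The new experiment yields Bernoulli(θ) rewards. The Bernoulli law belongs to the one-parameter exponential family, since θ^x(1−θ)^(1−x) = exp(x·log(θ/(1−θ)) + log(1−θ)).
- θ has a Beta(a0, b0) prior.
- The existing experiment pays a known mean reward `known_mean` per period.
- The objective is expected total (undiscounted) reward.

The function `solve_one_armed_bandit` and its parameters are hypothetical names.

```python
from functools import lru_cache


def solve_one_armed_bandit(known_mean, a0=1.0, b0=1.0):
    """Hypothetical sketch: Bayesian finite-horizon one-armed Bernoulli bandit.

    Assumptions (illustrative only): the new experiment has Bernoulli(theta)
    rewards with a Beta(a0, b0) prior on theta; the existing experiment pays
    `known_mean` per period; we maximize expected total reward.
    Returns (value, decide), both taking a state (s, f, n), where s and f are
    successes and failures observed on the new experiment, n periods remain.
    """

    def posterior_mean(s, f):
        # Posterior mean of theta under the Beta(a0 + s, b0 + f) posterior.
        return (a0 + s) / (a0 + b0 + s + f)

    def action_values(s, f, n):
        # Expected total reward of each action now, followed by optimal play.
        p = posterior_mean(s, f)
        # Sampling the known experiment reveals nothing, so (s, f) is unchanged.
        known = known_mean + value(s, f, n - 1)
        # Sampling the new experiment updates the posterior by one observation.
        new = p * (1.0 + value(s + 1, f, n - 1)) + (1.0 - p) * value(s, f + 1, n - 1)
        return known, new

    @lru_cache(maxsize=None)
    def value(s, f, n):
        # Optimal expected total reward from state (s, f) with n periods left.
        if n == 0:
            return 0.0
        return max(action_values(s, f, n))

    def decide(s, f, n):
        # Optimal action for n >= 1 periods left; ties go to the new experiment.
        known, new = action_values(s, f, n)
        return "new" if new >= known else "known"

    return value, decide
```

For example, `value, decide = solve_one_armed_bandit(0.5)` followed by `decide(0, 0, 20)` returns the recommended first action over a 20-period horizon under a uniform prior. The recursion evaluates both actions at every state rather than assuming any particular policy structure.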