PAC Identification of a Bandit Arm Relative to a Reward Quantile

@inproceedings{Chaudhuri2017PACIO,
  title={PAC Identification of a Bandit Arm Relative to a Reward Quantile},
  author={Arghya Roy Chaudhuri and Shivaram Kalyanakrishnan},
  booktitle={AAAI},
  year={2017}
}
We propose a PAC formulation for identifying an arm in an n-armed bandit whose mean is within a fixed tolerance of the m-th highest mean. This setup generalises a previous formulation with m = 1, and differs from yet another one which requires m such arms to be identified. The key implication of our proposed approach is the ability to derive upper bounds on the sample complexity that depend on n/m in place of n. Consequently, even when the number of arms is infinite, we only need a finite number…
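
To make the target of the formulation concrete, the following minimal sketch checks whether a given arm would count as acceptable when the true means are known. It is an illustration only, not the authors' algorithm: the function name `is_eps_m_optimal`, the parameter name `eps` for the tolerance, the example means, and the one-sided reading of "within a fixed tolerance" (mean at least the m-th highest mean minus `eps`) are all assumptions made here.

```python
def is_eps_m_optimal(means, arm, m, eps):
    """Return True if means[arm] >= (m-th highest mean) - eps.

    Hypothetical helper: assumes the true means are known, which a
    bandit learner does not have; it only illustrates the target set.
    """
    if not 1 <= m <= len(means):
        raise ValueError("m must be between 1 and the number of arms")
    # m-th highest mean, counting from 1
    mth_highest = sorted(means, reverse=True)[m - 1]
    return means[arm] >= mth_highest - eps


# Illustrative (made-up) means for a 5-armed bandit
means = [0.9, 0.8, 0.5, 0.45, 0.1]

# With m = 3 the reference mean is 0.5, so arm 3 (mean 0.45) is acceptable
# for eps = 0.1, while arm 4 (mean 0.1) is not.
ok_arm = is_eps_m_optimal(means, arm=3, m=3, eps=0.1)
bad_arm = is_eps_m_optimal(means, arm=4, m=3, eps=0.1)
```

Setting m = 1 in this sketch recovers the usual requirement that the returned arm be within the tolerance of the single best mean; larger m enlarges the set of acceptable arms.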
