CEMAB: A Cross-Entropy-based Method for Large-Scale Multi-Armed Bandits


The multi-armed bandit (MAB) problem is an important model for studying the exploration–exploitation tradeoff in sequential decision making. In this problem, a gambler has to repeatedly choose between a number of slot machine arms to maximize the total payout, where the total number of plays is fixed. Although many methods have been proposed to solve the…
DOI: 10.1007/978-3-319-51691-2_30



