Corpus ID: 226226546

Resource Allocation in Multi-armed Bandit Exploration: Overcoming Nonlinear Scaling with Adaptive Parallelism

@inproceedings{thananjeyan2021resource,
  title={Resource Allocation in Multi-armed Bandit Exploration: Overcoming Nonlinear Scaling with Adaptive Parallelism},
  author={Brijen Thananjeyan and Kirthevasan Kandasamy and Ion Stoica and Michael I. Jordan and Ken Goldberg and Joseph Gonzalez},
  booktitle={International Conference on Machine Learning},
  year={2021}
}
We study exploration in stochastic multi-armed bandits when we have access to a divisible resource and can allocate varying amounts of it to arm pulls. By allocating more resources to a pull, we can compute its outcome faster to inform subsequent decisions about which arms to pull. However, since distributed environments do not scale linearly, executing several arm pulls in parallel, each with fewer resources, may result in better throughput. For example, in simulation-based… 
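The tradeoff the abstract describes can be made concrete with a toy model. The sketch below assumes a sublinear speedup curve `s(c) = c**alpha` with `alpha < 1` (the function names, the curve, and all parameter values are illustrative assumptions, not the paper's model): giving all cores to one pull minimizes that pull's latency, while splitting cores across many parallel pulls maximizes pulls completed per unit time.

```python
# Toy model of the latency/throughput tradeoff under sublinear scaling.
# Assumption: one arm pull of unit work on c cores takes 1 / c**alpha time.

def pull_time(cores: float, work: float = 1.0, alpha: float = 0.5) -> float:
    """Wall-clock time of a single arm pull given `cores` (sublinear speedup)."""
    return work / cores**alpha

def throughput(num_parallel: int, total_cores: float = 64.0) -> float:
    """Completed pulls per unit time when cores are split evenly across pulls."""
    return num_parallel / pull_time(total_cores / num_parallel)

# A single pull with all 64 cores returns an outcome soonest (time 1/8)...
latency_one = pull_time(64.0)
# ...but 16 parallel pulls on 4 cores each complete far more pulls per unit time.
assert throughput(16) > throughput(1)
```

Under linear scaling (`alpha = 1`) the two choices tie; the gap only appears because parallelism is sublinear, which is exactly why the allocation decision is nontrivial.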


PAC Best Arm Identification Under a Deadline

Elastic Batch Racing (EBR), a novel algorithm for this setting, is proposed and its sample complexity is bounded, showing that EBR is optimal with respect to both hardness results.

BORA: Bayesian Optimization for Resource Allocation

Results on the original SBF case study proposed in the literature and a real-life application empirically demonstrate that BORA is a more efficient and effective learning-and-optimization framework than SBF.

New Paradigms for Adaptive Decision Making under Bandit Feedback

Below are a few of the many applications for the bandit framework which have inspired my work.

Parallelizing Exploration-Exploitation Tradeoffs with Gaussian Process Bandit Optimization

This work develops GP-BUCB, a principled algorithm for choosing batches, based on the GP-UCB algorithm for sequential GP optimization, and proves a surprising result: compared to the sequential approach, the cumulative regret of the parallel algorithm increases only by a constant factor independent of the batch size B.

Almost Optimal Exploration in Multi-Armed Bandits

Two novel, parameter-free algorithms for identifying the best arm are presented, in two different settings: given a target confidence and given a target budget of arm pulls; for both, upper bounds are proved whose gap from the lower bound is only doubly logarithmic in the problem parameters.

Towards Optimality in Parallel Job Scheduling

It is proved that EQUI, a policy which continuously divides cores evenly across jobs, is optimal when all jobs follow a single speedup curve and have exponentially distributed sizes, and that fixed-width policies, which use the optimal fixed level of parallelization k, become near-optimal as the number of cores becomes large.

heSRPT: Parallel Scheduling to Minimize Mean Slowdown

PAC Subset Selection in Stochastic Multi-armed Bandits

The expected sample complexity bound for LUCB is novel even for single-arm selection, and a lower bound on the worst-case sample complexity of PAC algorithms for Explore-m is given.

PAC Bounds for Multi-armed Bandit and Markov Decision Processes

The bandit problem is revisited and considered under the PAC model, and it is shown that given n arms, it suffices to pull the arms O((n/ε²) log(1/δ)) times to find an ε-optimal arm with probability at least 1 − δ.
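The scaling behavior of this PAC bound can be checked with a line of arithmetic. The helper below (an illustrative evaluation of the bound's dominant term, with constants omitted; it is not an algorithm from the paper) shows that halving ε quadruples the required pulls, while shrinking δ costs only a logarithmic factor.

```python
import math

def pac_sample_bound(n: int, eps: float, delta: float) -> float:
    """Dominant term of the PAC bound: (n / eps^2) * log(1 / delta), constants omitted."""
    return (n / eps**2) * math.log(1.0 / delta)

# Halving eps quadruples the bound; it is quadratic in 1/eps.
assert math.isclose(pac_sample_bound(10, 0.05, 0.05),
                    4 * pac_sample_bound(10, 0.1, 0.05))
```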

HyperSched: Dynamic Resource Reallocation for Model Development on a Deadline

HyperSched, a dynamic application-level resource scheduler, tracks, identifies, and preferentially allocates resources to the best-performing trials to maximize accuracy by the deadline; it leverages three properties of a hyperparameter-search workload overlooked in prior work: trial disposability, progressively identifiable rankings among different configurations, and space-time constraints.

Pure Exploration in Multi-armed Bandits Problems

The main result is that the required exploration-exploitation trade-offs are qualitatively different, in view of a general lower bound on the simple regret in terms of the cumulative regret.

lil' UCB : An Optimal Exploration Algorithm for Multi-Armed Bandits

It is proved that the UCB procedure for identifying the arm with the largest mean in a multi-armed bandit game in the fixed-confidence setting, using a small number of total samples, is optimal up to constants, and simulations show that it provides superior performance relative to the state of the art.

Best Arm Identification in Multi-Armed Bandits

This work proposes a highly exploring UCB policy and a new algorithm based on successive rejects that are essentially optimal since their regret decreases exponentially at a rate which is, up to a logarithmic factor, the best possible.