Resource Allocation in Multi-armed Bandit Exploration: Overcoming Nonlinear Scaling with Adaptive Parallelism
@inproceedings{Thananjeyan2020ResourceAI,
  title     = {Resource Allocation in Multi-armed Bandit Exploration: Overcoming Nonlinear Scaling with Adaptive Parallelism},
  author    = {Brijen Thananjeyan and Kirthevasan Kandasamy and Ion Stoica and Michael I. Jordan and Ken Goldberg and Joseph Gonzalez},
  booktitle = {International Conference on Machine Learning},
  year      = {2020}
}
We study exploration in stochastic multi-armed bandits when we have access to a divisible resource and can allocate varying amounts of this resource to arm pulls. By allocating more resources to a pull, we can compute its outcome faster, which informs subsequent decisions about which arms to pull. However, since distributed environments do not scale linearly, executing several arm pulls in parallel, with fewer resources per pull, may yield better throughput. For example, in simulation-based…
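The throughput tradeoff the abstract describes can be sketched numerically. The speedup curve, core counts, and function names below are illustrative assumptions only, not the paper's actual scaling model:

```python
def speedup(k, alpha=0.7):
    # Hypothetical sublinear speedup: k units of the resource make one
    # arm pull run k**alpha times faster (alpha < 1, i.e. not linear).
    return k ** alpha

def throughput(total, per_pull):
    # Pulls completed per unit time when `total` resource units are
    # split into parallel pulls of `per_pull` units each.
    parallel_pulls = total // per_pull
    return parallel_pulls * speedup(per_pull)

# Fewer resources per pull -> higher aggregate throughput, but each
# individual pull takes longer to return its outcome.
for k in (1, 4, 16, 64):
    print(k, 1.0 / speedup(k), throughput(64, k))
```

Under a sublinear curve, `throughput(64, 1)` exceeds `throughput(64, 64)` while the per-pull latency `1/speedup(k)` moves the other way; this is the tension between fast feedback and high throughput that adaptive parallelism must balance.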
3 Citations
PAC Best Arm Identification Under a Deadline
- Computer Science, ArXiv
- 2021
Elastic Batch Racing (EBR), a novel algorithm for this setting, is proposed and its sample complexity is bounded, showing that EBR is optimal with respect to both hardness results.
BORA: Bayesian Optimization for Resource Allocation
- Computer Science, SSRN Electronic Journal
- 2022
Results on the original SBF case study proposed in the literature and a real-life application empirically prove that BORA is a more efficient and effective learning-and-optimization framework than SBF.
New Paradigms for Adaptive Decision Making under Bandit Feedback
- Computer Science
- 2021
Below are a few of the many applications for the bandit framework which have inspired my work.
References
SHOWING 1-10 OF 39 REFERENCES
Parallelizing Exploration-Exploitation Tradeoffs with Gaussian Process Bandit Optimization
- Computer Science, ICML
- 2012
This work develops GP-BUCB, a principled algorithm for choosing batches, based on the GP-UCB algorithm for sequential GP optimization, and proves a surprising result: compared to the sequential approach, the cumulative regret of the parallel algorithm increases only by a constant factor independent of the batch size B.
Almost Optimal Exploration in Multi-Armed Bandits
- Computer Science, ICML
- 2013
Two novel, parameter-free algorithms for identifying the best arm are presented, for two different settings: a given target confidence and a given target budget of arm pulls. Upper bounds are proved whose gap from the lower bound is only doubly logarithmic in the problem parameters.
Towards Optimality in Parallel Job Scheduling
- Computer Science, SIGMETRICS
- 2018
It is proved that EQUI, a policy which continuously divides cores evenly across jobs, is optimal when all jobs follow a single speedup curve and have exponentially distributed sizes, and that fixed-width policies, which use the optimal fixed level of parallelization k, become near-optimal as the number of cores grows large.
heSRPT: Parallel Scheduling to Minimize Mean Slowdown
- Economics, Computer Science, Perform. Evaluation
- 2020
PAC Subset Selection in Stochastic Multi-armed Bandits
- Computer Science, ICML
- 2012
The expected sample complexity bound for LUCB is novel even for single-arm selection, and a lower bound on the worst-case sample complexity of PAC algorithms for Explore-m is given.
PAC Bounds for Multi-armed Bandit and Markov Decision Processes
- Computer Science, COLT
- 2002
The bandit problem is revisited and considered under the PAC model, and it is shown that given n arms, it suffices to pull the arms O((n/ε²) log(1/δ)) times to find an ε-optimal arm with probability at least 1 − δ.
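For comparison, the naive strategy of pulling every arm equally often follows directly from a Hoeffding bound and costs an extra log n factor over the bound above. A minimal sketch, with Bernoulli arms and parameters that are illustrative assumptions only:

```python
import math
import random

def naive_pac_best_arm(pull, n_arms, eps, delta):
    # Pull each arm m times, with m chosen so that, by Hoeffding's
    # inequality plus a union bound over the arms, every empirical mean
    # is within eps/2 of its true mean w.p. >= 1 - delta; the empirical
    # best arm is then eps-optimal. Total pulls: O((n/eps^2) log(n/delta)),
    # a log n factor worse than the O((n/eps^2) log(1/delta)) result.
    m = math.ceil((2 / eps ** 2) * math.log(2 * n_arms / delta))
    means = [sum(pull(a) for _ in range(m)) / m for a in range(n_arms)]
    return max(range(n_arms), key=lambda a: means[a])

# Hypothetical Bernoulli arms for illustration.
random.seed(0)
probs = [0.3, 0.5, 0.9]
best = naive_pac_best_arm(lambda a: random.random() < probs[a],
                          3, eps=0.1, delta=0.05)
```

With these gaps (0.4 between the top two arms) and eps=0.1, the empirical best arm is the true best arm with overwhelming probability.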
HyperSched: Dynamic Resource Reallocation for Model Development on a Deadline
- Computer Science, SoCC
- 2019
HyperSched is a dynamic application-level resource scheduler that tracks, identifies, and preferentially allocates resources to the best-performing trials to maximize accuracy by the deadline. It leverages three properties of a hyperparameter-search workload overlooked in prior work: trial disposability, progressively identifiable rankings among different configurations, and space-time constraints.
Pure Exploration in Multi-armed Bandits Problems
- Computer Science, ALT
- 2009
The main result is that the required exploration-exploitation trade-offs are qualitatively different, in view of a general lower bound on the simple regret in terms of the cumulative regret.
lil' UCB : An Optimal Exploration Algorithm for Multi-Armed Bandits
- Computer Science, COLT
- 2014
The lil' UCB procedure, which identifies the arm with the largest mean in a multi-armed bandit game in the fixed-confidence setting using a small number of total samples, is proved optimal up to constants, and simulations show it provides superior performance with respect to the state of the art.
Best Arm Identification in Multi-Armed Bandits
- Computer Science, COLT
- 2010
This work proposes a highly exploring UCB policy and a new algorithm based on successive rejects that are essentially optimal since their regret decreases exponentially at a rate which is, up to a logarithmic factor, the best possible.
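As a concrete illustration of the successive-rejects phase structure in the fixed-budget setting, here is a minimal sketch; the Bernoulli arms, seed, and budget are hypothetical, not from the paper:

```python
import math
import random

def successive_rejects(pull, n_arms, budget):
    # n_arms - 1 phases; in phase k each surviving arm is pulled until it
    # has n_k total pulls, then the arm with the lowest empirical mean is
    # rejected. The n_k schedule follows Audibert, Bubeck & Munos (2010).
    log_bar = 0.5 + sum(1.0 / i for i in range(2, n_arms + 1))
    active = list(range(n_arms))
    sums, counts = [0.0] * n_arms, [0] * n_arms
    n_prev = 0
    for k in range(1, n_arms):
        n_k = math.ceil((budget - n_arms) / (log_bar * (n_arms + 1 - k)))
        for a in active:
            for _ in range(n_k - n_prev):
                sums[a] += pull(a)
                counts[a] += 1
        n_prev = n_k
        active.remove(min(active, key=lambda a: sums[a] / counts[a]))
    return active[0]  # the single surviving arm

# Hypothetical Bernoulli arms for illustration.
random.seed(1)
probs = [0.2, 0.4, 0.8]
best = successive_rejects(lambda a: float(random.random() < probs[a]),
                          3, budget=300)
```

The schedule spends the budget unevenly across phases, so arms that survive longer are measured more precisely; with these well-separated means, the surviving arm is the best one with overwhelming probability.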