• Corpus ID: 211677262

# Budget-Constrained Bandits over General Cost and Reward Distributions

@inproceedings{Cayci2020BudgetConstrainedBO,
  title={Budget-Constrained Bandits over General Cost and Reward Distributions},
  author={Semih Cayci and Atilla Eryilmaz and Rayadurgam Srikant},
  booktitle={AISTATS},
  year={2020}
}
• Published in AISTATS 29 February 2020
• Computer Science, Mathematics
We consider a budget-constrained bandit problem where each arm pull incurs a random cost, and yields a random reward in return. The objective is to maximize the total expected reward under a budget constraint on the total cost. The model is general in the sense that it allows correlated and potentially heavy-tailed cost-reward pairs that can take on negative values as required by many applications. We show that if moments of order $(2+\gamma)$ for some $\gamma > 0$ exist for all cost-reward…
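The setting in the abstract, where each pull of an arm draws a random (cost, reward) pair and pulls continue until a budget on cumulative cost is exhausted, can be illustrated with a minimal simulation. The sketch below uses a simple empirical reward/cost-ratio rule with a warm-up phase; it assumes strictly positive costs so the loop terminates (the paper's model also allows negative values and heavy tails), and it is an illustrative baseline, not the paper's algorithm. All names are hypothetical.

```python
import random

def run_budget_bandit(arms, budget, n_warmup=10, seed=0):
    """Greedy reward/cost-ratio baseline for the budget-constrained
    bandit setting. `arms` maps an arm id to a sampler that takes a
    random.Random and returns a (cost, reward) pair; pulls continue
    until cumulative cost reaches `budget` (the final pull may
    overshoot). Assumes strictly positive costs; illustrative only."""
    rng = random.Random(seed)
    stats = {k: {"pulls": 0, "cost": 0.0, "reward": 0.0} for k in arms}
    spent, total_reward = 0.0, 0.0
    while spent < budget:
        # Warm-up: pull each arm a few times to estimate its means.
        cold = [k for k, s in stats.items() if s["pulls"] < n_warmup]
        if cold:
            k = rng.choice(cold)
        else:
            # Exploit: pick the arm with the best empirical reward/cost ratio.
            k = max(stats, key=lambda a: stats[a]["reward"] / max(stats[a]["cost"], 1e-9))
        cost, reward = arms[k](rng)
        stats[k]["pulls"] += 1
        stats[k]["cost"] += cost
        stats[k]["reward"] += reward
        spent += cost
        total_reward += reward
    return total_reward, stats
```

With two arms whose expected reward/cost ratios are roughly 2 and 1, the ratio rule concentrates pulls on the better arm once the warm-up estimates are in place.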
## 11 Citations

An Efficient Pessimistic-Optimistic Algorithm for Constrained Linear Bandits
• Mathematics, Computer Science
ArXiv
• 2021
The algorithm is based on the primal-dual approach in optimization, and includes two components: the primal component is similar to unconstrained stochastic linear bandits, and the dual component depends on the number of constraints.
An Efficient Pessimistic-Optimistic Algorithm for Stochastic Linear Bandits with General Constraints
• Xin Liu, Bin Li
• Computer Science
• 2021
The algorithm is based on the primal-dual approach in optimization and includes two components: the primal component is similar to unconstrained stochastic linear bandits and the dual component depends on the number of constraints but is independent of the sizes of the contextual space, the action space, and the feature space.
Continuous-Time Multi-Armed Bandits with Controlled Restarts
• Computer Science, Mathematics
ArXiv
• 2020
This work investigates the bandit problem with controlled restarts for time-constrained decision processes, and develops provably good learning algorithms for efficient online learning algorithms with finite and continuous action space of restart strategies.
An Efficient Pessimistic-Optimistic Algorithm for Stochastic Linear Bandits with General Constraints
The regret bound scales with the number of constraints, the dimension d of the reward feature space, and Slater's constant δ; the algorithm incurs zero constraint violation in any round τ > τ′, where τ′ is independent of the horizon T.
Making the most of your day: online learning for optimal allocation of time
• Mathematics, Computer Science
ArXiv
• 2021
Online learning for optimal allocation when the resource to be allocated is time, and the regret incurred by the agent, are studied: first when she knows her reward function but not the distribution of task durations, and then when she does not know her reward function either.
Fast and Accurate Online Decision-Making
• 2021
We introduce a novel theoretical framework for Return On Investment (ROI) maximization in repeated decision-making. Our setting is motivated by the use case of companies that regularly receive
POND: Pessimistic-Optimistic oNline Dispatch
• Computer Science
ArXiv
• 2020
Proposes a novel online dispatch algorithm named POND, standing for Pessimistic-Optimistic oNline Dispatch, with provable regret and constraint-violation guarantees; experiments show that POND achieves low regret with minimal constraint violations.
A Lyapunov-Based Methodology for Constrained Optimization with Bandit Feedback
• Semih Cayci, Yilin Zheng
• Computer Science, Mathematics
ArXiv
• 2021
A novel low-complexity algorithm based on the Lyapunov optimization methodology, named LyOn, is proposed and proved to achieve $O(\sqrt{B \log B})$ regret and $O(\log B / B)$ constraint violation.
Group-Fair Online Allocation in Continuous Time
• Computer Science, Mathematics
NeurIPS
• 2020
This work proposes a novel online learning algorithm based on dual ascent optimization for time averages, and proves that it achieves $\tilde{O}(B^{-1/2})$ regret bound.

## References

SHOWING 1-10 OF 37 REFERENCES
Multi-Armed Bandit with Budget Constraint and Variable Costs
• Computer Science
AAAI
• 2013
It is shown that when the proposed algorithms are applied to a previous setting with fixed costs, the previously obtained regret bound can be improved; results on real-time bidding in ad exchanges verify the effectiveness of the algorithms and are consistent with the theoretical analysis.
Bandits with Budgets: Regret Lower Bounds and Optimal Algorithms
• Computer Science
SIGMETRICS 2015
• 2015
Numerical experiments suggest that B-KL-UCB has the same or better finite-time performance when compared to various previously proposed (UCB-like) algorithms, which is important when applying such algorithms to a real-world problem.
Thompson Sampling for Budgeted Multi-Armed Bandits
• Computer Science, Mathematics
IJCAI
• 2015
This paper extends Thompson sampling to Budgeted MAB, where pulling an arm incurs a random cost and the total cost is constrained by a budget, and proves that the distribution-dependent regret bound of this algorithm is O(ln B), where B denotes the budget.
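As a rough illustration of the Thompson-sampling approach to budgeted bandits summarized in this entry, the sketch below maintains Beta posteriors over Bernoulli reward and cost means for each arm and pulls the arm with the highest sampled reward/cost ratio. This is a hedged reading under simplifying assumptions (Bernoulli costs, a pull cap to guarantee termination), not a faithful reproduction of the cited algorithm; all names are illustrative.

```python
import random

def budgeted_thompson_sampling(arms, budget, max_pulls=10_000, seed=0):
    """Sketch of Thompson sampling for budgeted bandits with Bernoulli
    rewards and Bernoulli costs: sample reward and cost means from
    Beta posteriors and pull the arm maximizing the sampled ratio.
    `arms` maps an arm id to a sampler returning a (cost, reward)
    pair of 0/1 values. Illustrative only."""
    rng = random.Random(seed)
    # Beta(1, 1) priors: post[k]["r"] = [alpha, beta] for the reward mean.
    post = {k: {"r": [1, 1], "c": [1, 1]} for k in arms}
    spent, total_reward, pulls = 0.0, 0.0, 0
    while spent < budget and pulls < max_pulls:
        def index(k):
            r = rng.betavariate(*post[k]["r"])
            c = rng.betavariate(*post[k]["c"])
            return r / max(c, 1e-9)
        k = max(arms, key=index)
        cost, reward = arms[k](rng)
        # Bernoulli update: success increments alpha, failure beta.
        post[k]["r"][0 if reward else 1] += 1
        post[k]["c"][0 if cost else 1] += 1
        spent += cost
        total_reward += reward
        pulls += 1
    return total_reward, post
```

With one arm of sampled ratio around 1.6 and another around 0.33, the posterior sampling quickly concentrates pulls on the higher-ratio arm.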
Linear Contextual Bandits with Knapsacks
• Computer Science, Mathematics
NIPS
• 2016
This work combines techniques from the work on linContextual, BwK, and OSPP in a nontrivial manner while also tackling new difficulties that are not present in any of these special cases.
Knapsack Based Optimal Policies for Budget-Limited Multi-Armed Bandits
• Computer Science
AAAI
• 2012
Two pulling policies are developed, namely (i) KUBE and (ii) fractional KUBE, the latter being computationally less expensive; logarithmic upper bounds are proved for the regret of both policies and shown to be asymptotically optimal.
Budgeted Bandit Problems with Continuous Random Costs
• Computer Science
ACML
• 2015
This work proposes upper confidence bound based algorithms for multi-armed bandits and a confidence ball based algorithm for linear bandits, and proves logarithmic regret bounds for both.
Bandits with concave rewards and convex knapsacks
• Mathematics, Computer Science
EC
• 2014
A very general model for exploration-exploitation tradeoff which allows arbitrary concave rewards and convex constraints on the decisions across time, in addition to the customary limitation on the time horizon is considered.
Multi-armed Bandits with Metric Switching Costs
• Mathematics, Computer Science
ICALP
• 2009
A general duality-based framework is developed to provide the first O(1) approximation for metric switching costs, with quite small actual constants.
Multi-armed bandit problems with heavy-tailed reward distributions
• Computer Science, Mathematics
2011 49th Annual Allerton Conference on Communication, Control, and Computing (Allerton)
• 2011
An approach based on a Deterministic Sequencing of Exploration and Exploitation (DSEE) is developed for constructing sequential arm selection policies and it is shown that when the moment-generating functions of the arm reward distributions are properly bounded, the optimal logarithmic order of the regret can be achieved by DSEE.
Exploration-exploitation tradeoff using variance estimates in multi-armed bandits
• Computer Science, Mathematics
Theor. Comput. Sci.
• 2009
A variant of the basic algorithm for the stochastic, multi-armed bandit problem that takes into account the empirical variance of the different arms is considered, providing the first analysis of the expected regret for such algorithms.