# Budget-Constrained Bandits over General Cost and Reward Distributions

```bibtex
@inproceedings{Cayci2020BudgetConstrainedBO,
  title     = {Budget-Constrained Bandits over General Cost and Reward Distributions},
  author    = {Semih Cayci and Atilla Eryilmaz and Rayadurgam Srikant},
  booktitle = {AISTATS},
  year      = {2020}
}
```

We consider a budget-constrained bandit problem where each arm pull incurs a random cost, and yields a random reward in return. The objective is to maximize the total expected reward under a budget constraint on the total cost. The model is general in the sense that it allows correlated and potentially heavy-tailed cost-reward pairs that can take on negative values as required by many applications. We show that if moments of order $(2+\gamma)$ for some $\gamma > 0$ exist for all cost-reward…
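To make the setting concrete, here is a minimal simulation sketch of a budget-constrained bandit loop: each pull draws a correlated (cost, reward) pair, and the learner pulls the arm with the highest upper confidence bound on the empirical reward-to-cost ratio until the budget is exhausted. The exploration bonus below is a standard `sqrt(log t / n)` term for illustration, not the moment-tuned index analyzed in the paper; the arm samplers and constant `c` are hypothetical.

```python
import math
import random

def budget_ucb(arms, budget, c=2.0):
    """Illustrative budget-constrained bandit loop (a sketch, not the
    paper's exact algorithm). `arms` maps an arm id to a sampler
    returning a (cost, reward) pair; the pair may be correlated and the
    reward may be negative, but costs are assumed positive on average
    so the budget is eventually exhausted.
    """
    stats = {a: {"n": 0, "cost": 0.0, "reward": 0.0} for a in arms}
    total_reward, spent, t = 0.0, 0.0, 0
    order = list(arms)  # pull each arm once for initialization

    while spent < budget:
        t += 1
        if t <= len(order):
            a = order[t - 1]
        else:
            # UCB on the empirical reward-to-cost ratio.
            def index(a):
                s = stats[a]
                mean_cost = max(s["cost"] / s["n"], 1e-9)
                ratio = (s["reward"] / s["n"]) / mean_cost
                return ratio + c * math.sqrt(math.log(t) / s["n"])
            a = max(arms, key=index)
        cost, reward = arms[a]()
        s = stats[a]
        s["n"] += 1
        s["cost"] += cost
        s["reward"] += reward
        spent += cost
        total_reward += reward
    return total_reward
```

For example, with two arms whose rewards are Gaussian and costs uniform, the loop runs for roughly `budget / mean_cost` pulls and concentrates on the arm with the better reward-to-cost ratio.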


## 11 Citations

An Efficient Pessimistic-Optimistic Algorithm for Constrained Linear Bandits

- Mathematics, Computer Science · ArXiv
- 2021

The algorithm is based on the primal-dual approach in optimization, and includes two components: the primal component is similar to unconstrained stochastic linear bandits, and the dual component depends on the number of constraints.

An Efficient Pessimistic-Optimistic Algorithm for Stochastic Linear Bandits with General Constraints

- Computer Science
- 2021

The algorithm is based on the primal-dual approach in optimization and includes two components: the primal component is similar to unconstrained stochastic linear bandits and the dual component depends on the number of constraints but is independent of the sizes of the contextual space, the action space, and the feature space.

Continuous-Time Multi-Armed Bandits with Controlled Restarts

- Computer Science, Mathematics · ArXiv
- 2020

This work investigates the bandit problem with controlled restarts for time-constrained decision processes, and develops provably good online learning algorithms for finite and continuous action spaces of restart strategies.

An Efficient Pessimistic-Optimistic Algorithm for Stochastic Linear Bandits with General Constraints

- 2021

…is the number of constraints, d is the dimension of the reward feature space, and δ is a Slater's constant; and zero constraint violation in any round τ > τ′, where τ′ is independent of the horizon T.…

Making the most of your day: online learning for optimal allocation of time

- Mathematics, Computer Science · ArXiv
- 2021

Online learning for optimal allocation is studied when the resource to be allocated is time: the regret incurred by the agent is analyzed, first when she knows her reward function but does not know the distribution of the task duration, and then when she does not know her reward function.

Fast and Accurate Online Decision-Making

- 2021

We introduce a novel theoretical framework for Return On Investment (ROI) maximization in repeated decision-making. Our setting is motivated by the use case of companies that regularly receive…

POND: Pessimistic-Optimistic oNline Dispatch

- Computer Science · ArXiv
- 2020

A novel online dispatch algorithm named POND, standing for Pessimistic-Optimistic oNline Dispatch, is proposed; experiments show that POND achieves low regret with minimal constraint violations.

A Lyapunov-Based Methodology for Constrained Optimization with Bandit Feedback

- Computer Science, Mathematics · ArXiv
- 2021

A novel low-complexity algorithm based on the Lyapunov optimization methodology, named LyOn, is proposed, and it is proved to achieve $O(\sqrt{B \log B})$ regret and $O(\log B / B)$ constraint violation.

Group-Fair Online Allocation in Continuous Time

- Computer Science, Mathematics · NeurIPS
- 2020

This work proposes a novel online learning algorithm based on dual ascent optimization for time averages, and proves that it achieves a $\tilde{O}(B^{-1/2})$ regret bound.

## References

Showing 1–10 of 37 references.

Multi-Armed Bandit with Budget Constraint and Variable Costs

- Computer Science · AAAI
- 2013

It is shown that when applying the proposed algorithms to a previous setting with fixed costs, one can improve the previously obtained regret bound, and results on real-time bidding in ad exchange verify the effectiveness of the algorithms and are consistent with the theoretical analysis.

Bandits with Budgets: Regret Lower Bounds and Optimal Algorithms

- Computer Science · SIGMETRICS
- 2015

Numerical experiments suggest that B-KL-UCB has the same or better finite-time performance when compared to various previously proposed (UCB-like) algorithms, which is important when applying such algorithms to a real-world problem.

Thompson Sampling for Budgeted Multi-Armed Bandits

- Computer Science, Mathematics · IJCAI
- 2015

This paper extends Thompson sampling to budgeted MABs, where pulling an arm incurs a random cost and the total cost is constrained by a budget, and proves that the distribution-dependent regret bound of this algorithm is $O(\ln B)$, where B denotes the budget.
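The Thompson-sampling idea in this setting can be sketched as follows, under a deliberately simplified assumption of Bernoulli rewards and costs (a hypothetical simplification, not the cited paper's full algorithm): keep a Beta posterior for each arm's reward and cost probabilities, sample from both, and pull the arm with the largest sampled reward-to-cost ratio until the budget runs out.

```python
import random

def budgeted_thompson(arms, budget):
    """Sketch of Thompson sampling for a budgeted MAB with Bernoulli
    rewards and costs. `arms` maps an arm id to a sampler returning a
    (reward, cost) pair, each in {0, 1}.
    """
    # Beta(1, 1) priors: [successes + 1, failures + 1] per quantity.
    post = {a: {"r": [1, 1], "c": [1, 1]} for a in arms}
    total_reward, spent = 0, 0

    while spent < budget:
        def sampled_ratio(a):
            p = post[a]
            theta_r = random.betavariate(*p["r"])
            theta_c = random.betavariate(*p["c"])
            return theta_r / max(theta_c, 1e-9)
        a = max(arms, key=sampled_ratio)
        reward, cost = arms[a]()
        # Bayesian update: index 0 counts successes, index 1 failures.
        post[a]["r"][1 - reward] += 1
        post[a]["c"][1 - cost] += 1
        total_reward += reward
        spent += cost
    return total_reward
```

The sampled ratio acts as a randomized index: arms with uncertain posteriors occasionally draw optimistic samples and get explored, while the posterior of a clearly dominated arm eventually concentrates and it stops being pulled.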

Linear Contextual Bandits with Knapsacks

- Computer Science, Mathematics · NIPS
- 2016

This work combines techniques from the work on linContextual, BwK, and OSPP in a nontrivial manner while also tackling new difficulties that are not present in any of these special cases.

Knapsack Based Optimal Policies for Budget-Limited Multi-Armed Bandits

- Computer Science · AAAI
- 2012

Two pulling policies are developed, namely (i) KUBE and (ii) fractional KUBE, the latter being computationally less expensive; logarithmic upper bounds are proved for the regret of both policies, and these bounds are shown to be asymptotically optimal.

Budgeted Bandit Problems with Continuous Random Costs

- Computer Science · ACML
- 2015

This work proposes upper confidence bound based algorithms for multi-armed bandits and a confidence ball based algorithm for linear bandits, and proves logarithmic regret bounds for both.

Bandits with concave rewards and convex knapsacks

- Mathematics, Computer Science · EC
- 2014

A very general model for exploration-exploitation tradeoff which allows arbitrary concave rewards and convex constraints on the decisions across time, in addition to the customary limitation on the time horizon is considered.

Multi-armed Bandits with Metric Switching Costs

- Mathematics, Computer Science · ICALP
- 2009

A general duality-based framework is developed to provide the first O (1) approximation for metric switching costs; the actual constants being quite small.

Multi-armed bandit problems with heavy-tailed reward distributions

- Computer Science, Mathematics · 2011 49th Annual Allerton Conference on Communication, Control, and Computing (Allerton)
- 2011

An approach based on a Deterministic Sequencing of Exploration and Exploitation (DSEE) is developed for constructing sequential arm selection policies and it is shown that when the moment-generating functions of the arm reward distributions are properly bounded, the optimal logarithmic order of the regret can be achieved by DSEE.

Exploration-exploitation tradeoff using variance estimates in multi-armed bandits

- Computer Science, Mathematics · Theor. Comput. Sci.
- 2009

A variant of the basic algorithm for the stochastic, multi-armed bandit problem that takes into account the empirical variance of the different arms is considered, providing the first analysis of the expected regret for such algorithms.