• Corpus ID: 10960550

# Dynamic Ad Allocation: Bandits with Budgets

@article{Slivkins2013DynamicAA,
  title={Dynamic Ad Allocation: Bandits with Budgets},
  author={Aleksandrs Slivkins},
  journal={ArXiv},
  year={2013},
  volume={abs/1306.0155}
}
We consider an application of multi-armed bandits to internet advertising (specifically, to dynamic ad allocation in the pay-per-click model, with uncertainty on the click probabilities). We focus on an important practical issue: advertisers are constrained in how much money they can spend on their ad campaigns. To the best of our knowledge, this issue has not been considered in prior work on bandit-based approaches for ad allocation. We define a simple, stylized model where an…
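The stylized setting in the abstract can be illustrated with a minimal simulation sketch. All names, the cost model, and the naive empirical-mean allocation rule below are illustrative assumptions, not the paper's actual model or algorithm:

```python
import random

def simulate_budgeted_ads(click_probs, budgets, cost_per_click, rounds, seed=0):
    """Toy pay-per-click loop: each ad has an unknown click probability and a
    budget; an ad stops being shown once its budget cannot cover another click."""
    rng = random.Random(seed)
    remaining = list(budgets)
    clicks = [0] * len(click_probs)
    shows = [0] * len(click_probs)
    for _ in range(rounds):
        # naive allocation: show the live ad with the best smoothed empirical click rate
        live = [i for i, b in enumerate(remaining) if b >= cost_per_click[i]]
        if not live:
            break
        ad = max(live, key=lambda i: (clicks[i] + 1) / (shows[i] + 2))
        shows[ad] += 1
        if rng.random() < click_probs[ad]:
            clicks[ad] += 1
            remaining[ad] -= cost_per_click[ad]  # advertiser pays only per click
    return clicks, remaining
```

Note how the budget constraint changes the dynamics: once a high-click-rate ad exhausts its budget, the allocator must fall back to other ads, which is exactly the coupling between learning and spending that the paper studies.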
23 Citations

### Disposable Linear Bandits for Online Recommendations

• Computer Science
AAAI
• 2021
The regret for this problem is characterized by a previously-unstudied function of the reward distribution among optimal arms, and the upper bound relies on an optimism-based policy which, while computationally intractable, lends itself to approximation via a fast alternating heuristic initialized with a classic similarity score.

### Bandits with budgets

• Computer Science
52nd IEEE Conference on Decision and Control
• 2013
This work derives regret bounds on the expected reward in such a bandit problem using a modification of the well-known upper confidence bound algorithm UCB1.
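For reference, a minimal sketch of the standard UCB1 index that the cited work modifies (this is the classic unbudgeted rule, not the cited paper's budget-aware variant):

```python
import math

def ucb1_pick(pulls, rewards, t):
    """Standard UCB1: pick the arm maximizing empirical mean + sqrt(2 ln t / n_i).
    Arms that have never been pulled are tried first."""
    best_val, best_idx = None, -1
    for i, n in enumerate(pulls):
        if n == 0:
            return i  # pull each arm once before using the index
        val = rewards[i] / n + math.sqrt(2 * math.log(t) / n)
        if best_val is None or val > best_val:
            best_val, best_idx = val, i
    return best_idx
```

The exploration bonus shrinks as an arm accumulates pulls, which is what yields logarithmic regret in the standard setting; the cited modification adapts this index to account for the remaining budget.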

### Profit maximization through budget allocation in display advertising

Online display advertising provides advertisers a unique opportunity to calculate real-time return on investment for advertising campaigns. Based on the target audiences, each advertising campaign is

### On Logarithmic Regret for Bandits with Knapsacks

• Computer Science
2021 55th Annual Conference on Information Sciences and Systems (CISS)
• 2021
A new algorithm with regret of the form $O(n \log T / \Delta)$ (where $\Delta$ is the reward gap, as in standard MAB), which, to the authors' knowledge, is the lowest order obtained so far, and which matches the order of the standard MAB problem when d = 1.

### Contextual Blocking Bandits

• Computer Science
AISTATS
• 2021
A UCB-based variant of the full-information algorithm is proposed that guarantees a $\mathcal{O}(\log T)$-regret w.r.t. an $\alpha$-optimal strategy in $T$ time steps, matching the $\Omega(\log(T))$ regret lower bound in this setting.

### Low regret bounds for Bandits with Knapsacks

• Computer Science
ArXiv
• 2015
A general purpose algorithm is designed which is shown to enjoy asymptotically optimal regret bounds in several cases that encompass many practical applications including dynamic pricing with limited supply and online bidding in ad auctions.

### Logarithmic regret bounds for Bandits with Knapsacks

• Computer Science
• 2015
This work designs a general-purpose algorithm with distribution-dependent regret bounds that are logarithmic in the initial endowments of resources in several important cases that cover many practical applications, including dynamic pricing with limited supply, bid optimization in online advertisement auctions, and dynamic procurement.

### Hierarchical Adaptive Contextual Bandits for Resource Constraint based Recommendation

• Computer Science
WWW
• 2020
A hierarchical adaptive contextual bandit method (HATCH) is proposed to conduct the policy learning of contextual bandits with a budget constraint, and it is proved that HATCH achieves a low regret bound.

### Resourceful Contextual Bandits

• Computer Science
COLT
• 2014
This work designs the first algorithm for solving contextual bandits with ancillary constraints on resources that handles constrained resources other than time, and improves over a trivial reduction to the non-contextual case.

### Adversarial Bandits with Knapsacks

• Computer Science
2019 IEEE 60th Annual Symposium on Foundations of Computer Science (FOCS)
• 2019
This work proposes a new algorithm for the stochastic version of Bandits with Knapsacks, which builds on the framework of regret minimization in repeated games and admits a substantially simpler analysis compared to prior work.

## References

Showing 1–10 of 33 references

### Characterizing Truthful Multi-armed Bandit Mechanisms

• Computer Science
SIAM J. Comput.
• 2014
This work considers a multiround auction setting motivated by pay-per-click auctions for Internet advertising, and investigates how the design of multi-armed bandit algorithms is affected by the difference in social welfare.

### Bandits with Knapsacks

• Computer Science
2013 IEEE 54th Annual Symposium on Foundations of Computer Science
• 2013
This work presents two algorithms whose reward is close to the information-theoretic optimum: one is based on a novel "balanced exploration" paradigm, while the other is a primal-dual algorithm that uses multiplicative updates that is optimal up to polylogarithmic factors.

### Knapsack Based Optimal Policies for Budget-Limited Multi-Armed Bandits

• Computer Science
AAAI
• 2012
Two pulling policies are developed, namely: (i) KUBE; and (ii) fractional KUBE, and logarithmic upper bounds for the regret of both policies are proved, which are asymptotically optimal.

### ε-First Policies for Budget-Limited Multi-Armed Bandits

• Computer Science
• 2010
We introduce the budget-limited multi-armed bandit (MAB), which captures situations where a learner's actions are costly and constrained by a fixed budget that is incommensurable with the rewards…
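The ε-first idea in this citation can be sketched in a few lines: spend an ε fraction of the budget exploring arms uniformly, then commit the rest to the best empirical arm. The parameter names and the unit-cost, unit-reward setup below are simplifying assumptions for illustration:

```python
import random

def epsilon_first(click_probs, budget, cost, epsilon=0.2, seed=0):
    """Toy ε-first policy for a budget-limited bandit: uniform exploration on
    an ε fraction of the budget, then pure exploitation of the best arm."""
    rng = random.Random(seed)
    k = len(click_probs)
    pulls, wins = [0] * k, [0] * k
    spent, reward = 0.0, 0
    # exploration phase: round-robin over arms until ε·budget is spent
    arm = 0
    while spent + cost <= epsilon * budget:
        pulls[arm] += 1
        if rng.random() < click_probs[arm]:
            wins[arm] += 1
            reward += 1
        spent += cost
        arm = (arm + 1) % k
    # exploitation phase: commit to the best empirical arm for the rest
    best = max(range(k), key=lambda i: wins[i] / pulls[i] if pulls[i] else 0.0)
    while spent + cost <= budget:
        if rng.random() < click_probs[best]:
            reward += 1
        spent += cost
    return best, reward
```

The hard split between exploration and exploitation is what makes the policy easy to analyze under a budget, at the price of wasting exploration budget on clearly bad arms.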

### Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems

• Economics, Computer Science
Found. Trends Mach. Learn.
• 2012
The focus is on two extreme cases in which the analysis of regret is particularly simple and elegant: independent and identically distributed payoffs and adversarial payoffs.

### Learning on a budget: posted price mechanisms for online procurement

• Economics
EC '12
• 2012
This work presents a constant-competitive posted price mechanism when agents are identically distributed and the buyer has a symmetric submodular utility function and gives a truthful mechanism that is O(1)-competitive but uses bidding rather than posted pricing.

### Contextual Bandits with Similarity Information

This work considers similarity information in the setting of contextual bandits, a natural extension of the basic MAB problem, and presents algorithms that are based on adaptive partitions, and take advantage of "benign" payoffs and context arrivals without sacrificing the worst-case performance.

### Algorithms for Infinitely Many-Armed Bandits

• Computer Science, Mathematics
NIPS
• 2008
A stochastic assumption is made on the mean-reward of a new selected arm which characterizes its probability of being a near-optimal arm and algorithms based on upper-confidence-bounds applied to a restricted set of randomly selected arms are described and provided on the resulting expected regret.

### Dynamic Pricing with Limited Supply

• Economics
ACM Trans. Economics and Comput.
• 2015
This work presents a detail-free online posted-price mechanism whose revenue is at most $O((k \log n)^{2/3})$ less than the offline benchmark, for every distribution that is regular, and proves a matching lower bound.

### The price of truthfulness for pay-per-click auctions

• Computer Science, Economics
EC '09
• 2009
This work sharply characterizes what regret is achievable, under a truthful restriction, and shows that this truthful restriction imposes statistical limits on the achievable regret.