Corpus ID: 10960550

Dynamic Ad Allocation: Bandits with Budgets

@article{Slivkins2013DynamicAA,
  title={Dynamic Ad Allocation: Bandits with Budgets},
  author={Aleksandrs Slivkins},
  journal={ArXiv},
  year={2013},
  volume={abs/1306.0155}
}
We consider an application of multi-armed bandits to internet advertising (specifically, to dynamic ad allocation in the pay-per-click model, with uncertainty on the click probabilities). We focus on an important practical issue: advertisers are constrained in how much money they can spend on their ad campaigns. To the best of our knowledge, this issue has not been considered in prior work on bandit-based approaches to ad allocation. We define a simple, stylized model where an…
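
As a rough illustration of the setting (not the paper's algorithm), the sketch below simulates pay-per-click allocation with unknown click probabilities and per-advertiser budgets: a standard UCB1 index picks the ad to show, and an ad is retired once its remaining budget can no longer cover one click. All names and parameter values here are illustrative assumptions.

import math
import random

def run_budgeted_ucb(click_probs, payments, budgets, horizon, seed=0):
    rng = random.Random(seed)
    k = len(click_probs)
    pulls = [0] * k          # times each ad was shown
    clicks = [0] * k         # clicks observed per ad
    spend = [0.0] * k        # money spent per advertiser
    revenue = 0.0
    for t in range(1, horizon + 1):
        # Ads whose remaining budget still covers one click.
        alive = [i for i in range(k) if budgets[i] - spend[i] >= payments[i]]
        if not alive:
            break
        # Show each live ad once, then follow the UCB1 index.
        untried = [i for i in alive if pulls[i] == 0]
        if untried:
            i = untried[0]
        else:
            i = max(alive, key=lambda j: clicks[j] / pulls[j]
                    + math.sqrt(2 * math.log(t) / pulls[j]))
        pulls[i] += 1
        if rng.random() < click_probs[i]:   # a click happens
            clicks[i] += 1
            spend[i] += payments[i]         # pay-per-click charge
            revenue += payments[i]
    return revenue, spend

revenue, spend = run_budgeted_ucb(
    click_probs=[0.10, 0.06, 0.03],
    payments=[1.0, 1.5, 2.0],
    budgets=[50.0, 40.0, 30.0],
    horizon=10_000)
print(f"revenue={revenue:.1f}, spend={spend}")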

Disposable Linear Bandits for Online Recommendations

The regret for this problem is characterized by a previously unstudied function of the reward distribution among optimal arms; the upper bound relies on an optimism-based policy which, while computationally intractable, lends itself to approximation via a fast alternating heuristic initialized with a classic similarity score.

Bandits with budgets

This work derives regret bounds on the expected reward in such a bandit problem using a modification of the well-known upper confidence bound algorithm UCB1.
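
A common device in the budgeted-bandit literature (hedged here, since it may differ from this paper's exact modification of UCB1) is to build the confidence index on the reward-to-cost ratio rather than on the reward alone, being optimistic about reward and pessimistic about cost:

import math

def ratio_ucb_index(total_reward, total_cost, pulls, t, min_cost=1e-3):
    """Optimistic estimate of an arm's reward per unit of budget.

    Illustrative sketch only; an agent would play the arm maximizing
    this index while its budget lasts.
    """
    mean_reward = total_reward / pulls
    mean_cost = max(total_cost / pulls, min_cost)   # guard divide-by-zero
    bonus = math.sqrt(2 * math.log(t) / pulls)
    # Optimism in the numerator (reward up) and denominator (cost down).
    return (mean_reward + bonus) / max(mean_cost - bonus, min_cost)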

Profit maximization through budget allocation in display advertising

Online display advertising provides advertisers a unique opportunity to calculate real-time return on investment for advertising campaigns. Based on the target audiences, each advertising campaign is…

On Logarithmic Regret for Bandits with Knapsacks

A new algorithm is presented with regret of the form $O(n \log T / \Delta)$ (where $\Delta$ is the gap of rewards, similar to that in the standard MAB), which, to the authors' knowledge, is of the lowest order to date, and has the same order as the standard MAB problem when d = 1.

Contextual Blocking Bandits

A UCB-based variant of the full-information algorithm is proposed that guarantees an $\mathcal{O}(\log T)$ regret w.r.t. an $\alpha$-optimal strategy in $T$ time steps, matching the $\Omega(\log T)$ regret lower bound in this setting.

Low regret bounds for Bandits with Knapsacks

A general purpose algorithm is designed which is shown to enjoy asymptotically optimal regret bounds in several cases that encompass many practical applications including dynamic pricing with limited supply and online bidding in ad auctions.

Logarithmic regret bounds for Bandits with Knapsacks

This work designs a general-purpose algorithm with distribution-dependent regret bounds that are logarithmic in the initial endowments of resources in several important cases that cover many practical applications, including dynamic pricing with limited supply, bid optimization in online advertisement auctions, and dynamic procurement.

Hierarchical Adaptive Contextual Bandits for Resource Constraint based Recommendation

A hierarchical adaptive contextual bandit method (HATCH) is proposed to conduct the policy learning of contextual bandits with a budget constraint, and HATCH is proved to achieve a regret bound as low as $\mathcal{O}(\sqrt{T})$.

Resourceful Contextual Bandits

This work designs the first algorithm for solving contextual bandits with ancillary constraints on resources that handles constrained resources other than time, and improves over a trivial reduction to the non-contextual case.

Adversarial Bandits with Knapsacks

This work proposes a new algorithm for the stochastic version of Bandits with Knapsacks, which builds on the framework of regret minimization in repeated games and admits a substantially simpler analysis compared to prior work.

References

SHOWING 1-10 OF 33 REFERENCES

Characterizing Truthful Multi-armed Bandit Mechanisms

This work considers a multiround auction setting motivated by pay-per-click auctions for Internet advertising, and investigates how the design of multi-armed bandit algorithms is affected by the difference in social welfare.

Bandits with Knapsacks

This work presents two algorithms whose reward is close to the information-theoretic optimum: one is based on a novel "balanced exploration" paradigm, while the other is a primal-dual algorithm that uses multiplicative updates; both are optimal up to polylogarithmic factors.
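
A minimal sketch of the primal-dual idea, under my own simplifying assumptions (the paper's actual algorithm uses more careful estimates and constants): maintain a multiplicative weight per resource, pick the arm with the best optimistic reward per weighted unit of estimated consumption, and up-weight the resources the chosen arm is expected to consume.

def primal_dual_step(ucb_reward, lcb_cost, weights, eps=0.05):
    """One decision of a primal-dual BwK-style rule (illustrative).

    ucb_reward[i]  -- optimistic reward estimate for arm i
    lcb_cost[i][j] -- pessimistic estimate of arm i's use of resource j
    weights[j]     -- current multiplicative weight of resource j
    """
    def bang_per_buck(i):
        weighted_cost = sum(w * c for w, c in zip(weights, lcb_cost[i]))
        return ucb_reward[i] / max(weighted_cost, 1e-9)
    arm = max(range(len(ucb_reward)), key=bang_per_buck)
    # Multiplicatively up-weight resources the chosen arm consumes.
    new_weights = [w * (1.0 + eps) ** c
                   for w, c in zip(weights, lcb_cost[arm])]
    return arm, new_weights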

Knapsack Based Optimal Policies for Budget-Limited Multi-Armed Bandits

Two pulling policies are developed, namely (i) KUBE and (ii) fractional KUBE, and logarithmic upper bounds on the regret of both policies are proved, which are asymptotically optimal.
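
Since the fractional relaxation of an unbounded knapsack concentrates on the single best value density, the fractional-KUBE round can be read as a density index over UCB value estimates and pull costs. This is my illustrative reading, not the paper's exact policy statement.

import math

def fractional_kube_arm(mean_rewards, pulls, costs, t):
    """Pick the arm with the best optimistic reward-per-cost density.

    Assumes every arm has been pulled at least once (pulls[i] >= 1).
    """
    def density(i):
        ucb = mean_rewards[i] + math.sqrt(2 * math.log(t) / pulls[i])
        return ucb / costs[i]
    return max(range(len(costs)), key=density)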

ε-First Policies for Budget-Limited Multi-Armed Bandits

We introduce the budget-limited multi-armed bandit (MAB), which captures situations where a learner's actions are costly and constrained by a fixed budget that is incommensurable with the rewards.
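
A hedged sketch of the ε-first idea: spend an ε fraction of the budget on uniform exploration, then the remainder on exploitation. The paper's exploitation step solves a knapsack over the empirical estimates; it is simplified here to committing to the single best reward-per-cost arm.

import random

def epsilon_first(arms, costs, budget, eps=0.1):
    """arms[i]() samples a stochastic reward; costs[i] is the pull cost."""
    k = len(arms)
    rewards, pulls = [0.0] * k, [0] * k
    spent, total = 0.0, 0.0
    # Exploration phase: round-robin until the eps-fraction is spent.
    i = 0
    while spent + costs[i % k] <= eps * budget:
        j = i % k
        r = arms[j]()
        rewards[j] += r
        pulls[j] += 1
        spent += costs[j]
        total += r
        i += 1
    # Exploitation phase: empirically best reward-per-cost arm.
    best = max(range(k),
               key=lambda j: (rewards[j] / pulls[j]) / costs[j]
                             if pulls[j] else float("-inf"))
    while spent + costs[best] <= budget:
        total += arms[best]()
        spent += costs[best]
    return total

rng = random.Random(0)
total = epsilon_first(
    arms=[lambda: float(rng.random() < 0.10),
          lambda: float(rng.random() < 0.05)],
    costs=[1.0, 0.5], budget=100.0)
print(f"total reward: {total:.1f}")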

Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems

The focus is on two extreme cases in which the analysis of regret is particularly simple and elegant: independent and identically distributed payoffs and adversarial payoffs.

Learning on a budget: posted price mechanisms for online procurement

This work presents a constant-competitive posted price mechanism when agents are identically distributed and the buyer has a symmetric submodular utility function and gives a truthful mechanism that is O(1)-competitive but uses bidding rather than posted pricing.

Contextual Bandits with Similarity Information

This work considers similarity information in the setting of contextual bandits, a natural extension of the basic MAB problem, and presents algorithms that are based on adaptive partitions, and take advantage of "benign" payoffs and context arrivals without sacrificing the worst-case performance.

Algorithms for Infinitely Many-Armed Bandits

A stochastic assumption is made on the mean reward of a newly selected arm, which characterizes its probability of being a near-optimal arm; algorithms based on upper confidence bounds applied to a restricted set of randomly selected arms are described, and bounds on the resulting expected regret are provided.

Dynamic Pricing with Limited Supply

This work presents a detail-free online posted-price mechanism whose revenue is at most $O((k \log n)^{2/3})$ less than the offline benchmark, for every distribution that is regular, and proves a matching lower bound.

The price of truthfulness for pay-per-click auctions

This work sharply characterizes what regret is achievable, under a truthful restriction, and shows that this truthful restriction imposes statistical limits on the achievable regret.