Non-stationary Bandits with Knapsacks

Shang Liu, Jiashuo Jiang, and Xiaocheng Li

In this paper, we study the problem of bandits with knapsacks (BwK) in a non-stationary environment. The BwK problem generalizes the multi-armed bandit (MAB) problem to model the resource consumption associated with playing each arm. At each time step, the decision maker (player) chooses an arm to play, receives a reward, and consumes a certain amount of each of multiple resource types. The objective is to maximize the cumulative reward over a finite horizon subject to some…
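To make the interaction protocol described above concrete, here is a minimal Python sketch of one BwK episode. It assumes one common convention, namely that play stops as soon as the chosen arm's consumption would overdraw some remaining budget (that round's reward is not counted), and it lets the environment depend on the round index `t`, which is what makes it non-stationary. The names `run_bwk`, `Outcome`, and `switching_env` are hypothetical illustrations, not taken from the paper.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Outcome:
    reward: float
    costs: List[float]  # consumption of each resource type in this round


def run_bwk(
    policy: Callable[[int], int],
    environment: Callable[[int, int], Outcome],
    budgets: Sequence[float],
    horizon: int,
) -> float:
    """Play one episode and return the cumulative reward collected.

    Assumed stopping rule: the episode ends at the horizon, or as soon as the
    chosen arm's consumption would exceed some remaining budget.
    """
    remaining = list(budgets)
    total_reward = 0.0
    for t in range(horizon):
        arm = policy(t)
        outcome = environment(t, arm)  # may depend on t: non-stationary
        if any(c > r for c, r in zip(outcome.costs, remaining)):
            break  # this pull would overdraw a resource, so stop here
        remaining = [r - c for r, c in zip(remaining, outcome.costs)]
        total_reward += outcome.reward
    return total_reward


# Hypothetical two-arm environment with one resource: the rewarding arm
# switches at round 50, and arm 0 consumes twice as much as arm 1.
def switching_env(t: int, arm: int) -> Outcome:
    best = 0 if t < 50 else 1
    return Outcome(reward=1.0 if arm == best else 0.0,
                   costs=[1.0 if arm == 0 else 0.5])


print(run_bwk(lambda t: 0, switching_env, budgets=[30.0], horizon=100))  # 30.0
print(run_bwk(lambda t: 1, switching_env, budgets=[30.0], horizon=100))  # 10.0
```

With a budget of 30, always pulling arm 0 earns one unit per round until the resource runs out after 30 rounds, while always pulling arm 1 lasts 60 rounds but is only rewarding in rounds 50 to 59.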

Stochastic Multi-Armed-Bandit Problem with Non-stationary Rewards
This paper fully characterizes the (regret) complexity of this class of MAB problems by establishing a direct link between the extent of allowable reward "variation" and the minimal achievable regret, and establishes a connection between the adversarial and the stochastic MAB frameworks.
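The amount of reward "variation" mentioned above can be illustrated with a small computation. A minimal sketch, assuming the mean reward of every arm at every round is known (a learner would not know it) and measuring variation as the sum, over consecutive rounds, of the largest absolute change in any arm's mean; this follows our reading of the variation-budget idea, and the function name `total_variation` is hypothetical.

```python
from typing import Sequence


def total_variation(means: Sequence[Sequence[float]]) -> float:
    """Sum over t of max_k |means[t+1][k] - means[t][k]|.

    means[t][k] is the (assumed known) mean reward of arm k at round t.
    A stationary instance has variation 0.
    """
    return sum(
        max(abs(cur - prev) for prev, cur in zip(row, next_row))
        for row, next_row in zip(means, means[1:])
    )


drifting = [[0.5, 0.2], [0.5, 0.4], [0.1, 0.4]]
print(total_variation(drifting))  # about 0.6 (0.2 + 0.4)
```

A larger allowed variation means the environment can drift further from round to round, which is the quantity the summarized result ties to the minimal achievable regret.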
Adversarial Bandits with Knapsacks
This work proposes a new algorithm for the stochastic version of Bandits with Knapsacks, which builds on the framework of regret minimization in repeated games and admits a substantially simpler analysis compared to prior work.
Bandits with Knapsacks
This work presents two algorithms whose reward is close to the information-theoretic optimum: one is based on a novel "balanced exploration" paradigm, while the other is a primal-dual algorithm that uses multiplicative updates; both are optimal up to polylogarithmic factors.
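The primal-dual idea with multiplicative updates can be sketched as follows: each resource carries a price, the player favors the arm with the highest reward per unit of priced consumption, and the weight of each resource is raised multiplicatively by the amount consumed. This is a simplified illustration that assumes the per-arm mean rewards and consumptions are given (a learning algorithm would have to estimate them); the function names and the step size `eps` are hypothetical.

```python
from typing import List, Sequence


def prices_from(weights: Sequence[float]) -> List[float]:
    """Normalize resource weights into a price vector that sums to 1."""
    total = sum(weights)
    return [w / total for w in weights]


def choose_arm(rewards: Sequence[float],
               costs: Sequence[Sequence[float]],
               prices: Sequence[float]) -> int:
    """Pick the arm with the largest reward per unit of priced consumption."""
    def bang_per_buck(arm: int) -> float:
        spend = sum(p * c for p, c in zip(prices, costs[arm]))
        # An arm with no priced consumption is treated as free, hence preferred.
        return rewards[arm] / spend if spend > 0 else float("inf")
    return max(range(len(rewards)), key=bang_per_buck)


def update_weights(weights: Sequence[float],
                   consumed: Sequence[float],
                   eps: float) -> List[float]:
    """Multiplicative update: w_i <- w_i * (1 + eps) ** consumed_i."""
    return [w * (1.0 + eps) ** c for w, c in zip(weights, consumed)]


# Two resources, two arms; each arm uses only one resource.
rewards = [1.0, 1.0]
costs = [[1.0, 0.0], [0.0, 1.0]]
weights = update_weights([1.0, 1.0], consumed=costs[0], eps=0.5)  # arm 0 played
prices = prices_from(weights)               # [0.6, 0.4]
print(choose_arm(rewards, costs, prices))   # 1: resource 0 became pricier
```

After arm 0 is played, its resource becomes more expensive, so the otherwise identical arm 1 now offers more reward per unit of priced consumption.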
Regret Bounds for Generalized Linear Bandits under Parameter Drift
This work introduces a new algorithm that addresses central mechanisms inherited from the linear bandit setting by explicitly splitting the treatment of the learning and tracking aspects of the problem, and proves that, under a geometric assumption on the action set, this approach enjoys a regret bound.
Unifying the stochastic and the adversarial Bandits with Knapsack
This paper proposes EXP3.BwK, a novel algorithm that achieves order-optimal regret in the adversarial BwK setup and incurs an almost optimal expected regret, with an additional factor of $\log(B)$, in the stochastic BwK setup.
Learning to Optimize under Non-Stationarity
This work presents algorithms that achieve state-of-the-art dynamic regret bounds in the non-stationary linear stochastic bandit setting, and shows how the difficulty posed by non-stationarity can be overcome by a novel marriage between stochastic and adversarial bandit learning algorithms.
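One standard way to cope with drifting rewards is to forget old data. The sketch below is a sliding-window UCB for the plain K-armed case, swapped in as a simpler relative of the linear-bandit methods summarized above rather than the algorithm of that paper; the class name, the window length, and the exploration constant `c` are assumptions.

```python
import math
from collections import deque
from typing import Deque, Tuple


class SlidingWindowUCB:
    """K-armed UCB that only uses the most recent `window` observations."""

    def __init__(self, n_arms: int, window: int, c: float = 2.0) -> None:
        self.n_arms = n_arms
        self.c = c  # exploration constant (assumed hyperparameter)
        # deque(maxlen=...) silently drops the oldest (arm, reward) pair.
        self.history: Deque[Tuple[int, float]] = deque(maxlen=window)

    def select(self) -> int:
        counts = [0] * self.n_arms
        sums = [0.0] * self.n_arms
        for arm, reward in self.history:
            counts[arm] += 1
            sums[arm] += reward
        # Any arm with no observation inside the window is explored first.
        for arm in range(self.n_arms):
            if counts[arm] == 0:
                return arm
        log_n = math.log(len(self.history))  # len(history) = min(t, window)

        def ucb(arm: int) -> float:
            mean = sums[arm] / counts[arm]
            return mean + math.sqrt(self.c * log_n / counts[arm])

        return max(range(self.n_arms), key=ucb)

    def update(self, arm: int, reward: float) -> None:
        self.history.append((arm, reward))


agent = SlidingWindowUCB(n_arms=2, window=4)
agent.update(0, 1.0)
agent.update(1, 0.0)
print(agent.select())  # 0: higher windowed mean, equal bonus
for _ in range(4):
    agent.update(0, 1.0)  # arm 1's only observation slides out of the window
print(agent.select())  # 1: unseen within the window, so it is re-explored
```

The window length trades off estimation noise (short windows) against staleness after a change (long windows).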
Regret and Cumulative Constraint Violation Analysis for Online Convex Optimization with Long Term Constraints
This paper considers online convex optimization with long-term constraints, where constraints may be violated in intermediate rounds but must be satisfied in the long run, with the goal of achieving optimal regret with respect to any comparator sequence.
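A common template for long-term constraints is a projected primal-dual gradient method: the decision takes a gradient step on the loss plus a multiplier-weighted constraint, and the multiplier grows while the constraint is violated. The sketch below applies that generic template to a hypothetical one-dimensional problem (a fixed loss $(x-2)^2$ on $[0, 3]$ with the constraint $x - 1 \le 0$); it is not the specific algorithm or step-size schedule of the paper summarized above, and the function name is hypothetical.

```python
from typing import Callable, Tuple


def primal_dual(
    grad_f: Callable[[float], float],
    g: Callable[[float], float],
    grad_g: Callable[[float], float],
    lo: float,
    hi: float,
    eta: float,
    rounds: int,
) -> Tuple[float, float, float]:
    """Projected gradient descent on x, projected ascent on the multiplier lam.

    Returns the final x, the final multiplier, and the cumulative constraint
    violation sum_t max(g(x_t), 0).
    """
    x, lam, violation = lo, 0.0, 0.0
    for _ in range(rounds):
        gx = g(x)
        violation += max(gx, 0.0)
        x_next = x - eta * (grad_f(x) + lam * grad_g(x))
        lam = max(0.0, lam + eta * gx)  # multiplier stays non-negative
        x = min(hi, max(lo, x_next))    # project back onto [lo, hi]
    return x, lam, violation


x, lam, viol = primal_dual(
    grad_f=lambda x: 2.0 * (x - 2.0),  # loss (x - 2)^2 prefers x = 2
    g=lambda x: x - 1.0,               # long-term constraint x <= 1
    grad_g=lambda x: 1.0,
    lo=0.0, hi=3.0, eta=0.05, rounds=2000,
)
print(round(x, 3), round(lam, 3))  # settles at about 1.0 and 2.0
```

The iterates overshoot the constraint for a while, which is exactly the intermediate violation that long-term formulations tolerate, and then settle at the constrained optimum $x = 1$ with multiplier 2.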
The Symmetry between Arms and Knapsacks: A Primal-Dual Approach for Bandits with Knapsacks
This paper studies the bandits with knapsacks (BwK) problem and develops a primal-dual based algorithm that achieves a problem-dependent logarithmic regret bound, which, to the best of the authors' knowledge, is the first such bound for the general BwK problem.
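For reference, primal-dual BwK methods are usually built around a linear program of the following form, where $x_j$ is the fraction of rounds spent on arm $j$, $r_j$ is the mean reward of arm $j$, $c_{ij}$ is its mean consumption of resource $i$, $B_i$ is the budget of resource $i$, and $T$ is the horizon. This is the standard stationary fluid relaxation written in our own notation, an assumption rather than a copy of the paper's formulation; its primal variables are indexed by arms and its dual variables by resources.

```latex
\begin{aligned}
\max_{x \ge 0}\quad & \sum_{j=1}^{m} r_j x_j \\
\text{s.t.}\quad & \sum_{j=1}^{m} c_{ij} x_j \le \frac{B_i}{T}, \qquad i = 1, \dots, d, \\
& \sum_{j=1}^{m} x_j \le 1.
\end{aligned}
```

In the stationary setting, $T$ times the optimal value of this program is a standard upper bound on the expected total reward of any policy, which is why regret is often measured against it.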
Online Stochastic Optimization with Wasserstein Based Non-stationarity
This paper proposes a new Wasserstein-distance-based measure of the non-stationarity of the distributions across time periods and shows that this measure leads to a necessary and sufficient condition for the attainability of sublinear regret.
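The Wasserstein distance itself is easy to compute in one dimension. A minimal sketch, assuming two empirical distributions with the same number of equally weighted samples, in which case the 1-Wasserstein distance equals the average gap between the sorted samples; how the paper aggregates such distances across time periods is not reproduced here, and the function name is hypothetical.

```python
from typing import Sequence


def wasserstein1_empirical(xs: Sequence[float], ys: Sequence[float]) -> float:
    """W1 between two equal-size 1-D empirical distributions (uniform weights).

    Matching the i-th smallest sample of one to the i-th smallest of the other
    is an optimal transport plan in one dimension.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("need two non-empty samples of equal size")
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


print(wasserstein1_empirical([0.0, 1.0, 2.0], [1.0, 2.0, 3.0]))  # 1.0
```

Shifting every sample by one unit moves the distribution by a Wasserstein distance of exactly one, which is the kind of graded drift such a measure is meant to capture.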
Bandits with Knapsacks beyond the Worst Case
A general "reduction" from BwK to bandits is provided that takes advantage of some known helpful structure, and this reduction is applied to combinatorial semi-bandits, linear contextual bandits, and multinomial-logit bandits.