Thompson Sampling for Combinatorial Semi-Bandits
@article{Wang2018ThompsonSF, title={Thompson Sampling for Combinatorial Semi-Bandits}, author={Siwei Wang and Wei Chen}, journal={ArXiv}, year={2018}, volume={abs/1803.04623} }
We study the application of the Thompson sampling (TS) methodology to the stochastic combinatorial multi-armed bandit (CMAB) framework. We analyze the standard TS algorithm for the general CMAB, and obtain the first distribution-dependent regret bound of $O(mK_{\max}\log T / \Delta_{\min})$, where $m$ is the number of arms, $K_{\max}$ is the size of the largest super arm, $T$ is the time horizon, and $\Delta_{\min}$ is the minimum gap between the expected reward of the optimal solution and any non-optimal solution.
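The CTS algorithm analyzed in the paper keeps an independent Beta posterior on each base arm's mean, draws one sample per arm each round, and hands the sampled mean vector to an offline optimization oracle. Below is a minimal Python sketch under the assumption of Bernoulli base-arm rewards and an exact oracle; the names `oracle` and `pull` are illustrative placeholders, not from the paper.

```python
import numpy as np

def cts(m, oracle, pull, T, seed=None):
    """Combinatorial Thompson sampling with semi-bandit feedback (sketch).

    m      -- number of base arms
    oracle -- maps a mean-vector estimate to a super arm (iterable of arm indices)
    pull   -- plays a super arm and returns {arm index: Bernoulli reward}
    T      -- time horizon
    """
    rng = np.random.default_rng(seed)
    alpha = np.ones(m)  # Beta(1, 1) prior on each base arm's mean
    beta = np.ones(m)
    for _ in range(T):
        theta = rng.beta(alpha, beta)   # one posterior sample per base arm
        super_arm = oracle(theta)       # offline optimizer on sampled means
        feedback = pull(super_arm)      # semi-bandit: observe every played arm
        for i, r in feedback.items():   # conjugate Beta-Bernoulli update
            alpha[i] += r
            beta[i] += 1 - r
```

For a linear reward under a size-$K$ cardinality constraint, the exact oracle is simply a top-$K$ selection, e.g. `oracle = lambda theta: np.argsort(theta)[-K:]`.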
87 Citations
When Combinatorial Thompson Sampling meets Approximation Regret
- Computer Science, ArXiv
- 2023
The first $\mathcal{O}(\log(T)/\Delta)$ approximation regret upper bound for CTS is provided, obtained under a specific condition on the approximation oracle, allowing a reduction to the exact oracle analysis.
Thompson Sampling for Combinatorial Multi-armed Bandit with Probabilistically Triggered Arms
- Computer Science, AISTATS
- 2019
This work analyzes the regret of combinatorial Thompson sampling (CTS) for the combinatorial multi-armed bandit with probabilistically triggered arms under the semi-bandit feedback setting, and compares CTS with the combinatorial upper confidence bound (CUCB) algorithm via numerical experiments on a cascading bandit problem.
Thompson Sampling for Combinatorial Semi-bandits with Sleeping Arms and Long-Term Fairness Constraints
- Computer Science, ArXiv
- 2020
It is proved TSCSF-B can satisfy the fairness constraints, and the time-averaged regret is upper bounded by $\frac{N}{2\eta} + O\left(\frac{\sqrt{mNT\ln T}}{T}\right)$, which is the first problem-independent bound of TS algorithms for combinatorial sleeping multi-armed semi-bandit problems.
Thompson Sampling for Cascading Bandits
- Computer Science, ArXiv
- 2018
Empirical experiments demonstrate the superiority of TS-Cascade over existing UCB-based procedures in terms of expected cumulative regret and time complexity; the work also gives the first theoretical guarantee for a Thompson sampling algorithm on a stochastic combinatorial bandit problem with partial feedback.
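For context, in the standard cascade model that TS-Cascade operates in, the user scans the recommended list in order and clicks the first attractive item; items before the click are observed as unattractive, and items after it are unobserved. A small simulation sketch of that feedback (variable names are illustrative):

```python
import numpy as np

def cascade_feedback(ranked_items, attract_probs, rng):
    """Simulate one round of cascade click feedback.

    Returns the click position, or None if no item is clicked.
    Items before the click are observed as unattractive (reward 0);
    the clicked item is observed as attractive (reward 1);
    items after the click yield no observation.
    """
    for pos, item in enumerate(ranked_items):
        if rng.random() < attract_probs[item]:
            return pos
    return None
```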
Batch-Size Independent Regret Bounds for the Combinatorial Multi-Armed Bandit Problem
- Computer Science, COLT
- 2019
A new smoothness criterion is introduced, termed Gini-weighted smoothness, that takes into account both the nonlinearity of the reward and the concentration properties of the arms; it is shown that the linear dependence of the regret on the batch size in existing algorithms can be replaced by this smoothness parameter.
Risk-Aware Algorithms for Combinatorial Semi-Bandits
- Computer Science, ArXiv
- 2021
This work considers maximizing the Conditional Value-at-Risk (CVaR) of the rewards obtained from the super arms of a combinatorial bandit, for both Gaussian and bounded arm rewards, and proposes new algorithms that maximize the CVaR.
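For reference, the CVaR of a reward distribution at level $\alpha$ is the expected reward conditional on landing in the worst $\alpha$-fraction of outcomes. A standard empirical estimator (not code from the paper) looks like this:

```python
import numpy as np

def empirical_cvar(rewards, alpha=0.05):
    """Average of the worst alpha-fraction of reward samples.

    For rewards (higher is better), the worst outcomes are the smallest,
    so CVaR_alpha averages the samples at or below the alpha-quantile.
    """
    x = np.sort(np.asarray(rewards, dtype=float))
    k = max(1, int(np.ceil(alpha * len(x))))
    return x[:k].mean()
```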
Lenient Regret for Multi-Armed Bandits
- Computer Science, AAAI
- 2021
A new, more lenient regret criterion is suggested that ignores suboptimality gaps smaller than some $\epsilon$; a variant of the Thompson Sampling algorithm, called $\epsilon$-TS, is presented and proved asymptotically optimal in terms of the lenient regret.
Sleeping Combinatorial Bandits
- Computer Science, ArXiv
- 2021
It is proved, under mild smoothness conditions, that the CS-UCB algorithm achieves an $O(\log T)$ instance-dependent regret guarantee, and that when the range of the rewards is bounded, the regret guarantee of the CS-UCB algorithm is $O(\sqrt{T}\log T)$ in a general setting.
Thompson Sampling for Combinatorial Network Optimization in Unknown Environments
- Computer Science, IEEE/ACM Transactions on Networking
- 2020
This paper considers a general learning framework, combinatorial multi-armed bandit with probabilistically triggered arms, together with a Bayesian algorithm called Combinatorial Thompson Sampling (CTS), and establishes a bound on the Bayesian regret of CTS.
Statistically Efficient, Polynomial-Time Algorithms for Combinatorial Semi-Bandits
- Computer Science, Mathematics, Proc. ACM Meas. Anal. Comput. Syst.
- 2021
AESCB is implementable in polynomial time $O(\delta_T^{-1}\,\mathrm{poly}(d))$ by repeatedly maximizing a linear function over $X$ subject to a linear budget constraint, and it is shown how to solve these maximization problems efficiently.
References
Showing 1-10 of 38 references.
Thompson Sampling for Combinatorial Semi-bandits with Sleeping Arms and Long-Term Fairness Constraints
- Computer Science, ArXiv
- 2020
It is proved TSCSF-B can satisfy the fairness constraints, and the time-averaged regret is upper bounded by $\frac{N}{2\eta} + O\left(\frac{\sqrt{mNT\ln T}}{T}\right)$, which is the first problem-independent bound of TS algorithms for combinatorial sleeping multi-armed semi-bandit problems.
Combinatorial Multi-Armed Bandit with General Reward Functions
- Computer Science, Mathematics, NIPS
- 2016
A stochastic combinatorial multi-armed bandit (CMAB) framework is studied that allows a general nonlinear reward function, whose expected value may not depend only on the means of the input random variables but possibly on the entire distributions of these variables.
Optimal Regret Analysis of Thompson Sampling in Stochastic Multi-armed Bandit Problem with Multiple Plays
- Computer Science, ICML
- 2015
It is proved that MP-TS for binary rewards has an optimal regret upper bound matching the lower bound of Anantharam et al. (1987), making it the first computationally efficient algorithm with optimal regret.
Combinatorial multi-armed bandit: general framework, results and applications
- Computer Science, ICML
- 2013
The regret analysis is tight in that it matches the bound for the classical MAB problem up to a constant factor, and it significantly improves the regret bound in a recent paper on combinatorial bandits with linear rewards.
Analysis of Thompson Sampling for the Multi-armed Bandit Problem
- Computer Science, COLT
- 2012
For the first time, it is shown that Thompson Sampling algorithm achieves logarithmic expected regret for the stochastic multi-armed bandit problem.
The non-stochastic multi-armed bandit problem
- Computer Science, Economics
- 2001
A solution to the bandit problem in which an adversary, rather than a well-behaved stochastic process, has complete control over the payoffs is given.
Combinatorial Multi-Armed Bandit and Its Extension to Probabilistically Triggered Arms
- Computer Science, J. Mach. Learn. Res.
- 2016
The regret analysis is tight in that it matches the bound of the UCB1 algorithm (up to a constant factor) for the classical MAB problem, and it significantly improves the regret bound in an earlier paper on combinatorial bandits with linear rewards.
Improving Regret Bounds for Combinatorial Semi-Bandits with Probabilistically Triggered Arms and Its Applications
- Computer Science, NIPS
- 2017
This work provides lower bound results showing that the factor $1/p^*$ is unavoidable for general CMAB-T problems, suggesting that the TPM condition is crucial in removing this factor.
Finite-time Analysis of the Multiarmed Bandit Problem
- Computer Science, Machine Learning
- 2004
This work shows that the optimal logarithmic regret is also achievable uniformly over time, with simple and efficient policies, and for all reward distributions with bounded support.
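The policy this reference analyzes, UCB1, plays the arm maximizing the empirical mean plus a $\sqrt{2\ln t / n_i}$ confidence width, which yields the logarithmic regret mentioned above. A minimal sketch, assuming rewards in $[0, 1]$ and an environment function `pull` (an illustrative name):

```python
import numpy as np

def ucb1(pull, m, T):
    """UCB1: play the arm maximizing empirical mean + sqrt(2 ln t / n)."""
    n = np.zeros(m)        # number of pulls per arm
    s = np.zeros(m)        # sum of observed rewards per arm
    for i in range(m):     # initialization: pull each arm once
        s[i] += pull(i)
        n[i] = 1
    for t in range(m + 1, T + 1):
        index = s / n + np.sqrt(2.0 * np.log(t) / n)
        i = int(np.argmax(index))
        s[i] += pull(i)
        n[i] += 1
```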
Efficient Learning in Large-Scale Combinatorial Semi-Bandits
- Computer Science, ICML
- 2015
This paper considers efficient learning in large-scale combinatorial semi-bandits with linear generalization, and proposes two learning algorithms called Combinatorial Linear Thompson Sampling (CombLinTS) and CombLinUCB, which are computationally efficient and provably statistically efficient under reasonable assumptions.
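The linear-generalization idea behind CombLinTS can be sketched as Thompson sampling on a shared $d$-dimensional model: maintain a ridge-regression posterior over the parameter, sample from it, score every base arm by its feature dot product, and call the oracle. A rough sketch under Gaussian assumptions, not the paper's exact algorithm (`B` should be initialized to $\lambda I$ and `f` to zeros; all names are illustrative):

```python
import numpy as np

def comblin_ts_round(X, B, f, oracle, pull, v=1.0, rng=None):
    """One round of Thompson sampling with linear generalization (sketch).

    X      -- (m, d) matrix of base-arm features
    B, f   -- running d x d precision matrix and d-dim response vector
    oracle -- maps per-arm scores to a super arm
    pull   -- plays a super arm and returns {arm index: reward}
    """
    rng = rng or np.random.default_rng()
    mu = np.linalg.solve(B, f)                    # ridge-regression mean
    theta = rng.multivariate_normal(mu, v ** 2 * np.linalg.inv(B))
    super_arm = oracle(X @ theta)                 # score arms by sampled model
    for i, r in pull(super_arm).items():          # rank-one posterior updates
        B += np.outer(X[i], X[i])
        f += r * X[i]
    return B, f
```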