• Corpus ID: 3881356

Thompson Sampling for Combinatorial Semi-Bandits

@article{Wang2018ThompsonSF,
  title={Thompson Sampling for Combinatorial Semi-Bandits},
  author={Siwei Wang and Wei Chen},
  journal={ArXiv},
  year={2018},
  volume={abs/1803.04623}
}
  • Siwei Wang, Wei Chen
  • Published 13 March 2018
  • Computer Science
  • ArXiv
We study the application of the Thompson sampling (TS) methodology to the stochastic combinatorial multi-armed bandit (CMAB) framework. We analyze the standard TS algorithm for the general CMAB, and obtain the first distribution-dependent regret bound of $O(mK_{\max}\log T / \Delta_{\min})$, where $m$ is the number of arms, $K_{\max}$ is the size of the largest super arm, $T$ is the time horizon, and $\Delta_{\min}$ is the minimum gap between the expected reward of the optimal solution and any… 
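
For orientation, the algorithm analyzed is standard combinatorial Thompson sampling: keep a Beta posterior per base arm, feed one posterior sample per arm to an offline oracle that selects a super arm, and update every arm observed under semi-bandit feedback. Below is a minimal sketch for Bernoulli base-arm outcomes; `env` and `oracle` are hypothetical placeholders, and the top-K oracle is only one illustrative choice, not the paper's oracle.

```python
import numpy as np

def cts(env, oracle, m, T, seed=0):
    """Combinatorial Thompson sampling with Beta(1, 1) priors (a sketch).

    env(super_arm) -> dict {base-arm index: 0/1 outcome} (semi-bandit feedback);
    oracle(theta)  -> super arm (iterable of base-arm indices) chosen by an
                      offline optimizer given the sampled means theta.
    """
    rng = np.random.default_rng(seed)
    a = np.ones(m)                        # posterior successes + 1
    b = np.ones(m)                        # posterior failures + 1
    for _ in range(T):
        theta = rng.beta(a, b)            # one posterior sample per base arm
        super_arm = oracle(theta)         # offline optimization on sampled means
        for i, x in env(super_arm).items():
            a[i] += x                     # update only the observed base arms
            b[i] += 1 - x

# Illustrative oracle for a linear reward: pick the K largest sampled means.
def top_k_oracle(theta, K=3):
    return np.argsort(theta)[-K:]
```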

Citations

When Combinatorial Thompson Sampling meets Approximation Regret

The first $\mathcal{O}(\log(T)/\Delta)$ approximation regret upper bound for CTS is provided; it is obtained under a specific condition on the approximation oracle that allows a reduction to the exact-oracle analysis.

Thompson Sampling for Combinatorial Multi-armed Bandit with Probabilistically Triggered Arms

This work analyzes the regret of combinatorial Thompson sampling (CTS) for the combinatorial multi-armed bandit with probabilistically triggered arms under the semi-bandit feedback setting, and compares CTS with the combinatorial upper confidence bound (CUCB) algorithm via numerical experiments on a cascading bandit problem.

Thompson Sampling for Combinatorial Semi-bandits with Sleeping Arms and Long-Term Fairness Constraints

It is proved that TSCSF-B satisfies the fairness constraints and that its time-averaged regret is upper bounded by $\frac{N}{2\eta} + O\left(\frac{\sqrt{mNT\ln T}}{T}\right)$, the first problem-independent bound for TS algorithms on combinatorial sleeping multi-armed semi-bandit problems.

Thompson Sampling for Cascading Bandits

Empirical experiments demonstrate the superiority of TS-Cascade over existing UCB-based procedures in terms of expected cumulative regret and time complexity; the work also provides the first theoretical guarantee for a Thompson sampling algorithm on any stochastic combinatorial bandit model with partial feedback.

Batch-Size Independent Regret Bounds for the Combinatorial Multi-Armed Bandit Problem

A new smoothness criterion is introduced, termed Gini-weighted smoothness, that takes into account both the nonlinearity of the reward and the concentration properties of the arms; it is shown that the linear dependence of the regret on the batch size in existing algorithms can be replaced by this smoothness parameter.

Risk-Aware Algorithms for Combinatorial Semi-Bandits

This work considers maximizing the Conditional Value-at-Risk (CVaR) of the rewards obtained from the super arms of a combinatorial bandit, and proposes new CVaR-maximizing algorithms for the two cases of Gaussian and bounded arm rewards.
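
As a reminder of the objective (not the paper's specific algorithm): the CVaR at level α of a reward distribution is the expected reward over the worst α-fraction of outcomes. A minimal empirical estimator, with illustrative names:

```python
import numpy as np

def empirical_cvar(rewards, alpha=0.05):
    """Lower-tail CVaR estimate: mean of the worst alpha-fraction of samples."""
    rewards = np.sort(np.asarray(rewards, dtype=float))
    k = max(1, int(np.ceil(alpha * rewards.size)))
    return rewards[:k].mean()

# e.g. empirical_cvar(np.random.default_rng(0).normal(1.0, 0.5, 10_000), 0.05)
```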

Lenient Regret for Multi-Armed Bandits

A new, more lenient regret criterion is suggested that ignores suboptimality gaps smaller than some ε; a variant of the Thompson Sampling algorithm, called ε-TS, is presented and proved asymptotically optimal in terms of the lenient regret.
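
Read literally, "ignores suboptimality gaps smaller than some ε" suggests a hinge-style criterion such as the following (one natural formalization; the paper's exact gap function may differ):

$$\mathrm{Reg}_\varepsilon(T) \;=\; \mathbb{E}\left[\sum_{t=1}^{T} \Delta_{a_t}\,\mathbf{1}\{\Delta_{a_t} > \varepsilon\}\right],$$

where $a_t$ is the arm played at round $t$ and $\Delta_{a_t}$ its suboptimality gap.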

Sleeping Combinatorial Bandits

It is proved, under mild smoothness conditions, that the CS-UCB algorithm achieves an $O(\log T)$ instance-dependent regret guarantee, and that when the range of the rewards is bounded, CS-UCB achieves an $O(\sqrt{T}\log T)$ regret guarantee in a general setting.

Thompson Sampling for Combinatorial Network Optimization in Unknown Environments

This paper considers a very general learning framework, the combinatorial multi-armed bandit with probabilistically triggered arms, together with a powerful Bayesian algorithm, Combinatorial Thompson Sampling (CTS), and bounds its Bayesian regret.

Statistically Efficient, Polynomial-Time Algorithms for Combinatorial Semi-Bandits

AESCB is implementable in polynomial time $O(\delta_T^{-1}\,\mathrm{poly}(d))$ by repeatedly maximizing a linear function over $X$ subject to a linear budget constraint, and it is shown how to solve these maximization problems efficiently.
...

References

Showing 1-10 of 38 references

Thompson Sampling for Combinatorial Semi-bandits with Sleeping Arms and Long-Term Fairness Constraints

It is proved that TSCSF-B satisfies the fairness constraints and that its time-averaged regret is upper bounded by $\frac{N}{2\eta} + O\left(\frac{\sqrt{mNT\ln T}}{T}\right)$, the first problem-independent bound for TS algorithms on combinatorial sleeping multi-armed semi-bandit problems.

Combinatorial Multi-Armed Bandit with General Reward Functions

A variant of the stochastic combinatorial multi-armed bandit (CMAB) framework is studied that allows a general nonlinear reward function, whose expected value may not depend only on the means of the input random variables but possibly on the entire distributions of these variables.

Optimal Regret Analysis of Thompson Sampling in Stochastic Multi-armed Bandit Problem with Multiple Plays

It is proved that MP-TS for binary rewards achieves a regret upper bound matching the lower bound of Anantharam et al. (1987), making it the first computationally efficient algorithm with optimal regret.
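
The selection rule behind MP-TS is short enough to sketch: draw one Beta posterior sample per arm and play the L arms with the largest samples. A minimal version for Bernoulli rewards follows; `env` is a hypothetical placeholder.

```python
import numpy as np

def mp_ts(env, m, L, T, seed=0):
    """Thompson sampling with multiple plays (a sketch): each round, play
    the L arms whose Beta posterior samples are largest (Bernoulli rewards)."""
    rng = np.random.default_rng(seed)
    a, b = np.ones(m), np.ones(m)
    for _ in range(T):
        theta = rng.beta(a, b)            # one posterior sample per arm
        for i in np.argsort(theta)[-L:]:  # top-L sampled means
            x = env(i)                    # observed 0/1 reward of arm i
            a[i] += x
            b[i] += 1 - x
```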

Combinatorial multi-armed bandit: general framework, results and applications

The regret analysis is tight in that it matches the bound for the classical MAB problem up to a constant factor, and it significantly improves the regret bound of a recent paper on combinatorial bandits with linear rewards.

Analysis of Thompson Sampling for the Multi-armed Bandit Problem

For the first time, it is shown that the Thompson Sampling algorithm achieves logarithmic expected regret for the stochastic multi-armed bandit problem.

The non-stochastic multi-armed bandit problem

A solution to the bandit problem in which an adversary, rather than a well-behaved stochastic process, has complete control over the payoffs is given.

Combinatorial Multi-Armed Bandit and Its Extension to Probabilistically Triggered Arms

The regret analysis is tight in that it matches the bound of the UCB1 algorithm (up to a constant factor) for the classical MAB problem, and it significantly improves the regret bound of an earlier paper on combinatorial bandits with linear rewards.

Improving Regret Bounds for Combinatorial Semi-Bandits with Probabilistically Triggered Arms and Its Applications

This work provides lower bound results showing that the factor $1/p^*$ is unavoidable for general CMAB-T problems, suggesting that the TPM condition is crucial in removing this factor.

Finite-time Analysis of the Multiarmed Bandit Problem

This work shows that the optimal logarithmic regret is also achievable uniformly over time, with simple and efficient policies, and for all reward distributions with bounded support.
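
Among the "simple and efficient policies" of this reference is UCB1, whose index is easy to state; a minimal sketch for rewards in [0, 1] follows, with `env` as a hypothetical placeholder.

```python
import numpy as np

def ucb1(env, m, T):
    """UCB1 (a sketch): after pulling each arm once, play the arm maximizing
    empirical mean + sqrt(2 ln t / pulls); rewards assumed in [0, 1]."""
    sums = np.zeros(m)
    pulls = np.zeros(m)
    for i in range(m):                    # initialization: one pull per arm
        sums[i] = env(i)
        pulls[i] = 1
    for t in range(m + 1, T + 1):
        index = sums / pulls + np.sqrt(2.0 * np.log(t) / pulls)
        i = int(np.argmax(index))
        sums[i] += env(i)
        pulls[i] += 1
```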

Efficient Learning in Large-Scale Combinatorial Semi-Bandits

This paper considers efficient learning in large-scale combinatorial semi-bandits with linear generalization, and proposes two learning algorithms called Combinatorial Linear Thompson Sampling (CombLinTS) and CombLinUCB, which are computationally efficient and provably statistically efficient under reasonable assumptions.