# Combinatorial Sleeping Bandits with Fairness Constraints

@article{Li2019CombinatorialSB, title={Combinatorial Sleeping Bandits with Fairness Constraints}, author={Fengjiao Li and Jia Liu and Bo Ji}, journal={IEEE INFOCOM 2019 - IEEE Conference on Computer Communications}, year={2019}, pages={1702-1710} }

The multi-armed bandit (MAB) model has been widely adopted for studying many practical optimization problems (network resource allocation, ad placement, crowdsourcing, etc.) with unknown parameters. [...] By carefully integrating these two techniques, we develop a new algorithm, called Learning with Fairness Guarantee (LFG), for the CSMAB-F problem. Further, we rigorously prove that not only is LFG feasibility-optimal, but it also has a time-average regret upper bounded by $\displaystyle \frac {N}{2…

## 65 Citations

Combinatorial Sleeping Bandits With Fairness Constraints

- Computer Science · IEEE Transactions on Network Science and Engineering
- 2020

A new algorithm, called Learning with Fairness Guarantee (LFG), is developed for the CSMAB-F problem, and it is rigorously proved that LFG is not only feasibility-optimal but also has a bounded time-average regret.

Combinatorial Multi-Armed Bandits with Concave Rewards and Fairness Constraints

- Computer Science · IJCAI
- 2020

This paper adopts a new approach that combines online convex optimization with bandit methods to design selection algorithms and manages to achieve a sublinear regret bound with probability guarantees.

A Regret bound for Non-stationary Multi-Armed Bandits with Fairness Constraints

- Computer Science · ArXiv
- 2020

This paper presents a new algorithm, Fair Upper Confidence Bound with Exploration (Fair-UCBe), for solving a slowly varying stochastic k-armed bandit problem; to the best of the authors' knowledge, it is the first fair algorithm with a sublinear regret bound that applies to non-stationary bandits.

Thompson Sampling for Combinatorial Semi-bandits with Sleeping Arms and Long-Term Fairness Constraints

- Computer Science · ArXiv
- 2020

It is proved that TSCSF-B can satisfy the fairness constraints and that its time-averaged regret is upper bounded by $\frac{N}{2\eta} + O\left(\frac{\sqrt{mNT\ln T}}{T}\right)$; this is the first problem-independent bound for TS algorithms on combinatorial sleeping multi-armed semi-bandit problems.
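To make the shape of this bound concrete, the sketch below simply evaluates it for a few horizons. It is a minimal illustration under stated assumptions: the constant hidden by the $O(\cdot)$ is replaced by a hypothetical `C = 1.0`, and we read $N$ as the number of arms, $m$ as the number of arms played per round, and $\eta$ as a tunable trade-off parameter (our reading; these symbols are not defined on this page). The function name `tscsf_b_regret_bound` and the problem sizes are hypothetical.

```python
import math


def tscsf_b_regret_bound(N: int, m: int, eta: float, T: int, C: float = 1.0) -> float:
    """Evaluate N/(2*eta) + C*sqrt(m*N*T*ln T)/T.

    C stands in for the constant hidden by the O(.) notation; C = 1.0 is an
    arbitrary illustrative choice, not a value taken from the paper.
    """
    if T < 2:
        raise ValueError("T must be at least 2 so that ln T > 0")
    fairness_term = N / (2 * eta)  # constant: does not vanish as T grows
    learning_term = C * math.sqrt(m * N * T * math.log(T)) / T  # shrinks like sqrt(ln T / T)
    return fairness_term + learning_term


if __name__ == "__main__":
    N, m, eta = 10, 3, 100.0  # hypothetical problem sizes
    for T in (10**3, 10**5, 10**7):
        print(T, round(tscsf_b_regret_bound(N, m, eta, T), 4))
```

The second term vanishes as $T$ grows, so the time-averaged bound approaches the constant $N/(2\eta)$, which itself shrinks as $\eta$ is increased.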

Achieving Fairness in the Stochastic Multi-armed Bandit Problem

- Computer Science · AAAI
- 2020

A fairness-aware regret, called r-Regret, is defined that takes the paper's fairness constraints into account and naturally extends the conventional notion of regret; the fairness guarantee holds uniformly over time irrespective of the choice of the learning algorithm.

Combinatorial Sleeping Bandits with Fairness Constraints and Long-Term Non-Availability of Arms

- Computer Science · 2020 4th International Conference on Electronics, Communication and Aerospace Technology (ICECA)
- 2020

The proposed algorithm handles long-term non-availability of arms in the combinatorial sleeping bandit problem while still maintaining the regret bounds and the queue fairness constraints; the paper also proposes a better way of estimating fairness that accounts for the long-term non-availability of arms.
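Queue fairness constraints of this kind are commonly tracked with virtual queues. A minimal sketch, assuming a standard formulation that is not spelled out on this page: each arm $i$ has a required long-run selection rate $r_i$, and its virtual queue evolves as $Q_i(t+1) = \max\{Q_i(t) + r_i - x_i(t), 0\}$, where $x_i(t) = 1$ if arm $i$ is selected in round $t$ and $0$ otherwise. The function name `update_virtual_queues`, the arm labels, and the rates are hypothetical.

```python
from typing import Dict, Iterable


def update_virtual_queues(queues: Dict[str, float],
                          min_rates: Dict[str, float],
                          selected: Iterable[str]) -> Dict[str, float]:
    """Apply Q_i <- max(Q_i + r_i - x_i, 0) to every arm i for one round.

    Assumed formulation: r_i is arm i's required selection rate and
    x_i is 1 if arm i was selected this round, else 0.
    """
    chosen = set(selected)
    return {
        arm: max(q + min_rates[arm] - (1.0 if arm in chosen else 0.0), 0.0)
        for arm, q in queues.items()
    }


if __name__ == "__main__":
    rates = {"a": 0.5, "b": 0.3, "c": 0.2}  # hypothetical minimum selection rates
    queues = {arm: 0.0 for arm in rates}
    # Hypothetical unfair policy: always picks "a", never serves "b" or "c".
    for _ in range(100):
        queues = update_virtual_queues(queues, rates, ["a"])
    print(queues)  # "a" stays at 0; "b" and "c" grow by r_i every round
```

A queue that keeps growing signals that an arm is being served below its required rate, and queue-aware schedulers can respond by giving more weight to arms with large queues. How a particular algorithm combines queue lengths with reward estimates is not stated on this page, so the sketch stops at the queue update.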

Exploring Best Arm with Top Reward-Cost Ratio in Stochastic Bandits

- Computer Science · IEEE INFOCOM 2020 - IEEE Conference on Computer Communications
- 2020

A fundamental lower bound on the sample complexity of any algorithm under Bernoulli distributions is provided, and the sample complexities of the three proposed algorithms are shown to match this lower bound in their dependence on $\log \frac{1}{\delta}$.

Federated Learning with Fair Worker Selection: A Multi-Round Submodular Maximization Approach

- Computer Science, Business · 2021 IEEE 18th International Conference on Mobile Ad Hoc and Smart Systems (MASS)
- 2021

Three algorithms are proposed that satisfy the fairness requirement for worker selection in Federated Learning systems by giving higher priority to fairness; among them, FairDG ensures a stronger short-term fairness guarantee that holds in every round.

Stochastic Multi-armed Bandits with Arm-specific Fairness Guarantees

- Computer Science · ArXiv
- 2019

A fairness-aware regret is defined that takes the paper's fairness constraints into account and extends the conventional notion of regret in a natural way, and it is shown that logarithmic regret can be achieved while (almost) satisfying the fairness requirements.

Fairer LP-based Online Allocation

- Computer Science · ArXiv
- 2021

A fair algorithm is proposed that uses an interior-point LP solver and dynamically detects unfair resource spending; it controls cumulative unfairness on the order of $O(\log T)$ while keeping the regret bounded independently of $T$.

## References

Showing 10 of 37 references.

Combinatorial Sleeping Bandits With Fairness Constraints

- Computer Science · IEEE Transactions on Network Science and Engineering
- 2020

A new algorithm, called Learning with Fairness Guarantee (LFG), is developed for the CSMAB-F problem, and it is rigorously proved that LFG is not only feasibility-optimal but also has a bounded time-average regret.

Achieving Fairness in the Stochastic Multi-armed Bandit Problem

- Computer Science · AAAI
- 2020

A fairness-aware regret, called r-Regret, is defined that takes the paper's fairness constraints into account and naturally extends the conventional notion of regret; the fairness guarantee holds uniformly over time irrespective of the choice of the learning algorithm.

Bandits with Knapsacks

- Computer Science · 2013 IEEE 54th Annual Symposium on Foundations of Computer Science
- 2013

This work presents two algorithms whose reward is close to the information-theoretic optimum: one is based on a novel "balanced exploration" paradigm, while the other is a primal-dual algorithm that uses multiplicative updates; the regret of both is shown to be optimal up to polylogarithmic factors.

Fair Task Allocation in Crowdsourced Delivery

- Computer Science · IEEE Transactions on Services Computing
- 2021

This work introduces a new assignment strategy for crowdsourced delivery tasks that takes fairness towards workers into consideration, while maximizing the task allocation ratio, and presents both offline and online versions of the proposed algorithm, F-Aware.

Combinatorial Multi-Armed Bandit with General Reward Functions

- Computer Science, Mathematics · NIPS
- 2016

The stochastic combinatorial multi-armed bandit (CMAB) framework is studied in a setting that allows a general nonlinear reward function, whose expected value may depend not only on the means of the input random variables but possibly on their entire distributions.

Combinatorial Network Optimization With Unknown Variables: Multi-Armed Bandits With Linear Rewards and Individual Observations

- Computer Science, Mathematics · IEEE/ACM Transactions on Networking
- 2012

New efficient policies are shown to achieve regret that grows logarithmically with time, and polynomially in the number of unknown variables, for this combinatorial multi-armed bandit problem.

Asymptotically efficient allocation rules for the multiarmed bandit problem with multiple plays-Part II: Markovian rewards

- Mathematics, Computer Science
- 1987

A lower bound is provided for the regret of any uniformly good scheme, and a scheme that attains this lower bound for every configuration $C$ is constructed; the bound is given explicitly in terms of the Kullback-Leibler number between pairs of distributions.
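The Kullback-Leibler number is what makes such lower bounds explicit: the closer two reward distributions are, the smaller their KL number and the more samples a learner needs to tell them apart. As a minimal sketch, the snippet below computes it for the simplest case of i.i.d. Bernoulli rewards, $\mathrm{KL}(p, q) = p \ln\frac{p}{q} + (1-p)\ln\frac{1-p}{1-q}$; this is not the Markovian setting of Part II, and the function name `bernoulli_kl` is hypothetical.

```python
import math


def bernoulli_kl(p: float, q: float) -> float:
    """KL(Bernoulli(p) || Bernoulli(q)) in nats, with the convention 0 * ln 0 = 0."""
    if not (0.0 <= p <= 1.0 and 0.0 < q < 1.0):
        raise ValueError("need 0 <= p <= 1 and 0 < q < 1")
    total = 0.0
    if p > 0.0:
        total += p * math.log(p / q)
    if p < 1.0:
        total += (1.0 - p) * math.log((1.0 - p) / (1.0 - q))
    return total


if __name__ == "__main__":
    print(bernoulli_kl(0.5, 0.5))  # identical distributions
    print(bernoulli_kl(0.5, 0.6), bernoulli_kl(0.6, 0.5))  # KL is not symmetric
    print(bernoulli_kl(0.2, 0.8))  # well-separated arms give a large KL number
```

In the classic single-play i.i.d. setting (Lai and Robbins), a uniformly good policy must sample each suboptimal arm on the order of $\ln T / \mathrm{KL}(\mu_i, \mu^*)$ times, which is why nearly indistinguishable arms are expensive.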

Fairness in Learning: Classic and Contextual Bandits

- Computer Science · NIPS
- 2016

A tight connection between fairness and the KWIK (Knows What It Knows) learning model is proved, yielding a provably fair algorithm for the linear contextual bandit problem with polynomial dependence on the dimension, as well as a worst-case exponential gap in regret between fair and non-fair learning algorithms.

Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems

- Economics, Computer Science · Found. Trends Mach. Learn.
- 2012

The focus is on two extreme cases in which the analysis of regret is particularly simple and elegant: independent and identically distributed payoffs and adversarial payoffs.

Finite-time Analysis of the Multiarmed Bandit Problem

- Computer Science · Machine Learning
- 2004

This work shows that the optimal logarithmic regret is also achievable uniformly over time, with simple and efficient policies, and for all reward distributions with bounded support.
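The best-known of these simple policies is UCB1, which plays each arm once and then always picks the arm maximizing its empirical mean plus $\sqrt{2 \ln n / n_j}$, where $n$ is the total number of plays so far and $n_j$ is the number of plays of arm $j$. The sketch below is a minimal simulation, assuming Bernoulli rewards (bounded in $[0, 1]$) drawn with a fixed seed; the function name `ucb1`, the arm means, and the horizon are hypothetical.

```python
import math
import random
from typing import List, Sequence


def ucb1(means: Sequence[float], horizon: int, seed: int = 0) -> List[int]:
    """Run UCB1 on simulated Bernoulli arms; return the pull count of each arm.

    `means` are hypothetical success probabilities used only by the simulator;
    the learner sees nothing but the sampled 0/1 rewards.
    """
    rng = random.Random(seed)
    k = len(means)
    if horizon < k:
        raise ValueError("horizon must allow each arm to be pulled once")
    counts = [0] * k
    sums = [0.0] * k

    def pull(arm: int) -> None:
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward

    # Initialization: play every arm once so that each index is defined.
    for arm in range(k):
        pull(arm)

    # n is the number of plays made so far (k after initialization).
    for n in range(k, horizon):
        best = max(
            range(k),
            key=lambda j: sums[j] / counts[j] + math.sqrt(2.0 * math.log(n) / counts[j]),
        )
        pull(best)
    return counts


if __name__ == "__main__":
    print(ucb1([0.2, 0.5, 0.8], horizon=5000, seed=1))
```

The exploration bonus shrinks as an arm is played more, so suboptimal arms are revisited only about logarithmically often, which is the behavior the finite-time analysis makes precise.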