Combinatorial Sleeping Bandits with Fairness Constraints

@inproceedings{Li2019CombinatorialSB,
  title={Combinatorial Sleeping Bandits with Fairness Constraints},
  author={Fengjiao Li and Jia Liu and Bo Ji},
  booktitle={IEEE INFOCOM 2019 - IEEE Conference on Computer Communications},
  year={2019},
  pages={1702-1710}
}
  • Published 15 January 2019
  • Computer Science
The multi-armed bandit (MAB) model has been widely adopted for studying many practical optimization problems with unknown parameters, such as network resource allocation, ad placement, and crowdsourcing. […] By carefully integrating these two techniques, we develop a new algorithm, called Learning with Fairness Guarantee (LFG), for the CSMAB-F problem. Further, we rigorously prove that LFG is not only feasibility-optimal but also has a time-average regret upper bounded by $\frac{N}{2\eta} + O\left(\frac{\sqrt{mNT\ln T}}{T}\right)$.


Combinatorial Sleeping Bandits With Fairness Constraints
TLDR
A new algorithm, called Learning with Fairness Guarantee (LFG), is developed for the CSMAB-F problem, and it is rigorously proved that LFG is not only feasibility-optimal but also has a bounded time-average regret.
Combinatorial Multi-Armed Bandits with Concave Rewards and Fairness Constraints
TLDR
This paper adopts a new approach that combines online convex optimization with bandit methods to design selection algorithms and manages to achieve a sublinear regret bound with probability guarantees.
A Regret bound for Non-stationary Multi-Armed Bandits with Fairness Constraints
TLDR
This paper presents a new algorithm, Fair Upper Confidence Bound with Exploration (Fair-UCBe), for solving a slowly varying stochastic k-armed bandit problem; to the best of the authors' knowledge, it is the first fair algorithm with a sublinear regret bound applicable to non-stationary bandits.
Thompson Sampling for Combinatorial Semi-bandits with Sleeping Arms and Long-Term Fairness Constraints
TLDR
It is proved that TSCSF-B can satisfy the fairness constraints and that its time-averaged regret is upper bounded by $\frac{N}{2\eta} + O\left(\frac{\sqrt{mNT\ln T}}{T}\right)$, which is the first problem-independent bound for TS algorithms on combinatorial sleeping multi-armed semi-bandit problems.
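The Beta-Bernoulli Thompson Sampling step underlying such algorithms can be sketched as follows. This is a generic TS selection over the currently available (non-sleeping) arms; TSCSF-B's fairness-constraint handling is not reproduced here, and all names are illustrative.

```python
import random

def thompson_select(alpha, beta, available, m):
    # Draw one Beta(alpha_i, beta_i) posterior sample per available arm
    # and play the m arms with the largest draws.
    samples = {i: random.betavariate(alpha[i], beta[i]) for i in available}
    return sorted(samples, key=samples.get, reverse=True)[:m]

def thompson_update(alpha, beta, arm, reward):
    # Bernoulli reward: a success increments alpha, a failure increments
    # beta, keeping a closed-form posterior for each arm.
    if reward:
        alpha[arm] += 1
    else:
        beta[arm] += 1
```

Because sampling from the posterior naturally randomizes play toward plausible-but-uncertain arms, TS needs no explicit exploration bonus; sleeping arms are handled simply by restricting `available` each round.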
Achieving Fairness in the Stochastic Multi-armed Bandit Problem
TLDR
A fairness-aware regret, called r-Regret, is defined that takes the above fairness constraints into account and naturally extends the conventional notion of regret; the fairness guarantee holds uniformly over time irrespective of the choice of learning algorithm.
Combinatorial Sleeping Bandits with Fairness Constraints and Long-Term Non-Availability of Arms
TLDR
The algorithm proposed in this paper handles long-term non-availability of arms in the combinatorial sleeping bandits problem while still maintaining the regret bounds along with the queue-based fairness constraints, and a better way of estimating fairness that accounts for long-term non-availability of arms is proposed.
Exploring Best Arm with Top Reward-Cost Ratio in Stochastic Bandits
TLDR
A fundamental lower bound on the sample complexity of any algorithm under Bernoulli distributions is provided, and the sample complexities of the three proposed algorithms are shown to match the lower bound in terms of $\log \frac{1}{\delta}$.
Federated Learning with Fair Worker Selection: A Multi-Round Submodular Maximization Approach
  • Fengjiao Li, Jia Liu, Bo Ji
  • Computer Science, Business
    2021 IEEE 18th International Conference on Mobile Ad Hoc and Smart Systems (MASS)
  • 2021
TLDR
Three algorithms are proposed that satisfy the fairness requirement of worker selection in Federated Learning systems by giving higher priority to fairness; among them, FairDG ensures a stronger short-term fairness guarantee that holds in every round.
Stochastic Multi-armed Bandits with Arm-specific Fairness Guarantees
TLDR
A fairness-aware regret is defined that takes the above fairness constraints into account and extends the conventional notion of regret in a natural way; it is shown that logarithmic regret can be achieved while (almost) satisfying the fairness requirements.
Fairer LP-based Online Allocation
TLDR
A fair algorithm is proposed that uses an interior-point LP solver and dynamically detects unfair resource spending; it controls cumulative unfairness at the scale of $O(\log T)$ while keeping the regret bounded without dependence on $T$.

References

Showing 1-10 of 37 references
Combinatorial Sleeping Bandits With Fairness Constraints
TLDR
A new algorithm, called Learning with Fairness Guarantee (LFG), is developed for the CSMAB-F problem, and it is rigorously proved that LFG is not only feasibility-optimal but also has a bounded time-average regret.
Achieving Fairness in the Stochastic Multi-armed Bandit Problem
TLDR
A fairness-aware regret, called r-Regret, is defined that takes the above fairness constraints into account and naturally extends the conventional notion of regret; the fairness guarantee holds uniformly over time irrespective of the choice of learning algorithm.
Bandits with Knapsacks
TLDR
This work presents two algorithms whose reward is close to the information-theoretic optimum: one is based on a novel "balanced exploration" paradigm, while the other is a primal-dual algorithm using multiplicative updates; both are optimal up to polylogarithmic factors.
Fair Task Allocation in Crowdsourced Delivery
TLDR
This work introduces a new assignment strategy for crowdsourced delivery tasks that takes fairness towards workers into consideration, while maximizing the task allocation ratio, and presents both offline and online versions of the proposed algorithm, F-Aware.
Combinatorial Multi-Armed Bandit with General Reward Functions
TLDR
A stochastic combinatorial multi-armed bandit (CMAB) framework is studied that allows a general nonlinear reward function whose expected value may depend not only on the means of the input random variables but possibly on their entire distributions.
Combinatorial Network Optimization With Unknown Variables: Multi-Armed Bandits With Linear Rewards and Individual Observations
TLDR
New efficient policies are shown to achieve regret that grows logarithmically with time, and polynomially in the number of unknown variables, for this combinatorial multi-armed bandit problem.
Asymptotically efficient allocation rules for the multiarmed bandit problem with multiple plays-Part II: Markovian rewards
TLDR
A lower bound is provided for the regret associated with any uniformly good scheme, and a scheme that attains the lower bound for every configuration C is constructed, given explicitly in terms of the Kullback-Leibler number between pairs of distributions.
Fairness in Learning: Classic and Contextual Bandits
TLDR
A tight connection between fairness and the KWIK (Knows What It Knows) learning model is proved: a provably fair algorithm for the linear contextual bandit problem with a polynomial dependence on the dimension, and a worst-case exponential gap in regret between fair and non-fair learning algorithms.
Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems
TLDR
The focus is on two extreme cases in which the analysis of regret is particularly simple and elegant: independent and identically distributed payoffs and adversarial payoffs.
Finite-time Analysis of the Multiarmed Bandit Problem
TLDR
This work shows that the optimal logarithmic regret is also achievable uniformly over time, with simple and efficient policies, and for all reward distributions with bounded support.