Corpus ID: 211258581

My Fair Bandit: Distributed Learning of Max-Min Fairness with Multi-player Bandits

@article{Bistritz2020MyFB,
  title={My Fair Bandit: Distributed Learning of Max-Min Fairness with Multi-player Bandits},
  author={Ilai Bistritz and Tavor Z. Baharav and Amir Leshem and Nicholas Bambos},
  journal={ArXiv},
  year={2020},
  volume={abs/2002.09808}
}
Consider N cooperative but non-communicating players, where each plays one of M arms for T turns. Players have different utilities for each arm, representable as an N×M matrix. These utilities are unknown to the players. In each turn, players select an arm and receive a noisy observation of their utility for it. However, if any other player selected the same arm that turn, all colliding players receive zero utility due to the conflict. No other communication or coordination between…
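To make the collision model above concrete, here is a minimal simulation sketch. The utility matrix, noise scale, and the uniform-random arm choices are illustrative assumptions only; the uniform policy is a placeholder, not the max-min fair learning algorithm proposed in the paper.

import numpy as np

rng = np.random.default_rng(0)

N, M, T = 3, 5, 1000                      # players, arms, turns (illustrative sizes)
U = rng.uniform(0.1, 1.0, size=(N, M))    # unknown N×M expected-utility matrix (assumed)
sigma = 0.1                               # observation-noise scale (assumed)

total_reward = np.zeros(N)
for t in range(T):
    # Placeholder policy: each player picks an arm uniformly at random.
    # A distributed max-min fair learning algorithm would replace this choice.
    choices = rng.integers(0, M, size=N)
    for i in range(N):
        arm = choices[i]
        if np.sum(choices == arm) > 1:
            # Collision: every player that chose this arm receives zero utility.
            reward = 0.0
        else:
            # Noisy observation of the player's own utility for the chosen arm.
            reward = U[i, arm] + sigma * rng.standard_normal()
        total_reward[i] += reward

print("per-player average reward:", total_reward / T)
print("max-min value of the averages:", (total_reward / T).min())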

Citations

Heterogeneous Multi-player Multi-armed Bandits: Closing the Gap and Generalization
TLDR
BEACON bridges the algorithm design and regret analysis of combinatorial MAB (CMAB) and MP-MAB, two largely disjoint areas in MAB, and the results suggest that this previously ignored connection is worth further investigation.
Fairness and Welfare Quantification for Regret in Multi-Armed Bandits
TLDR
The current work quantifies the performance of bandit algorithms by applying a fundamental welfare function, namely the Nash social welfare (NSW) function, and develops an algorithm that achieves a Nash regret of $O\big(\sqrt{k \log T / T}\big)$, where $k$ denotes the number of arms in the MAB instance.
Trusted AI in Multi-agent Systems: An Overview of Privacy and Security for Distributed Learning
TLDR
A survey of the emerging security and privacy risks of distributed ML from a unique perspective of information exchange levels, which are defined according to the key steps of an ML process, i.e., the levels of preprocessed data, learning models, and intermediate results.
Decentralized Task Offloading in Edge Computing: A Multi-User Multi-Armed Bandit Approach
TLDR
A multi-user offloading framework considering unknown yet stochastic system-side information is developed to enable decentralized, user-initiated service placement, together with a decentralized epoch-based offloading scheme (DEBO) that optimizes user rewards subject to the network delay.
Medium Access Control protocol for Collaborative Spectrum Learning in Wireless Networks
TLDR
A fully distributed algorithm for spectrum collaboration in congested ad-hoc networks is presented, which jointly solves both the channel allocation and access scheduling problems, and it is proved that the algorithm achieves an optimal logarithmic regret.
Sequential Blocked Matching
  • Economics, Computer Science
  • 2021
TLDR
An approximately truthful mechanism based on the Explore-then-Commit paradigm is designed that achieves logarithmic dynamic approximate regret, and a significant improvement is shown when the class of randomised policies is considered.
Online Learning for Load Balancing of Unknown Monotone Resource Allocation Games
TLDR
This work proposes a simple algorithm that learns to shift the NE of the game to meet the total load constraints by adjusting the pricing coefficients in an online manner, and proves that the algorithm guarantees convergence in $L^2$ to an NE that meets the target total load constraints.
Cooperative Stochastic Multi-agent Multi-armed Bandits Robust to Adversarial Corruptions
TLDR
This work proposes a new algorithm that not only achieves near-optimal regret in the stochastic setting, but also obtains a regret bound with an additive corruption term in the corrupted setting, while maintaining efficient communication.
One for All and All for One: Distributed Learning of Fair Allocations With Multi-Player Bandits
TLDR
Two distributed algorithms are proposed which learn fair matchings between players and arms while minimizing the regret, and it is shown that the first algorithm learns a matching where all players obtain an expected reward of at least their QoS, with constant regret.
...
...

References

SHOWING 1-10 OF 41 REFERENCES
Distributed Multi-Player Bandits - a Game of Thrones Approach
TLDR
This is the first algorithm to achieve poly-logarithmic regret in this fully distributed scenario; it is proved to achieve an expected sum of regrets of near-$O(\log^{2} T)$.
Multi-player Multi-Armed Bandits with non-zero rewards on collisions for uncoordinated spectrum access
TLDR
This paper considers a model where there is no central control and the users cannot communicate with each other, and presents a policy that achieves expected regret of order $O(\log^{2+\delta}{T})$ for some $\delta > 0$.
Social Learning in Multi Agent Multi Armed Bandits
TLDR
A novel algorithm is developed in which agents, whenever they choose to communicate, exchange only arm-ids and not samples, with another agent chosen uniformly and independently at random, demonstrating that even a minimal level of collaboration among the different agents enables a significant reduction in per-agent regret.
Multi-Player Multi-Armed Bandits for Stable Allocation in Heterogeneous Ad-Hoc Networks
TLDR
A multi-armed bandit based distributed algorithm is proposed for static networks and extended to dynamic networks to achieve stable orthogonal allocation (SOC) in finite time, with two novel characteristics: a low-complexity narrowband radio, compared to the wideband radio in existing works, and an epoch-less approach for dynamic networks.
Individual Regret in Cooperative Nonstochastic Multi-Armed Bandits
We study agents communicating over an underlying network by exchanging messages, in order to optimize their individual regret in a common nonstochastic multi-armed bandit problem. We derive regret…
The True Sample Complexity of Identifying Good Arms
We consider two multi-armed bandit problems with $n$ arms: (i) given an $\epsilon > 0$, identify an arm whose mean is within $\epsilon$ of the largest mean, and (ii) given a threshold $\mu_0$ and…
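As a concrete illustration of problem (i) above, the sketch below uses naive uniform sampling with a Hoeffding-style confidence argument to return an arm whose mean is within epsilon of the best. The Bernoulli arms and the per-arm sample count are assumptions for illustration; this baseline is not the paper's algorithm, whose point is precisely that far fewer samples can suffice.

import math
import numpy as np

def identify_eps_good_arm(pull, n_arms, eps, delta):
    """Naive baseline for problem (i): return an arm whose mean is within
    eps of the best, with probability at least 1 - delta. Assumes rewards
    lie in [0, 1]; pull(arm) returns one sample from that arm."""
    # Hoeffding + union bound: t samples per arm make every empirical mean
    # eps/2-accurate with probability at least 1 - delta.
    t = math.ceil((2.0 / eps**2) * math.log(2 * n_arms / delta))
    means = np.array([np.mean([pull(a) for _ in range(t)]) for a in range(n_arms)])
    return int(np.argmax(means))

# Usage with assumed Bernoulli arms (illustrative only).
rng = np.random.default_rng(1)
true_means = np.array([0.30, 0.50, 0.62, 0.60])
pull = lambda a: float(rng.random() < true_means[a])
print("eps-good arm candidate:", identify_eps_good_arm(pull, n_arms=4, eps=0.1, delta=0.05))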
Competing Bandits in Matching Markets
TLDR
This work proposes a statistical learning model in which one side of the market does not have a priori knowledge about its preferences for the other side and is required to learn these from stochastic rewards.
Non-Stochastic Multi-Player Multi-Armed Bandits: Optimal Rate With Collision Information, Sublinear Without
TLDR
The first $\sqrt{T}$-type regret guarantee for this problem is proved under the feedback model where collisions are announced to the colliding players, and a sublinear regret guarantee, depending on the number of players $m$, is obtained when collisions are not announced.
Probability Inequalities for sums of Bounded Random Variables
Upper bounds are derived for the probability that the sum $S$ of $n$ independent random variables exceeds its mean $ES$ by a positive number $nt$. It is assumed that the range of each summand of $S$…
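For reference, the classical bound from this paper, stated in its standard textbook form for independent summands $X_i$ with $a_i \le X_i \le b_i$ (the range notation $a_i, b_i$ is an assumption, since the snippet above is cut off), reads:

$\Pr\{S - ES \ge nt\} \le \exp\!\left(-\dfrac{2 n^{2} t^{2}}{\sum_{i=1}^{n} (b_i - a_i)^{2}}\right)$ for all $t > 0$, where $S = X_1 + \dots + X_n$.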
...
...