• Corpus ID: 211258581

My Fair Bandit: Distributed Learning of Max-Min Fairness with Multi-player Bandits

@article{Bistritz2020MyFB,
title={My Fair Bandit: Distributed Learning of Max-Min Fairness with Multi-player Bandits},
author={Ilai Bistritz and Tavor Z. Baharav and Amir Leshem and Nicholas Bambos},
journal={ArXiv},
year={2020},
volume={abs/2002.09808}
}
• Published 23 February 2020
• Computer Science
• ArXiv
Consider N cooperative but non-communicating players, each of whom plays one of M arms for T turns. Players have different utilities for each arm, representable as an N×M matrix. These utilities are unknown to the players. In each turn, players select an arm and receive a noisy observation of their utility for it. However, if any other player selects the same arm in that turn, all colliding players receive zero utility due to the conflict. No other communication or coordination between…
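The setting in the abstract can be made concrete with a small simulation. The following is a minimal sketch, not the paper's algorithm: `play_round` models one turn with collisions, and `best_max_min_assignment` brute-forces the collision-free assignment that maximizes the smallest expected utility, which is the max-min fairness objective named in the title. The function names, the Gaussian noise model, and the example matrix are assumptions made for illustration only.

```python
from itertools import permutations

import numpy as np


def play_round(utilities, choices, rng, noise_std=0.1):
    """Observed rewards for one turn (hypothetical helper).

    utilities: (N, M) array of expected utilities, unknown to the players.
    choices:   length-N integer array; choices[n] is the arm of player n.
    Gaussian noise is an assumption; the abstract only says "noisy".
    """
    n_players, n_arms = utilities.shape
    counts = np.bincount(choices, minlength=n_arms)  # players per arm
    collided = counts[choices] > 1                   # shared arm => collision
    noise = rng.normal(0.0, noise_std, n_players)
    noisy = utilities[np.arange(n_players), choices] + noise
    return np.where(collided, 0.0, noisy)            # colliders receive zero


def best_max_min_assignment(utilities):
    """Brute-force max-min fair assignment (only viable for tiny N, M).

    Centralized and illustrative only: the players in the paper's setting
    cannot see the utility matrix or communicate.
    """
    n_players, n_arms = utilities.shape

    def worst_off(arms):
        return min(float(utilities[n, a]) for n, a in enumerate(arms))

    # Each permutation assigns distinct arms, so no collisions occur.
    best = max(permutations(range(n_arms), n_players), key=worst_off)
    return best, worst_off(best)


# Illustrative 2-player, 3-arm instance (values are made up).
U = np.array([[0.9, 0.2, 0.5],
              [0.8, 0.4, 0.1]])
rng = np.random.default_rng(0)
rewards = play_round(U, np.array([0, 0]), rng)  # both players pick arm 0
assignment, value = best_max_min_assignment(U)
```

In this made-up instance both players prefer arm 0, so a fair allocation has to move one of them to a less preferred arm; the brute-force search picks the assignment whose worst-off player is as well off as possible.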
16 Citations

Heterogeneous Multi-player Multi-armed Bandits: Closing the Gap and Generalization
• Computer Science
NeurIPS
• 2021
BEACON bridges the algorithm design and regret analysis of combinatorial MAB (CMAB) and MP-MAB, two largely disjointed areas in MAB, and the results suggest that this previously ignored connection is worth further investigation.
Fairness and Welfare Quantification for Regret in Multi-Armed Bandits
• Computer Science
• 2022
The current work quantifies the performance of bandit algorithms by applying a fundamental welfare function, namely the Nash social welfare (NSW) function, and develops an algorithm that achieves a Nash regret of $O\left(\sqrt{\frac{k \log T}{T}}\right)$, where $k$ denotes the number of arms in the MAB instance.
Trusted AI in Multi-agent Systems: An Overview of Privacy and Security for Distributed Learning
• Computer Science
ArXiv
• 2022
A survey of the emerging security and privacy risks of distributed ML from a unique perspective of information exchange levels, which are defined according to the key steps of an ML process, i.e. the level of preprocessed data, learning models, and intermediate results.
Decentralized Task Offloading in Edge Computing: A Multi-User Multi-Armed Bandit Approach
• Computer Science
IEEE INFOCOM 2022 - IEEE Conference on Computer Communications
• 2022
A multi-user offloading framework that considers unknown yet stochastic system-side information to enable decentralized, user-initiated service placement, and a decentralized epoch-based offloading (DEBO) scheme that optimizes user rewards subject to network delay, are developed.
Medium Access Control protocol for Collaborative Spectrum Learning in Wireless Networks
• Computer Science
ArXiv
• 2021
A fully-distributed algorithm for spectrum collaboration in congested ad-hoc networks which jointly solves both the channel allocation and access scheduling problems and it is proved that the algorithm has an optimal logarithmic regret.
Sequential Blocked Matching
• Economics, Computer Science
ArXiv
• 2021
An approximately truthful mechanism based on the Explore-then-Commit paradigm, which achieves logarithmic dynamic approximate regret is designed, and there is a significant improvement if one considers the class of randomised policies.
Online Learning for Load Balancing of Unknown Monotone Resource Allocation Games
• Computer Science
ICML
• 2021
This work proposes a simple algorithm that learns to shift the NE of the game to meet the total load constraints by adjusting the pricing coefficients in an online manner and proves that the algorithm guarantees convergence in L2 to a NE that meets target total load constraints.
Cooperative Stochastic Multi-agent Multi-armed Bandits Robust to Adversarial Corruptions
• Computer Science
ArXiv
• 2021
This work proposes a new algorithm that not only achieves near-optimal regret in the stochastic setting, but also obtains a regret with an additive term of corruption in the corrupted setting, while maintaining efficient communication.
One for All and All for One: Distributed Learning of Fair Allocations With Multi-Player Bandits
• Computer Science
IEEE Journal on Selected Areas in Information Theory
• 2021
Two distributed algorithms are proposed which learn fair matchings between players and arms while minimizing the regret and show that the first algorithm learns a matching where all players obtain an expected reward of at least their QoS with constant regret.

References

SHOWING 1-10 OF 41 REFERENCES
Distributed Multi-Player Bandits - a Game of Thrones Approach
• Computer Science
NeurIPS
• 2018
This is the first algorithm to achieve a poly-logarithmic regret in this fully distributed scenario and it is proved that it achieves an expected sum of regrets of near-$O\left(\log^{2}T\right)$.
Multi-player Multi-Armed Bandits with non-zero rewards on collisions for uncoordinated spectrum access
• Computer Science
ArXiv
• 2019
This paper considers a model where there is no central control and the users cannot communicate with each other, and presents a policy that achieves expected regret of order $O(\log^{2+\delta}{T})$ for some $\delta > 0$.
Social Learning in Multi Agent Multi Armed Bandits
• Computer Science
Proc. ACM Meas. Anal. Comput. Syst.
• 2019
A novel algorithm in which agents, whenever they choose, communicate only arm-ids and not samples, with another agent chosen uniformly and independently at random is developed, demonstrating that even a minimal level of collaboration among the different agents enables a significant reduction in per-agent regret.
Multi-Player Multi-Armed Bandits for Stable Allocation in Heterogeneous Ad-Hoc Networks
• Computer Science
IEEE Journal on Selected Areas in Communications
• 2019
A multi-armed bandit based distributed algorithm for static networks, extended to dynamic networks, that achieves stable orthogonal allocation (SOC) in finite time with two novel characteristics: a low-complexity narrowband radio, compared to the wideband radio in existing works, and an epoch-less approach for dynamic networks.
Individual Regret in Cooperative Nonstochastic Multi-Armed Bandits
• Computer Science, Mathematics
NeurIPS
• 2019
We study agents communicating over an underlying network by exchanging messages, in order to optimize their individual regret in a common nonstochastic multi-armed bandit problem. We derive regret
The True Sample Complexity of Identifying Good Arms
• Computer Science
AISTATS
• 2020
We consider two multi-armed bandit problems with $n$ arms: (i) given an $\epsilon > 0$, identify an arm with mean that is within $\epsilon$ of the largest mean and (ii) given a threshold $\mu_0$ and
Competing Bandits in Matching Markets
• Computer Science, Economics
AISTATS
• 2020
This work proposes a statistical learning model in which one side of the market does not have a priori knowledge about its preferences for the other side and is required to learn these from stochastic rewards.
Non-Stochastic Multi-Player Multi-Armed Bandits: Optimal Rate With Collision Information, Sublinear Without
• Computer Science, Mathematics
COLT
• 2020
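The first $\sqrt{T}$-type regret guarantee for this problem is proved, under the feedback model where collisions are announced to the colliding players, and the first sublinear guarantee, $T^{1-\frac{1}{2m}}$ where $m$ is the number of players, is proved for the feedback model where collision information is not available.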
Probability Inequalities for sums of Bounded Random Variables
Upper bounds are derived for the probability that the sum S of n independent random variables exceeds its mean ES by a positive number nt. It is assumed that the range of each summand of S