• Corpus ID: 211258581

# My Fair Bandit: Distributed Learning of Max-Min Fairness with Multi-player Bandits

@article{Bistritz2020MyFB,
title={My Fair Bandit: Distributed Learning of Max-Min Fairness with Multi-player Bandits},
author={Ilai Bistritz and Tavor Z. Baharav and Amir Leshem and Nicholas Bambos},
journal={ArXiv},
year={2020},
volume={abs/2002.09808}
}
• Published 23 February 2020
• Computer Science
• ArXiv
Consider N cooperative but non-communicating players where each plays one out of M arms for T turns. Players have different utilities for each arm, representable as an N×M matrix. These utilities are unknown to the players. In each turn players select an arm and receive a noisy observation of their utility for it. However, if any other player selected the same arm that turn, all colliding players receive zero utility due to the conflict. No other communication or coordination between…
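The setting described in the abstract can be illustrated with a small sketch. It is not the paper's algorithm (the abstract is truncated here); it only simulates one turn of the collision model and, assuming the max-min objective named in the title, finds the collision-free assignment that maximizes the worst-off player's expected utility by brute force. The function names, the Gaussian noise model, and the example matrix `U` are all hypothetical choices made for illustration.

```python
import itertools
import random


def play_turn(utilities, choices, noise_std=0.1, rng=random):
    """Simulate one turn: choices[p] is the arm picked by player p.

    A player alone on an arm sees a noisy utility (Gaussian noise is an
    assumption); every player on a shared arm gets exactly zero.
    """
    counts = {}
    for arm in choices:
        counts[arm] = counts.get(arm, 0) + 1
    rewards = []
    for player, arm in enumerate(choices):
        if counts[arm] > 1:
            rewards.append(0.0)  # collision: zero utility
        else:
            rewards.append(utilities[player][arm] + rng.gauss(0.0, noise_std))
    return rewards


def max_min_assignment(utilities):
    """Brute-force max-min fair assignment (needs N <= M, tiny instances only)."""
    n, m = len(utilities), len(utilities[0])
    best_value, best_arms = float("-inf"), None
    # Each permutation assigns distinct arms, so no two players collide.
    for arms in itertools.permutations(range(m), n):
        value = min(utilities[p][a] for p, a in enumerate(arms))
        if value > best_value:
            best_value, best_arms = value, arms
    return best_arms, best_value


# Hypothetical 2-player, 3-arm utility matrix (rows are players).
U = [[0.9, 0.2, 0.5],
     [0.8, 0.7, 0.1]]
assignment, value = max_min_assignment(U)
collided = play_turn(U, [0, 0], noise_std=0.0)
```

In the actual problem the players do not know `U` and cannot run a centralized search like this; the sketch only makes the target of learning concrete.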
15 Citations

Heterogeneous Multi-player Multi-armed Bandits: Closing the Gap and Generalization
• Computer Science
NeurIPS
• 2021
BEACON bridges the algorithm design and regret analysis of combinatorial MAB (CMAB) and MP-MAB, two largely disjointed areas in MAB, and the results suggest that this previously ignored connection is worth further investigation.
Trusted AI in Multi-agent Systems: An Overview of Privacy and Security for Distributed Learning
• Computer Science
ArXiv
• 2022
A survey of the emerging security and privacy risks of distributed ML from a unique perspective of information exchange levels, which are defined according to the key steps of an ML process, i.e. the level of preprocessed data, learning models, and intermediate results.
Bandit Learning in Decentralized Matching Markets
• Computer Science, Economics
J. Mach. Learn. Res.
• 2021
This model extends the standard stochastic multi-armed bandit framework to a decentralized multiple player setting with competition and introduces a new algorithm for this setting that attains stable regret when preferences of the arms over players are shared.
Bandit based centralized matching in two-sided markets for peer to peer lending
A technique based on sequential decision making that allows the lenders to adjust their choices based on the dynamics of uncertainty from competition over time is devised and it is found that the lender regret depends on the initial preferences set by the lenders which could affect their learning over decision making steps.
Cooperative Stochastic Multi-agent Multi-armed Bandits Robust to Adversarial Corruptions
• Computer Science
ArXiv
• 2021
This work proposes a new algorithm that not only achieves near-optimal regret in the stochastic setting, but also obtains a regret with an additive term of corruption in the corrupted setting, while maintaining efficient communication.
Decentralized Dynamic Rate and Channel Selection Over a Shared Spectrum
• Computer Science
IEEE Transactions on Communications
• 2021
This work considers the problem of distributed dynamic rate and channel selection in a multi-user network, in which each user selects a wireless channel and a modulation and coding scheme (corresponding to a transmission rate) in order to maximize the network throughput, and proposes a decentralized learning algorithm that performs almost optimal exploration of the transmission rates.
• Computer Science
ArXiv
• 2021
This paper formulates dynamic task placement as an online multi-user multi-armed bandit process and proposes decentralized epoch based offloading (DEBO) to optimize user rewards subject to network delay. It shows that DEBO can deduce the optimal user-server assignment, thereby achieving close-to-optimal service performance and a tight O(log T) offloading regret.
Sequential Blocked Matching
• Economics, Computer Science
• 2021
An approximately truthful mechanism based on the Explore-then-Commit paradigm, which achieves logarithmic dynamic approximate regret is designed, and there is a significant improvement if one considers the class of randomised policies.
Medium Access Control protocol for Collaborative Spectrum Learning in Wireless Networks
• Computer Science
ArXiv
• 2021
A fully distributed algorithm for spectrum collaboration in congested ad-hoc networks is presented that jointly solves both the channel allocation and access scheduling problems, and it is proved that the algorithm has an optimal logarithmic regret.
Multi-User Small Base Station Association via Contextual Combinatorial Volatile Bandits
• Computer Science
IEEE Transactions on Communications
• 2021
An online algorithm is proposed which is able to solve the user-SBS association problem in a multi-user and time-varying environment, where the number of users dynamically varies over time and achieves sublinear in time regret.

## References

Distributed Multi-Player Bandits - a Game of Thrones Approach
• Computer Science
NeurIPS
• 2018
This is the first algorithm to achieve a poly-logarithmic regret in this fully distributed scenario, and it is proved that it achieves an expected sum of regrets of near-$O(\log^{2} T)$.
Multiplayer Bandits: A Trekking Approach
• Computer Science
IEEE Transactions on Automatic Control
• 2022
The trekking approach eliminates the need to estimate the number of players, resulting in fewer collisions and improved regret performance compared to state-of-the-art algorithms. An epoch-less algorithm is also given that eliminates any requirement of time synchronization across the players, provided each player can detect the presence of other players on an arm.
A Practical Algorithm for Multiplayer Bandits when Arm Means Vary Among Players
• Computer Science
AISTATS
• 2020
A finite-time analysis of this algorithm is presented, giving the first sublinear minimax regret bound for this problem, and it is proved that if the optimal assignment of players to arms is unique, the algorithm attains the optimal regret.
Competing Bandits in Matching Markets
• Computer Science, Economics
AISTATS
• 2020
This work proposes a statistical learning model in which one side of the market does not have a priori knowledge about its preferences for the other side and is required to learn these from stochastic rewards.
• Computer Science
J. Mach. Learn. Res.
• 2020
This work designs the first Multi-player Bandit algorithm that provably works in arbitrarily changing environments, where the losses of the arms may even be chosen by an adversary.
Non-Stochastic Multi-Player Multi-Armed Bandits: Optimal Rate With Collision Information, Sublinear Without
• Computer Science, Mathematics
COLT
• 2020
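The first $\sqrt{T}$-type regret guarantee for this problem is proved, under the feedback model where collisions are announced to the colliding players, and the first sublinear guarantee is proved for the feedback model where collision information is not available, namely $T^{1-\frac{1}{2m}}$, where $m$ is the number of players.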
The True Sample Complexity of Identifying Good Arms
• Computer Science
AISTATS
• 2020
We consider two multi-armed bandit problems with $n$ arms: (i) given an $\epsilon > 0$, identify an arm with mean that is within $\epsilon$ of the largest mean and (ii) given a threshold $\mu_0$ and
Distributed Learning and Optimal Assignment in Multiplayer Heterogeneous Networks
• Computer Science
IEEE INFOCOM 2019 - IEEE Conference on Computer Communications
• 2019
This work considers an ad hoc network where multiple users access the same set of channels and develops algorithms that converge to near-optimal allocation with high probability in a small number of rounds and develops an algorithm that gives logarithmic regret.
Individual Regret in Cooperative Nonstochastic Multi-Armed Bandits
• Computer Science, Mathematics
NeurIPS
• 2019
We study agents communicating over an underlying network by exchanging messages, in order to optimize their individual regret in a common nonstochastic multi-armed bandit problem. We derive regret
Multi-Player Multi-Armed Bandits for Stable Allocation in Heterogeneous Ad-Hoc Networks
• Computer Science
IEEE Journal on Selected Areas in Communications
• 2019
This work proposes a multi-armed bandit based distributed algorithm for static networks and extends it to dynamic networks to achieve stable orthogonal allocation (SOC) in finite time, with two novel characteristics: low-complexity narrowband radio, compared to the wideband radio in existing works, and an epoch-less approach for dynamic networks.