Corpus ID: 211258581

My Fair Bandit: Distributed Learning of Max-Min Fairness with Multi-player Bandits

@article{Bistritz2020MyFB,
  title={My Fair Bandit: Distributed Learning of Max-Min Fairness with Multi-player Bandits},
  author={Ilai Bistritz and Tavor Z. Baharav and Amir Leshem and Nicholas Bambos},
  journal={ArXiv},
  year={2020},
  volume={abs/2002.09808}
}
Consider N cooperative but non-communicating players, each of whom plays one of M arms for T turns. Players have different utilities for each arm, representable as an N×M matrix. These utilities are unknown to the players. In each turn, every player selects an arm and receives a noisy observation of their utility for it. However, if any other player selected the same arm that turn, all colliding players receive zero utility due to the conflict. No other communication or coordination between…
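The collision-feedback model described in the abstract is straightforward to simulate. The sketch below is a minimal illustration, not the paper's algorithm: it assumes Gaussian observation noise, and the names `utilities` and `step` are illustrative choices rather than anything defined in the paper. Each player observes a noisy sample of its own utility when it is alone on an arm and exactly zero whenever it collides.

```python
import numpy as np

# Minimal sketch of the collision reward model from the abstract.
# N players, M arms; utilities[n, m] is player n's (unknown) mean utility for arm m.
# Gaussian observation noise is an assumption made here for illustration only.
rng = np.random.default_rng(0)
N, M = 3, 5
utilities = rng.uniform(size=(N, M))   # hidden from the players

def step(chosen_arms, noise_std=0.1):
    """One turn: player n plays chosen_arms[n].

    Returns per-player observations: a noisy sample of the player's own
    utility if it was alone on its arm, and exactly zero on a collision.
    """
    chosen_arms = np.asarray(chosen_arms)
    counts = np.bincount(chosen_arms, minlength=M)
    rewards = np.empty(N)
    for n, m in enumerate(chosen_arms):
        if counts[m] > 1:      # collision: every colliding player gets zero
            rewards[n] = 0.0
        else:                  # noisy observation of the player's own utility
            rewards[n] = utilities[n, m] + noise_std * rng.standard_normal()
    return rewards

# Example turn: players 0 and 1 collide on arm 2, player 2 is alone on arm 4.
print(step([2, 2, 4]))
```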


Heterogeneous Multi-player Multi-armed Bandits: Closing the Gap and Generalization
TLDR
BEACON bridges the algorithm design and regret analysis of combinatorial MAB (CMAB) and MP-MAB, two largely disjoint areas in MAB, and the results suggest that this previously ignored connection is worth further investigation.
Trusted AI in Multi-agent Systems: An Overview of Privacy and Security for Distributed Learning
TLDR
A survey of the emerging security and privacy risks of distributed ML from a unique perspective of information exchange levels, which are defined according to the key steps of an ML process, i.e. the level of preprocessed data, learning models, and intermediate results.
Bandit Learning in Decentralized Matching Markets
TLDR
This model extends the standard stochastic multi-armed bandit framework to a decentralized multiple player setting with competition and introduces a new algorithm for this setting that attains stable regret when preferences of the arms over players are shared.
Bandit based centralized matching in two-sided markets for peer to peer lending
TLDR
A technique based on sequential decision making is devised that allows lenders to adjust their choices based on the dynamics of uncertainty from competition over time; it is found that lender regret depends on the initial preferences set by the lenders, which can affect their learning over the decision-making steps.
Cooperative Stochastic Multi-agent Multi-armed Bandits Robust to Adversarial Corruptions
TLDR
This work proposes a new algorithm that not only achieves near-optimal regret in the stochastic setting, but also obtains a regret bound with an additive corruption term in the corrupted setting, while maintaining efficient communication.
Decentralized Dynamic Rate and Channel Selection Over a Shared Spectrum
TLDR
This work considers the problem of distributed dynamic rate and channel selection in a multi-user network, in which each user selects a wireless channel and a modulation and coding scheme (corresponding to a transmission rate) in order to maximize the network throughput, and proposes a decentralized learning algorithm that performs almost optimal exploration of the transmission rates.
Decentralized Task Offloading in Edge Computing: A Multi-User Multi-Armed Bandit Approach
TLDR
This paper formulates dynamic task placement as an online multi-user multi-armed bandit process and proposes a decentralized epoch-based offloading scheme (DEBO) to optimize user rewards subject to network delay; it shows that DEBO can derive the optimal user-server assignment, thereby achieving close-to-optimal service performance and a tight O(log T) offloading regret.
Sequential Blocked Matching
  • Economics, Computer Science
  • 2021
TLDR
An approximately truthful mechanism based on the Explore-then-Commit paradigm is designed, which achieves logarithmic dynamic approximate regret, and a significant improvement is shown when one considers the class of randomised policies.
Medium Access Control protocol for Collaborative Spectrum Learning in Wireless Networks
TLDR
A fully distributed algorithm for spectrum collaboration in congested ad hoc networks is presented that jointly solves the channel allocation and access scheduling problems, and the algorithm is proved to achieve optimal logarithmic regret.
Multi-User Small Base Station Association via Contextual Combinatorial Volatile Bandits
TLDR
An online algorithm is proposed that solves the user-SBS association problem in a multi-user, time-varying environment where the number of users dynamically varies over time, and it achieves regret that is sublinear in time.

References

SHOWING 1-10 OF 41 REFERENCES
Distributed Multi-Player Bandits - a Game of Thrones Approach
TLDR
This is the first algorithm to achieve poly-logarithmic regret in this fully distributed scenario; it is proved to achieve an expected sum of regrets of near-$O(\log^{2} T)$.
Multiplayer Bandits: A Trekking Approach
TLDR
The trekking approach eliminates the need to estimate the number of players, resulting in fewer collisions and improved regret performance compared to state-of-the-art algorithms; an epoch-less algorithm is also presented that eliminates any requirement of time synchronization across the players, provided each player can detect the presence of other players on an arm.
A Practical Algorithm for Multiplayer Bandits when Arm Means Vary Among Players
TLDR
A finite-time analysis of this algorithm is presented, giving the first sublinear minimax regret bound for this problem, and it is proved that if the optimal assignment of players to arms is unique, the algorithm attains the optimal regret.
Competing Bandits in Matching Markets
TLDR
This work proposes a statistical learning model in which one side of the market does not have a priori knowledge about its preferences for the other side and is required to learn these from stochastic rewards.
Multi-Player Bandits: The Adversarial Case
TLDR
This work designs the first Multi-player Bandit algorithm that provably works in arbitrarily changing environments, where the losses of the arms may even be chosen by an adversary.
Non-Stochastic Multi-Player Multi-Armed Bandits: Optimal Rate With Collision Information, Sublinear Without
TLDR
The first $\sqrt{T}$-type regret guarantee for this problem is proved under the feedback model where collisions are announced to the colliding players, and a sublinear regret guarantee is proved for the model without collision information, where $m$ is the number of players.
The True Sample Complexity of Identifying Good Arms
We consider two multi-armed bandit problems with $n$ arms: (i) given an $\epsilon > 0$, identify an arm with mean that is within $\epsilon$ of the largest mean and (ii) given a threshold $\mu_0$ and…
Distributed Learning and Optimal Assignment in Multiplayer Heterogeneous Networks
TLDR
This work considers an ad hoc network where multiple users access the same set of channels and develops algorithms that converge to a near-optimal allocation with high probability in a small number of rounds, as well as an algorithm that achieves logarithmic regret.
Individual Regret in Cooperative Nonstochastic Multi-Armed Bandits
We study agents communicating over an underlying network by exchanging messages, in order to optimize their individual regret in a common nonstochastic multi-armed bandit problem. We derive regret…
Multi-Player Multi-Armed Bandits for Stable Allocation in Heterogeneous Ad-Hoc Networks
TLDR
A multi-armed bandit based distributed algorithm is proposed for static networks and extended to dynamic networks to achieve stable orthogonal allocation (SOC) in finite time, with two novel characteristics: a low-complexity narrowband radio compared to the wideband radio in existing works, and an epoch-less approach for dynamic networks.