# Distributed Bandits: Probabilistic Communication on d-regular Graphs

@article{Madhushani2021DistributedBP, title={Distributed Bandits: Probabilistic Communication on d-regular Graphs}, author={Udari Madhushani and Naomi Ehrich Leonard}, journal={2021 European Control Conference (ECC)}, year={2021}, pages={830-835} }

We study the decentralized multi-agent multi-armed bandit problem for agents that communicate with probability over a network defined by a d-regular graph. Every edge in the graph has probabilistic weight p to account for the (1 − p) probability of a communication link failure. At each time step, each agent chooses an arm and receives a numerical reward associated with the chosen arm. After each choice, each agent observes the last obtained reward of each of its neighbors with probability p. We…
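The observation model described above can be illustrated with a small simulation: each agent runs UCB1 on its locally aggregated statistics and, after every round, observes each neighbor's last reward with probability p (so each link fails independently with probability 1 − p). This is a minimal sketch, not the paper's algorithm; the ring topology (a 2-regular graph), Bernoulli rewards, and all function and variable names are illustrative assumptions.

```python
import math
import random

def run_coop_ucb(n_agents=4, n_arms=3, p=0.7, horizon=500, seed=0):
    """Sketch: agents on a ring (2-regular graph) run UCB1 and share
    last rewards over links that each succeed with probability p."""
    rng = random.Random(seed)
    means = [0.2, 0.5, 0.8][:n_arms]          # Bernoulli arm means (assumed)
    best = max(means)
    # Ring neighborhoods as a simple d-regular example (d = 2).
    neighbors = [[(i - 1) % n_agents, (i + 1) % n_agents]
                 for i in range(n_agents)]
    counts = [[0] * n_arms for _ in range(n_agents)]
    sums = [[0.0] * n_arms for _ in range(n_agents)]
    regret = 0.0
    for t in range(1, horizon + 1):
        choices, rewards = [], []
        for i in range(n_agents):
            # UCB1 index from the agent's own aggregated observations.
            ucb = [float('inf') if counts[i][k] == 0 else
                   sums[i][k] / counts[i][k]
                   + math.sqrt(2 * math.log(t) / counts[i][k])
                   for k in range(n_arms)]
            a = max(range(n_arms), key=lambda k: ucb[k])
            r = 1.0 if rng.random() < means[a] else 0.0
            choices.append(a)
            rewards.append(r)
            regret += best - means[a]
        for i in range(n_agents):
            # Update with own reward, then with each neighbor's last
            # reward, observed only when the link succeeds (prob. p).
            counts[i][choices[i]] += 1
            sums[i][choices[i]] += rewards[i]
            for j in neighbors[i]:
                if rng.random() < p:
                    counts[i][choices[j]] += 1
                    sums[i][choices[j]] += rewards[j]
    return regret

group_regret = run_coop_ucb()
```

The returned group regret sums, over all agents and rounds, the gap between the best arm's mean and the chosen arm's mean; increasing p feeds each agent more observations per round, which is the mechanism the paper's regret bounds quantify.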

## One Citation

One More Step Towards Reality: Cooperative Bandits with Imperfect Communication

- Mathematics, Computer Science · ArXiv
- 2021

This paper proposes decentralized algorithms that achieve competitive performance, along with near-optimal guarantees on the incurred group regret, and presents an improved delayed-update algorithm that outperforms the existing state-of-the-art on various network topologies.

## References

Showing 1-10 of 35 references

Decentralized Cooperative Stochastic Bandits

- Computer Science · NeurIPS
- 2019

A fully decentralized algorithm that uses an accelerated consensus procedure to compute (delayed) estimates of the average of rewards obtained by all the agents for each arm, and then uses an upper confidence bound (UCB) algorithm that accounts for the delay and error of the estimates.

Heterogeneous Stochastic Interactions for Multiple Agents in a Multi-armed Bandit Problem

- Computer Science, Mathematics · 2019 18th European Control Conference (ECC)
- 2019

An algorithm is designed for each agent to maximize its own expected cumulative reward and performance bounds that depend on the sociability of the agents and the network structure are proved.

A Dynamic Observation Strategy for Multi-agent Multi-armed Bandit Problem

- Mathematics, Computer Science · 2020 European Control Conference (ECC)
- 2020

A sampling algorithm and an observation protocol are designed for each agent to maximize its own expected cumulative reward by minimizing expected cumulative sampling regret and expected cumulative observation regret.

Social Imitation in Cooperative Multiarmed Bandits: Partition-Based Algorithms with Strictly Local Information

- Computer Science · 2018 IEEE Conference on Decision and Control (CDC)
- 2018

A novel policy based on partitions of the communication graph is developed and a distributed method for selecting an arbitrary number of leaders and partitions is proposed and evaluated using Monte-Carlo simulations.

Coordinated Versus Decentralized Exploration In Multi-Agent Multi-Armed Bandits

- Computer Science · IJCAI
- 2017

An algorithm for the decentralized setting is introduced that uses a value-of-information based communication strategy and an exploration-exploitation strategy based on the centralized algorithm, and it is shown experimentally to converge rapidly to the performance of the centralized method.

Collaborative learning of stochastic bandits over a social network

- Computer Science, Mathematics · 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton)
- 2016

A key finding of this paper is that natural extensions of widely-studied single agent learning policies to the network setting need not perform well in terms of regret.

Optimal Algorithms for Multiplayer Multi-Armed Bandits

- Computer Science · AISTATS
- 2020

This paper presents DPE1 (Decentralized Parsimonious Exploration), a decentralized algorithm that achieves the same asymptotic regret as that obtained by an optimal centralized algorithm for the Multiplayer Multi-Armed Bandit problem.

Decentralized Exploration in Multi-Armed Bandits

- Computer Science, Mathematics · ICML
- 2019

A generic algorithm, Decentralized Elimination, is provided, which uses any best arm identification algorithm as a subroutine; it is proved that this algorithm ensures privacy with a low communication cost, and that, compared to the lower bound of the best arm identification problem, its sample complexity suffers a penalty depending on the inverse of the probability of the most frequent players.

Asymptotically efficient allocation rules for the multiarmed bandit problem with multiple plays-Part II: Markovian rewards

- Mathematics
- 1987

At each instant of time we are required to sample a fixed number $m \geq 1$ out of $N$ i.i.d. processes whose distributions belong to a family suitably parameterized by a real number $\theta$. The…

Algorithms for Differentially Private Multi-Armed Bandits

- Computer Science, Mathematics · AAAI
- 2016

This work shows that there exist differentially private variants of Upper Confidence Bound algorithms which have optimal regret, and substantially improves the bounds of previous family of algorithms which use a continual release mechanism.