# Cooperative Multi-Agent Bandits with Heavy Tails

@inproceedings{Dubey2020CooperativeMB, title={Cooperative Multi-Agent Bandits with Heavy Tails}, author={Abhimanyu Dubey and Alex 'Sandy' Pentland}, booktitle={ICML}, year={2020} }

We study the heavy-tailed stochastic bandit problem in the cooperative multi-agent setting, where a group of agents interact with a common bandit problem while communicating over a network with delays. Existing algorithms for the stochastic bandit in this setting use confidence intervals arising from an averaging-based communication protocol known as running consensus, which does not lend itself to robust estimation in heavy-tailed settings. We propose MP-UCB, a…
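To illustrate why plain averaging is fragile under heavy tails, here is a minimal sketch that compares the sample mean with a median-of-means estimate. The abstract above is truncated, so this is not presented as the estimator MP-UCB uses; median-of-means is a standard robust mean estimator chosen here as an assumption, and the function name `median_of_means` and the reward values are hypothetical.

```python
import statistics


def median_of_means(samples, num_blocks):
    """Average each of num_blocks equal-size blocks, then take the median.

    Trailing samples that do not fill a complete block are ignored.
    """
    if num_blocks < 1 or num_blocks > len(samples):
        raise ValueError("num_blocks must be between 1 and len(samples)")
    block_size = len(samples) // num_blocks
    block_means = [
        statistics.fmean(samples[i * block_size:(i + 1) * block_size])
        for i in range(num_blocks)
    ]
    return statistics.median(block_means)


# Hypothetical data: nine ordinary rewards plus one extreme draw,
# the kind of sample a heavy-tailed reward distribution can produce.
rewards = [1.0, 1.2, 0.8, 1.1, 0.9, 1.0, 1.3, 0.7, 1.0, 1000.0]

plain_mean = statistics.fmean(rewards)            # pulled far up by the outlier
robust_mean = median_of_means(rewards, num_blocks=5)  # stays near the bulk of the data

print(plain_mean, robust_mean)
```

A single extreme reward shifts the sample mean by two orders of magnitude, while only one of the five block means is affected, so the median of the block means stays close to the typical reward.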

## 19 Citations

Distributed Bandits with Heterogeneous Agents

- Computer Science
- IEEE INFOCOM 2022 - IEEE Conference on Computer Communications
- 2022

This paper proposes two learning algorithms, CO-UCB and CO-AAE, and proves that both algorithms achieve order-optimal regret, which is $O\left(\sum_{i:\tilde{\Delta}_i>0} \log T/\tilde{\Delta}_i\right)$, where $\tilde{\Delta}_i$ is the minimum suboptimality gap between the reward mean of arm $i$ and any local optimal arm.

Cooperative Stochastic Bandits with Asynchronous Agents and Constrained Feedback

- Computer Science
- NeurIPS
- 2021

AAE-LCB is proposed, a two-stage algorithm that prioritizes pulling local arms following an active arm elimination policy, and switches to other arms only if all local arms are dominated by some external arms.

Cooperative Stochastic Multi-agent Multi-armed Bandits Robust to Adversarial Corruptions

- Computer Science
- ArXiv
- 2021

This work proposes a new algorithm that not only achieves near-optimal regret in the stochastic setting, but also obtains a regret with an additive term of corruption in the corrupted setting, while maintaining efficient communication.

Multitask Bandit Learning through Heterogeneous Feedback Aggregation

- Computer Science
- AISTATS
- 2021

An upper confidence bound-based algorithm, RobustAgg($\epsilon$), is developed that adaptively aggregates rewards collected by different players and achieves instance-dependent regret guarantees that depend on the amenability of information sharing across players.

When to Call Your Neighbor? Strategic Communication in Cooperative Stochastic Bandits

- Computer Science
- ArXiv
- 2021

ComEx is proposed, a novel cost-effective communication protocol in which the group achieves the same order of performance as full communication while communicating only $O(\log T)$ messages.

Asymptotic Optimality for Decentralised Bandits

- Computer Science
- ACM SIGMETRICS Performance Evaluation Review
- 2022

An algorithm that improves upon the Gossip-Insert-Eliminate method of Chawla et al. is presented, along with empirical results demonstrating lower regret on simulated data.

Bayesian Algorithms for Decentralized Stochastic Bandits

- Computer Science
- IEEE Journal on Selected Areas in Information Theory
- 2021

A decentralized Thompson Sampling (TS) algorithm and a decentralized Bayes-UCB algorithm are proposed, and it is shown that the proposed decentralized TS can be extended to general bandit problems where the posterior distribution cannot be computed in closed form.

One More Step Towards Reality: Cooperative Bandits with Imperfect Communication

- Computer Science
- NeurIPS
- 2021

This paper proposes decentralized algorithms that achieve competitive performance, along with near-optimal guarantees on the incurred group regret, and presents an improved delayed-update algorithm that outperforms the existing state-of-the-art on various network topologies.

Differentially-Private Federated Linear Bandits

- Computer Science
- NeurIPS
- 2020

This paper devises FedUCB, a multi-agent private algorithm for both centralized and decentralized (peer-to-peer) federated learning, which provides competitive performance both in terms of pseudoregret bounds and empirical benchmark performance in various multi-agent settings.

Provably Efficient Multi-Agent Reinforcement Learning with Fully Decentralized Communication

- Computer Science
- ArXiv
- 2021

It is shown that group performance, as measured by the bound on regret, can be significantly improved through communication when each agent uses a decentralized message-passing protocol, even when limited to sending information up to its γ-hop neighbors.

## References

Decentralized Cooperative Stochastic Multi-armed Bandits

- Computer Science
- ArXiv
- 2018

This work designs a fully decentralized algorithm that uses a running consensus procedure to compute, with some delay, accurate estimations of the average of rewards obtained by all the agents for each arm, and then uses an upper confidence bound algorithm that accounts for the delay and error of the estimations.
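The averaging core of a consensus procedure can be sketched as follows. This is only an illustration under stated assumptions: the three-agent line graph, the weight matrix, and the name `consensus_step` are hypothetical, and the sketch omits the new observations that a running consensus mixes in at every round as well as the delay-aware confidence bounds described in the summary above.

```python
def consensus_step(values, weights):
    """One round: each agent replaces its value with a weighted
    average of the values held by itself and its neighbors."""
    n = len(values)
    return [sum(weights[i][j] * values[j] for j in range(n)) for i in range(n)]


# Hypothetical line graph 0 - 1 - 2. The matrix is symmetric and doubly
# stochastic (rows and columns sum to 1), so repeated averaging preserves
# the network-wide mean and drives every agent toward it.
weights = [
    [0.5, 0.5, 0.0],
    [0.5, 0.0, 0.5],
    [0.0, 0.5, 0.5],
]

# Each agent starts with its own local reward estimate for one arm.
estimates = [3.0, 0.0, 6.0]
target = sum(estimates) / len(estimates)

for _ in range(50):
    estimates = consensus_step(estimates, weights)

print(estimates, target)
```

Agents 0 and 2 never exchange values directly, yet all three estimates approach the network average after enough rounds, which is why consensus-based estimates are only accurate with some delay.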

Distributed cooperative decision-making in multiarmed bandits: Frequentist and Bayesian algorithms

- Computer Science
- 2016 IEEE 55th Conference on Decision and Control (CDC)
- 2016

This work rigorously characterizes the influence of the communication graph structure on the decision-making performance of the group and proves the performance of state-of-the-art frequentist and Bayesian cooperative distributed algorithms for multi-agent MAB problems in which agents communicate according to a fixed network graph.

Delay and Cooperation in Nonstochastic Bandits

- Computer Science
- COLT
- 2016

This work introduces EXP3-COOP, a cooperative version of the EXP3 algorithm, and proves that with $K$ actions and $N$ agents the average per-agent regret after $T$ rounds is at most of order $\sqrt{\left(d + 1 + \frac{K}{N}\alpha_{\le d}\right)(T \ln K)}$, where $\alpha_{\le d}$ is the independence number of the $d$-th power of the communication graph $G$.

Multi-armed bandits in multi-agent networks

- Computer Science
- 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2017

This paper addresses the multi-armed bandit problem in a multi-player framework with a distributed variant of the well-known UCB1 algorithm that is optimal in the sense that in a complete network it scales down the regret of its single-player counterpart by the network size.

Decentralized multi-armed bandit with multiple distributed players

- Computer Science
- 2010 Information Theory and Applications Workshop (ITA)
- 2010

It is shown that the minimum system regret of the decentralized MAB grows with time at the same logarithmic order as in the centralized counterpart where players act collectively as a single entity by exchanging observations and making decisions jointly.

Individual Regret in Cooperative Nonstochastic Multi-Armed Bandits

- Computer Science, Mathematics
- NeurIPS
- 2019

We study agents communicating over an underlying network by exchanging messages, in order to optimize their individual regret in a common nonstochastic multi-armed bandit problem. We derive regret…

Social Imitation in Cooperative Multiarmed Bandits: Partition-Based Algorithms with Strictly Local Information

- Computer Science
- 2018 IEEE Conference on Decision and Control (CDC)
- 2018

A novel policy based on partitions of the communication graph is developed and a distributed method for selecting an arbitrary number of leaders and partitions is proposed and evaluated using Monte-Carlo simulations.

Pure Exploration of Multi-Armed Bandits with Heavy-Tailed Payoffs

- Computer Science, Economics
- UAI
- 2018

This paper derives theoretical guarantees for two proposed bandit algorithms and demonstrates their effectiveness in pure exploration of MAB with heavy-tailed payoffs on synthetic data and real-world financial data.

Distributed Learning in Multi-Armed Bandit With Multiple Players

- Computer Science
- IEEE Transactions on Signal Processing
- 2010

It is shown that the minimum system regret of the decentralized MAB grows with time at the same logarithmic order as in the centralized counterpart where players act collectively as a single entity by exchanging observations and making decisions jointly.

Non-Stochastic Multi-Player Multi-Armed Bandits: Optimal Rate With Collision Information, Sublinear Without

- Computer Science, Mathematics
- COLT
- 2020

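The first $\sqrt{T}$-type regret guarantee for this problem is proved under the feedback model where collisions are announced to the colliding players, and the first sublinear guarantee is proved for the feedback model without collision information, namely $T^{1-\frac{1}{2m}}$, where $m$ is the number of players.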