• Corpus ID: 221089750

Cooperative Multi-Agent Bandits with Heavy Tails

@inproceedings{Dubey2020CooperativeMB,
  title={Cooperative Multi-Agent Bandits with Heavy Tails},
  author={Abhimanyu Dubey and Alex 'Sandy' Pentland},
  booktitle={ICML},
  year={2020}
}
We study the heavy-tailed stochastic bandit problem in the cooperative multi-agent setting, where a group of agents interacts with a common bandit problem while communicating over a network with delays. Existing algorithms for the stochastic bandit in this setting utilize confidence intervals arising from an averaging-based communication protocol known as running consensus, which does not lend itself to robust estimation in heavy-tailed settings. We propose MP-UCB, a… 
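Although the abstract is truncated, the technical obstacle it names is standard: consensus-style averaging of raw rewards is not robust when rewards have only a bounded (1+ε)-th moment. The usual single-agent remedy is a truncation-based robust mean estimator, as in the robust UCB of Bubeck et al. (2013). The sketch below shows that standard estimator and its confidence width; it is illustrative only, not the paper's MP-UCB statistic, and the function names and constants are assumptions.

```python
import numpy as np

def truncated_mean(rewards, eps, u, delta):
    """Truncated empirical mean for heavy-tailed rewards.

    Assumes E|X|^(1+eps) <= u. The s-th sample is kept only if its
    magnitude is below (u * s / log(1/delta))^(1/(1+eps)); extreme
    outliers are zeroed out rather than averaged in. A sketch of the
    standard robust estimator, not the paper's MP-UCB statistic.
    """
    rewards = np.asarray(rewards, dtype=float)
    s = np.arange(1, rewards.size + 1)
    thresholds = (u * s / np.log(1.0 / delta)) ** (1.0 / (1.0 + eps))
    return np.where(np.abs(rewards) <= thresholds, rewards, 0.0).sum() / rewards.size

def robust_ucb_index(rewards, eps, u, delta):
    """Optimistic index: truncated mean plus a confidence width that
    shrinks at the heavy-tail rate n^(-eps/(1+eps)) instead of n^(-1/2)."""
    n = len(rewards)
    width = 4.0 * u ** (1.0 / (1.0 + eps)) * (np.log(1.0 / delta) / n) ** (eps / (1.0 + eps))
    return truncated_mean(rewards, eps, u, delta) + width
```

In a bandit loop one would typically evaluate the index with delta shrinking over time (say delta ∝ t⁻²), so that optimism holds uniformly across rounds.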

Figures from this paper

Distributed Bandits with Heterogeneous Agents
TLDR
This paper proposes two learning algorithms, CO-UCB and CO-AAE, and proves that both achieve order-optimal O(log T) regret, with instance-dependent constants governed by Δ_i, the minimum suboptimality gap between the reward mean of arm i and any local optimal arm.
Cooperative Stochastic Bandits with Asynchronous Agents and Constrained Feedback
TLDR
AAE-LCB is proposed, a two-stage algorithm that prioritizes pulling local arms following an active arm elimination policy, and switches to other arms only if all local arms are dominated by some external arms.
Cooperative Stochastic Multi-agent Multi-armed Bandits Robust to Adversarial Corruptions
TLDR
This work proposes a new algorithm that not only achieves near-optimal regret in the stochastic setting, but also obtains a regret with an additive term of corruption in the corrupted setting, while maintaining efficient communication.
Multitask Bandit Learning through Heterogeneous Feedback Aggregation
TLDR
An upper-confidence-bound-based algorithm, RobustAgg(ε), is developed that adaptively aggregates rewards collected by different players and achieves instance-dependent regret guarantees that depend on the amenability of information sharing across players.
When to Call Your Neighbor? Strategic Communication in Cooperative Stochastic Bandits
TLDR
ComEx is proposed, a novel cost-effective communication protocol in which the group achieves the same order of performance as full communication while exchanging only O(log T) messages.
Asymptotic Optimality for Decentralised Bandits
TLDR
An algorithm that improves upon the Gossip-Insert-Eliminate method of Chawla et al. is presented, along with empirical results demonstrating lower regret on simulated data.
Bayesian Algorithms for Decentralized Stochastic Bandits
TLDR
A decentralized Thompson Sampling (TS) algorithm and a decentralized Bayes-UCB algorithm are proposed, and it is shown that the proposed decentralized TS can be extended to general bandit problems where the posterior distribution cannot be computed in closed form.
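To make the posterior-sampling idea concrete, here is a minimal sketch of decentralized Thompson Sampling for Bernoulli arms in which each agent folds its graph neighbors' observations into its own Beta posterior after every round. The sharing rule, prior, and names are illustrative assumptions, not the algorithm analyzed in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def decentralized_ts(means, adjacency, horizon):
    """Minimal sketch: each agent keeps Beta(1, 1) posteriors over
    Bernoulli arms, samples an index per arm, pulls the argmax, and
    then shares its (arm, reward) observation with its neighbors."""
    n_agents, n_arms = adjacency.shape[0], len(means)
    alpha = np.ones((n_agents, n_arms))   # posterior successes + 1
    beta = np.ones((n_agents, n_arms))    # posterior failures + 1
    for _ in range(horizon):
        draws = rng.beta(alpha, beta)                 # one sample per (agent, arm)
        arms = draws.argmax(axis=1)                   # each agent pulls its best sample
        rewards = (rng.random(n_agents) < np.asarray(means)[arms]).astype(float)
        for i in range(n_agents):                     # share with self and neighbors
            for j in np.flatnonzero(adjacency[i]).tolist() + [i]:
                alpha[j, arms[i]] += rewards[i]
                beta[j, arms[i]] += 1.0 - rewards[i]
    return alpha, beta

# Example: 3 agents on a line graph, two Bernoulli arms.
adj = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])
a, b = decentralized_ts([0.3, 0.7], adj, horizon=500)
print(a / (a + b))   # posterior means concentrate near [0.3, 0.7]
```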
One More Step Towards Reality: Cooperative Bandits with Imperfect Communication
TLDR
This paper proposes decentralized algorithms that achieve competitive performance, along with near-optimal guarantees on the incurred group regret, and presents an improved delayed-update algorithm that outperforms the existing state-of-the-art on various network topologies.
Differentially-Private Federated Linear Bandits
TLDR
This paper devises FedUCB, a multi-agent private algorithm for both centralized and decentralized (peer-to-peer) federated learning, which provides competitive performance both in terms of pseudo-regret bounds and empirical benchmark performance in various multi-agent settings.
Provably Efficient Multi-Agent Reinforcement Learning with Fully Decentralized Communication
TLDR
It is shown that group performance, as measured by the bound on regret, can be significantly improved through communication when each agent uses a decentralized message-passing protocol, even when limited to sending information up to its γ-hop neighbors.
...

References

SHOWING 1-10 OF 61 REFERENCES
Decentralized Cooperative Stochastic Multi-armed Bandits
TLDR
This work designs a fully decentralized algorithm that uses a running consensus procedure to compute, with some delay, accurate estimates of the average reward obtained by all agents for each arm, and then applies an upper confidence bound algorithm that accounts for the delay and error of these estimates.
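For readers unfamiliar with the protocol, a running consensus step simply replaces each agent's estimate with a weighted average of its neighbors' estimates, x(t+1) = W x(t); with a doubly stochastic W on a connected graph, every entry converges to the network-wide mean. A minimal sketch with an assumed Metropolis-style weight matrix:

```python
import numpy as np

def running_consensus(local_values, weight_matrix, n_steps):
    """Iterate x <- W x. With W doubly stochastic on a connected graph
    (and aperiodic), every agent's value converges to the global mean.
    A sketch of the protocol only, not the full delayed-UCB algorithm."""
    x = np.asarray(local_values, dtype=float)
    W = np.asarray(weight_matrix, dtype=float)
    for _ in range(n_steps):
        x = W @ x
    return x

# Example: 3 agents on a line graph.
W = np.array([[0.5, 0.5, 0.0],
              [0.5, 0.0, 0.5],
              [0.0, 0.5, 0.5]])
print(running_consensus([1.0, 4.0, 7.0], W, 50))  # ~[4.0, 4.0, 4.0]
```

The "with some delay" qualifier above reflects that, after finitely many steps, each agent holds only an approximation of the global mean, and the UCB indices must budget for that mixing error.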
Distributed cooperative decision-making in multiarmed bandits: Frequentist and Bayesian algorithms
TLDR
This work rigorously characterizes the influence of the communication graph structure on the group's decision-making performance and proves performance guarantees for state-of-the-art frequentist and Bayesian algorithms for cooperative multi-agent MAB problems in which agents communicate according to a fixed network graph.
Delay and Cooperation in Nonstochastic Bandits
TLDR
This work introduces EXP3-COOP, a cooperative version of the EXP3 algorithm, and proves that with K actions and N agents the average per-agent regret after T rounds is at most of order $\sqrt{\big(d + 1 + \frac{K}{N}\alpha_d\big)\, T \ln K}$, where $\alpha_d$ is the independence number of the d-th power of the communication graph G.
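The variance-reduction mechanism behind this bound can be sketched compactly: each agent runs EXP3, but its importance-weighted loss estimate for an arm divides by the probability that anyone in its closed neighborhood played that arm, keeping the estimate unbiased while shrinking its variance. The sketch below omits the d-round communication delay and uses assumed names; it illustrates the estimator rather than reproducing the analyzed algorithm.

```python
import numpy as np

rng = np.random.default_rng(1)

def exp3_coop_sketch(loss_fn, adjacency, n_arms, horizon, eta):
    """Each agent v updates arm k only when someone in its closed
    neighborhood played k, using the unbiased estimate loss_k / q_vk,
    where q_vk = P(at least one neighbor of v plays k this round)."""
    n_agents = adjacency.shape[0]
    weights = np.ones((n_agents, n_arms))
    closed = adjacency.astype(bool) | np.eye(n_agents, dtype=bool)
    for t in range(horizon):
        probs = weights / weights.sum(axis=1, keepdims=True)
        arms = np.array([rng.choice(n_arms, p=probs[v]) for v in range(n_agents)])
        losses = loss_fn(t)                       # losses[k] in [0, 1] for each arm
        for v in range(n_agents):
            nbrs = np.flatnonzero(closed[v])
            for k in set(arms[nbrs].tolist()):    # arms observed in v's neighborhood
                q_vk = 1.0 - np.prod(1.0 - probs[nbrs, k])
                weights[v, k] *= np.exp(-eta * losses[k] / q_vk)
    return weights
```

Because q_vk grows with the neighborhood size, better-connected graphs (smaller independence number) yield lower-variance estimates, which is where the α_d term in the regret bound comes from.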
Multi-armed bandits in multi-agent networks
TLDR
This paper addresses the multi-armed bandit problem in a multi-player framework with a distributed variant of the well-known UCB1 algorithm that is optimal in the sense that in a complete network it scales down the regret of its single-player counterpart by the network size.
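The "scales down the regret by the network size" claim has a simple intuition: over a complete network, agents can pool their counts and reward sums, so each arm's statistics accrue N times faster than for a single player. A minimal sketch under that idealized pooling (function and parameter names are assumptions, not the paper's variant):

```python
import numpy as np

rng = np.random.default_rng(2)

def pooled_ucb1(means, n_agents, horizon):
    """UCB1 run on the POOLED statistics of all agents, as if the
    complete network let every observation reach everyone instantly."""
    n_arms = len(means)
    counts = np.zeros(n_arms)          # pooled pull counts
    sums = np.zeros(n_arms)            # pooled reward sums
    for t in range(1, horizon + 1):
        for _ in range(n_agents):      # every agent pulls once per round
            if counts.min() == 0:
                arm = int(counts.argmin())   # play each arm once first
            else:
                ucb = sums / counts + np.sqrt(2.0 * np.log(t * n_agents) / counts)
                arm = int(ucb.argmax())
            reward = rng.normal(means[arm], 1.0)
            counts[arm] += 1
            sums[arm] += reward
    return sums / np.maximum(counts, 1.0)   # pooled empirical means
```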
Decentralized multi-armed bandit with multiple distributed players
  • Keqin Liu, Qing Zhao
  • Computer Science
    2010 Information Theory and Applications Workshop (ITA)
  • 2010
TLDR
It is shown that the minimum system regret of the decentralized MAB grows with time at the same logarithmic order as in the centralized counterpart where players act collectively as a single entity by exchanging observations and making decisions jointly.
Individual Regret in Cooperative Nonstochastic Multi-Armed Bandits
We study agents communicating over an underlying network by exchanging messages, in order to optimize their individual regret in a common nonstochastic multi-armed bandit problem. We derive regret
Social Imitation in Cooperative Multiarmed Bandits: Partition-Based Algorithms with Strictly Local Information
TLDR
A novel policy based on partitions of the communication graph is developed and a distributed method for selecting an arbitrary number of leaders and partitions is proposed and evaluated using Monte-Carlo simulations.
Pure Exploration of Multi-Armed Bandits with Heavy-Tailed Payoffs
TLDR
This paper derives theoretical guarantees for the two proposed bandit algorithms and demonstrates their effectiveness for pure exploration of MAB with heavy-tailed payoffs on synthetic data and real-world financial data.
Distributed Learning in Multi-Armed Bandit With Multiple Players
TLDR
It is shown that the minimum system regret of the decentralized MAB grows with time at the same logarithmic order as in the centralized counterpart where players act collectively as a single entity by exchanging observations and making decisions jointly.
Non-Stochastic Multi-Player Multi-Armed Bandits: Optimal Rate With Collision Information, Sublinear Without
TLDR
The first $\sqrt{T}$-type regret guarantee for this problem is proved under the feedback model where collisions are announced to the colliding players, and a sublinear regret guarantee is proved when no collision information is available.
...