Federated Bandit

@article{Zhu2021FederatedB,
  title={Federated Bandit},
  author={Zhaowei Zhu and Jingxuan Zhu and Ji Liu and Yang Liu},
  journal={Proceedings of the ACM on Measurement and Analysis of Computing Systems},
  year={2021},
  volume={5},
  pages={1-29}
}
  • Zhaowei Zhu, Jingxuan Zhu, Ji Liu, Yang Liu
  • Published 24 October 2020
  • Computer Science
  • Proceedings of the ACM on Measurement and Analysis of Computing Systems
In this paper, we study Federated Bandit, a decentralized Multi-Armed Bandit problem with a set of N agents, who can only communicate their local data with neighbors described by a connected graph G. Each agent makes a sequence of decisions on selecting an arm from M candidates, yet they only have access to local and potentially biased feedback/evaluation of the true reward for each action taken. Learning only locally will lead agents to sub-optimal actions while converging to a no-regret…
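The setup above suggests a compact illustration: each agent runs a UCB rule on its local statistics and periodically averages those statistics with its neighbors on G, so that local bias washes out over time. The sketch below is illustrative only, not the paper's algorithm; the ring graph, uniform gossip weights, noise model, and bonus constants are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M, T = 4, 3, 2000                     # agents, arms, horizon
local_means = rng.uniform(size=(N, M))   # each agent sees biased local rewards
global_means = local_means.mean(axis=0)  # true reward: network-wide average

# Ring graph with uniform, doubly stochastic gossip weights (an assumption).
W = np.zeros((N, N))
for i in range(N):
    W[i, [i, (i - 1) % N, (i + 1) % N]] = 1 / 3

sums = local_means + 0.0   # running reward sums, seeded by one pull per arm
counts = np.ones((N, M))   # pull counts

for t in range(1, T + 1):
    for i in range(N):
        # UCB index built from the gossiped (network-averaged) statistics.
        ucb = sums[i] / counts[i] + np.sqrt(2 * np.log(t + 1) / counts[i])
        a = int(np.argmax(ucb))
        sums[i, a] += rng.normal(local_means[i, a], 0.1)  # biased local feedback
        counts[i, a] += 1
    # Gossip step: mixing pushes every agent's statistics toward the
    # network-wide averages, correcting the local bias over time.
    sums, counts = W @ sums, W @ counts

print("best arm under the global (true) means:", global_means.argmax())
print("arm each agent pulled most:", counts.argmax(axis=1))
```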
1 Citation

Federated Multi-Armed Bandits
TLDR
This paper proposes a general framework of FMAB and then studies two specific federated bandit models, solving the approximate model by proposing Federated Double UCB (Fed2-UCB), which constructs a novel “double UCB” principle accounting for uncertainties from both arm and client sampling.
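To make the “double UCB” idea concrete, here is a hedged sketch of an index that adds one exploration bonus for arm-level uncertainty and one for client-sampling uncertainty; the functional forms and constants are assumptions for illustration, not Fed2-UCB's exact confidence terms.

```python
import math

def double_ucb_index(mean, pulls, clients_seen, t):
    """Illustrative 'double UCB' index: the usual arm-uncertainty bonus plus
    a bonus that shrinks as more clients contribute samples. The functional
    forms are assumptions, not Fed2-UCB's exact confidence terms."""
    arm_bonus = math.sqrt(2 * math.log(t) / pulls)            # arm sampling
    client_bonus = math.sqrt(2 * math.log(t) / clients_seen)  # client sampling
    return mean + arm_bonus + client_bonus

# Same empirical mean, but fewer contributing clients => larger (more
# optimistic) index, so under-sampled client populations keep being explored.
print(double_ucb_index(0.5, pulls=100, clients_seen=5, t=1000))
print(double_ucb_index(0.5, pulls=100, clients_seen=40, t=1000))
```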

References

SHOWING 1-10 OF 94 REFERENCES
Private and Byzantine-Proof Cooperative Decision-Making
TLDR
This work provides upper-confidence-bound algorithms that obtain optimal regret while being differentially private and tolerant to Byzantine agents, and that require no information about the connectivity network between agents, making them scalable to large dynamic systems.
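One way to picture combining Byzantine tolerance with privacy is to aggregate neighbor reports robustly and perturb the result before use. The snippet below is a stand-in sketch using a trimmed mean and Laplace noise; it is not the paper's estimator, and the trimming level, sensitivity, and noise scale are assumptions.

```python
import numpy as np

def robust_private_mean(reports, trim_k, eps, rng):
    """Illustrative aggregate: trim the k largest and k smallest neighbor
    reports (tolerating up to k Byzantine values), then add Laplace noise
    for differential privacy. Stand-in choices, not the paper's estimator."""
    x = np.sort(np.asarray(reports, dtype=float))
    trimmed = x[trim_k: len(x) - trim_k]
    noise = rng.laplace(scale=1.0 / (eps * len(trimmed)))  # sensitivity assumed
    return trimmed.mean() + noise

rng = np.random.default_rng(1)
honest = list(rng.normal(0.6, 0.05, size=8))
byzantine = [10.0, -10.0]                      # adversarial reports
print(robust_private_mean(honest + byzantine, trim_k=2, eps=1.0, rng=rng))
```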
Decentralized Cooperative Stochastic Bandits
TLDR
A fully decentralized algorithm that uses an accelerated consensus procedure to compute (delayed) estimates of the average of rewards obtained by all the agents for each arm, and then uses an upper confidence bound (UCB) algorithm that accounts for the delay and error of the estimates.
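As a rough illustration of a UCB rule that “accounts for the delay and error of the estimates”, the index below widens its confidence radius when the consensus estimate lags behind the true sample count; the inflation rule is an assumption, not the paper's construction.

```python
import math

def delayed_ucb_index(avg_estimate, n_total, delay, t):
    """Illustrative index for a consensus-based UCB: the agent only has an
    estimate of the network-wide average reward that lags by `delay` rounds,
    so the confidence radius is computed as if `delay` samples were missing.
    The exact inflation used by the paper's algorithm may differ."""
    effective = max(n_total - delay, 1)         # discount stale samples
    return avg_estimate + math.sqrt(2 * math.log(t) / effective)

print(delayed_ucb_index(0.55, n_total=200, delay=0, t=1000))
print(delayed_ucb_index(0.55, n_total=200, delay=50, t=1000))  # wider bonus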
Coordinated Versus Decentralized Exploration In Multi-Agent Multi-Armed Bandits
TLDR
An algorithm for the decentralized setting is introduced that uses a value-of-information based communication strategy and an exploration-exploitation strategy based on the centralized algorithm, and it is shown experimentally to converge rapidly to the performance of the centralized method.
Differentially-Private Federated Linear Bandits
TLDR
This paper devises FedUCB, a multi-agent private algorithm for both centralized and decentralized (peer-to-peer) federated learning, which provides competitive performance in terms of both pseudo-regret bounds and empirical benchmark performance in various multi-agent settings.
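A minimal sketch of the communication pattern such a method implies: agents privatize their linear-bandit sufficient statistics (Gram matrix and reward vector) before sharing, and peers build a LinUCB-style index from the noisy aggregates. Simple Gaussian perturbation is used here as a stand-in; FedUCB's actual private release mechanism and all parameters below are assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
d = 3
A = np.eye(d)                 # local Gram matrix (regularized)
b = np.zeros(d)               # local reward-weighted feature sum

def privatize(stats, sigma, rng):
    """Add Gaussian noise before sharing; a stand-in for the private release
    mechanism a FedUCB-style method would use. Sigma is an assumption."""
    return stats + rng.normal(0.0, sigma, size=stats.shape)

# What a peer receives instead of the raw statistics:
A_shared = privatize(A, sigma=0.1, rng=rng)
A_shared = (A_shared + A_shared.T) / 2          # keep the Gram matrix symmetric
b_shared = privatize(b, sigma=0.1, rng=rng)

# The receiving agent builds its LinUCB index from the noisy aggregates.
theta = np.linalg.solve(A_shared, b_shared)
x = rng.normal(size=d)                          # candidate arm's feature vector
width = np.sqrt(x @ np.linalg.solve(A_shared, x))
index = x @ theta + 1.0 * width                 # exploration weight assumed
print(index)
```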
A Distributed Algorithm for Sequential Decision Making in Multi-Armed Bandit with Homogeneous Rewards
TLDR
It is shown that when all the agents share a homogeneous distribution of each arm reward, the algorithm achieves guaranteed logarithmic regret for all N agents at the order of O((1 + 2ρ²)² log T / N) when T is large.
Differentially Private Gossip Gradient Descent
TLDR
A differentially private distributed algorithm, called private gossip gradient descent, is proposed, which enables all N agents to converge to the true model, with performance comparable to that of conventional centralized algorithms.
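The gossip-plus-noise pattern is easy to show on a toy quadratic objective: each agent takes a noisy local gradient step, then averages its iterate with its neighbors. The ring graph, Laplace perturbation of the gradients, and the fixed step size below are assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(3)
N, d, steps, eta = 4, 2, 200, 0.1

# Each agent holds a local quadratic loss 0.5 * ||x - c_i||^2; the
# network-wide minimizer is the average of the c_i.
C = rng.normal(size=(N, d))
X = np.zeros((N, d))

# Ring-graph gossip weights (doubly stochastic).
W = np.zeros((N, N))
for i in range(N):
    W[i, [i, (i - 1) % N, (i + 1) % N]] = 1 / 3

for t in range(steps):
    grads = X - C                                  # local gradients
    noise = rng.laplace(scale=0.05, size=X.shape)  # privacy noise (scale assumed)
    X = W @ (X - eta * (grads + noise))            # gossip the noisy updates

print("consensus estimates:\n", X)
print("true minimizer:", C.mean(axis=0))
```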
Distributed cooperative decision-making in multiarmed bandits: Frequentist and Bayesian algorithms
TLDR
This work rigorously characterizes the influence of the communication graph structure on the group's decision-making performance, and proves performance guarantees for state-of-the-art frequentist and Bayesian cooperative distributed algorithms for multi-agent MAB problems in which agents communicate according to a fixed network graph.
Distributed Learning in Multi-Armed Bandit With Multiple Players
  • K. Liu, Qing Zhao
  • Computer Science, Mathematics
  • IEEE Transactions on Signal Processing
  • 2010
TLDR
It is shown that the minimum system regret of the decentralized MAB grows with time at the same logarithmic order as in the centralized counterpart where players act collectively as a single entity by exchanging observations and making decisions jointly.
Gossip-based distributed stochastic bandit algorithms
TLDR
This work shows that the probability of playing a suboptimal arm at a peer in iteration t = Ω(log N) is proportional to 1/(Nt), where N denotes the number of peers participating in the network.
Differentially private, multi-agent multi-armed bandits
TLDR
Two algorithms are derived, built upon a decentralized Time-Division Fair Sharing method and upper confidence bounds, in which all decisions are based on private statistics; they provide regret guarantees almost as good as those of the non-private multi-agent algorithm and are demonstrated empirically.
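As a toy illustration of “decisions based on private statistics”, the index below perturbs the running reward sum with Laplace noise before forming the UCB. Real private bandit algorithms release these statistics more carefully (e.g., over intervals), so every constant here is an assumption.

```python
import math
import numpy as np

rng = np.random.default_rng(4)

def private_ucb_index(reward_sum, pulls, t, eps):
    """UCB index computed from privatized statistics: Laplace noise is added
    to the running reward sum before forming the mean. A simplified stand-in
    for how private statistics are actually maintained and released."""
    noisy_sum = reward_sum + rng.laplace(scale=1.0 / eps)
    mean = noisy_sum / pulls
    return mean + math.sqrt(2 * math.log(t) / pulls)

print(private_ucb_index(reward_sum=60.0, pulls=100, t=1000, eps=1.0))
```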