# Collaborative Top Distribution Identifications with Limited Interaction

```bibtex
@article{Karpov2020CollaborativeTD,
  title   = {Collaborative Top Distribution Identifications with Limited Interaction},
  author  = {Nikolai Karpov and Qin Zhang and Yuanshuo Zhou},
  journal = {ArXiv},
  year    = {2020},
  volume  = {abs/2004.09454}
}
```

We consider the following problem in this paper: given a set of $n$ distributions, find the top-$m$ ones with the largest means. This problem is also called {\em top-$m$ arm identifications} in the literature of reinforcement learning, and has numerous applications. We study the problem in the collaborative learning model where we have multiple agents who can draw samples from the $n$ distributions in parallel. Our goal is to characterize the tradeoffs between the running time of learning…
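As a point of reference for the problem statement, the following is a minimal single-agent baseline for top-$m$ identification: pull every arm equally often and keep the $m$ largest empirical means. This sketch is illustrative only (the function name and Bernoulli arms are assumptions, not from the paper); the paper's contribution is the multi-agent tradeoff, which this naive baseline does not capture.

```python
import random

rng = random.Random(0)

def top_m_uniform(sample_fns, m, pulls_per_arm):
    """Pull every arm equally often and return the indices of the m
    largest empirical means (a naive single-agent baseline)."""
    means = []
    for i, draw in enumerate(sample_fns):
        est = sum(draw() for _ in range(pulls_per_arm)) / pulls_per_arm
        means.append((est, i))
    means.sort(reverse=True)
    return sorted(i for _, i in means[:m])

# Five Bernoulli "distributions"; the true top-2 are arms 1 and 3.
arms = [lambda p=p: 1.0 if rng.random() < p else 0.0
        for p in (0.1, 0.9, 0.5, 0.8, 0.2)]
print(top_m_uniform(arms, m=2, pulls_per_arm=2000))  # → [1, 3]
```

With 2000 pulls per arm the standard error of each estimate is about 0.011, far smaller than the 0.3 gaps here, so the correct set is recovered with overwhelming probability.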

## 7 Citations

### Linear bandits with limited adaptivity and learning distributional optimal design

- Computer Science, STOC
- 2021

It is shown that, when the context vectors are adversarially chosen in $d$-dimensional linear contextual bandits, the learner needs $O(d \log d \log T)$ policy switches to achieve the minimax-optimal regret, and this is optimal up to $\mathrm{poly}(\log d, \log\log T)$ factors.

### Collaborative Pure Exploration in Kernel Bandit

- Computer Science, ArXiv
- 2021

In this paper, we formulate a Collaborative Pure Exploration in Kernel Bandit problem (CoPE-KB), which provides a novel model for multi-agent multi-task decision making under limited communication…

### Near-Optimal Collaborative Learning in Bandits

- Computer Science, ArXiv
- 2022

A general multi-agent bandit model is introduced in which each agent faces a set of arms and may communicate with other agents through a central controller in order to identify its optimal arm; it provides new lower bounds on the sample complexity of pure exploration and on the regret.

### Online Learning for Cooperative Multi-Player Multi-Armed Bandits

- Computer Science, 2022 IEEE 61st Conference on Decision and Control (CDC)
- 2022

We introduce a framework for decentralized online learning for multi-armed bandits (MAB) with multiple cooperative players, where the reward obtained by the players each round depends on the actions…

### Batched Lipschitz Bandits

- Computer Science, ArXiv
- 2021

A novel landscape-aware algorithm, called Batched Lipschitz Narrowing (BLiN), is introduced that naturally fits into the batched feedback setting and achieves the theoretically optimal regret rate using only $O(\log\log T)$ batches.

### Lipschitz Bandits with Batched Feedback

- Computer Science
- 2021

A novel landscape-aware algorithm, called Batched Lipschitz Narrowing (BLiN), is introduced, which achieves the theoretically optimal regret rate using minimal communication; a theoretical lower bound implies that $\Omega(\log\log T)$ batches are necessary for any algorithm to achieve the optimal regret.

### Batched Coarse Ranking in Multi-Armed Bandits

- Computer Science, NeurIPS
- 2020

This work proposes algorithms and proves impossibility results which together give almost tight tradeoffs between the total number of arm pulls and the number of policy changes in multi-armed bandits (MAB).

## References

Showing 1–10 of 65 references.

### Nearly Instance Optimal Sample Complexity Bounds for Top-k Arm Selection

- Computer Science, Mathematics, AISTATS
- 2017

A novel complexity term is obtained to measure the sample complexity that every Best-$k$-Arm instance requires and an elimination-based algorithm is provided that matches the instance-wise lower bound within doubly-logarithmic factors.

### Pure Exploration of Multi-armed Bandit Under Matroid Constraints

- Computer Science, Mathematics, COLT
- 2016

This work studies both the exact and PAC versions of Best-Basis, and provides algorithms with nearly-optimal sample complexities for these versions of the pure exploration problem subject to a matroid constraint in a stochastic multi-armed bandit game.

### Improved Algorithms for Collaborative PAC Learning

- Computer Science, NeurIPS
- 2018

New algorithms for both the realizable and the non-realizable setting are designed, having sample complexity only $O(\ln (k))$ times the worst-case sample complexity for learning a single task.

### Towards Instance Optimal Bounds for Best Arm Identification

- Computer Science, Mathematics, COLT
- 2017

The gap-entropy conjecture is made, and for any Gaussian Best-$1$-Arm instance with gaps of the form $2^{-k}$, any $\delta$-correct monotone algorithm requires $\Omega\left(H(I)\cdot\left(\ln\delta^{-1} + \mathsf{Ent}(I)\right)\right)$ samples in expectation.

### Collaborative Learning with Limited Interaction: Tight Bounds for Distributed Exploration in Multi-armed Bandits

- Computer Science, 2019 IEEE 60th Annual Symposium on Foundations of Computer Science (FOCS)
- 2019

This paper studies the distributed version of this problem, where there are multiple agents who want to learn the best arm collaboratively; the running time of a distributed algorithm is measured as the speedup over the best centralized algorithm, where there is only one agent.

### Tight Bounds for Collaborative PAC Learning via Multiplicative Weights

- Computer Science, NeurIPS
- 2018

A collaborative learning algorithm with reduced overhead is obtained, improving on the one in BHPQ17, and it is shown that an $\Omega(\ln k)$ overhead is inevitable when $k$ is polynomially bounded by the VC dimension of the hypothesis class.

### Tight (Lower) Bounds for the Fixed Budget Best Arm Identification Bandit Problem

- Computer Science, Mathematics, COLT
- 2016

It is proved that any bandit strategy, for at least one bandit problem characterized by a complexity $H$, will misidentify the best arm with probability lower bounded by $\exp(-T/H)$, where $H$ is the sum, over all sub-optimal arms, of the inverse of the squared gaps.

### PAC Subset Selection in Stochastic Multi-armed Bandits

- Computer Science, ICML
- 2012

The expected sample complexity bound for LUCB is novel even for single-arm selection, and a lower bound on the worst case sample complexity of PAC algorithms for Explore-m is given.
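The LUCB idea mentioned above can be sketched as follows: in each round, pull only the weakest member of the current empirical top-$m$ and its strongest challenger, and stop when their confidence intervals are $\epsilon$-separated. This is an illustrative reconstruction, not the paper's code; the function name and the confidence-radius constants are assumptions.

```python
import math
import random

rng = random.Random(7)

def lucb_top_m(sample_fns, m, eps, delta):
    """Sketch of LUCB-style subset selection: repeatedly pull the weakest
    member of the empirical top-m and its strongest challenger until their
    confidence intervals are eps-separated. Constants are illustrative."""
    n = len(sample_fns)
    counts = [1] * n
    sums = [f() for f in sample_fns]
    t = n

    def radius(i):
        # Hoeffding-style confidence radius with a crude union bound over time.
        return math.sqrt(math.log(5.0 * n * t ** 4 / (4.0 * delta))
                         / (2.0 * counts[i]))

    while True:
        means = [s / c for s, c in zip(sums, counts)]
        order = sorted(range(n), key=lambda i: means[i], reverse=True)
        top, rest = order[:m], order[m:]
        h = min(top, key=lambda i: means[i] - radius(i))   # weakest of the top-m
        l = max(rest, key=lambda i: means[i] + radius(i))  # strongest challenger
        if (means[l] + radius(l)) - (means[h] - radius(h)) < eps:
            return sorted(top)
        for i in (h, l):
            sums[i] += sample_fns[i]()
            counts[i] += 1
        t += 2

arms = [lambda p=p: 1.0 if rng.random() < p else 0.0
        for p in (0.1, 0.9, 0.5, 0.8)]
print(lucb_top_m(arms, m=2, eps=0.1, delta=0.1))
```

Because only the two "contested" arms are pulled each round, sampling concentrates on the arms near the top-$m$ boundary, which is the source of LUCB's instance-adaptive sample complexity.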

### Efficient Pure Exploration in Adaptive Round model

- Computer Science, NeurIPS
- 2019

This paper studies both PAC and exact top-$k$ arm identification problems and designs efficient algorithms considering both round complexity and query complexity, achieving near-optimal query complexity.

### PAC Bounds for Multi-armed Bandit and Markov Decision Processes

- Computer Science, COLT
- 2002

The bandit problem is revisited and considered under the PAC model, and it is shown that given $n$ arms, it suffices to pull the arms $O((n/\epsilon^2)\log(1/\delta))$ times to find an $\epsilon$-optimal arm with probability of at least $1-\delta$.
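The bound in this last entry is classically achieved by median elimination: halve the arm set each round while tightening the per-round $(\epsilon, \delta)$ budget. The sketch below follows that scheme; the function name, the halving rule, and the pull-count constants are illustrative assumptions, not taken from the cited paper.

```python
import math
import random

rng = random.Random(1)

def median_elimination(sample_fns, eps, delta):
    """Sketch of median elimination: in each round, estimate every surviving
    arm's mean, drop the empirically worse half, and shrink the round budgets
    so total pulls stay O((n/eps^2) log(1/delta))."""
    arms = list(range(len(sample_fns)))
    eps_l, delta_l = eps / 4.0, delta / 2.0
    while len(arms) > 1:
        pulls = int(4.0 / (eps_l / 2.0) ** 2 * math.log(3.0 / delta_l)) + 1
        means = {a: sum(sample_fns[a]() for _ in range(pulls)) / pulls
                 for a in arms}
        arms.sort(key=lambda a: means[a], reverse=True)
        arms = arms[:(len(arms) + 1) // 2]   # keep the better half
        eps_l, delta_l = 0.75 * eps_l, delta_l / 2.0
    return arms[0]

arms = [lambda p=p: 1.0 if rng.random() < p else 0.0
        for p in (0.2, 0.5, 0.9, 0.3)]
print(median_elimination(arms, eps=0.3, delta=0.1))  # → 2
```

The key accounting step is that the number of surviving arms halves each round while the per-arm pull count grows only geometrically, so the total sample cost is dominated by a geometric series and stays linear in $n$.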