• Corpus ID: 215827512

# Collaborative Top Distribution Identifications with Limited Interaction

@article{Karpov2020CollaborativeTD,
  title={Collaborative Top Distribution Identifications with Limited Interaction},
  author={Nikolai Karpov and Qin Zhang and Yuanshuo Zhou},
  journal={ArXiv},
  year={2020},
  volume={abs/2004.09454}
}
• Published 20 April 2020
• Computer Science
• ArXiv
We consider the following problem in this paper: given a set of $n$ distributions, find the top-$m$ ones with the largest means. This problem is also called {\em top-$m$ arm identifications} in the literature of reinforcement learning, and has numerous applications. We study the problem in the collaborative learning model where we have multiple agents who can draw samples from the $n$ distributions in parallel. Our goal is to characterize the tradeoffs between the running time of learning…
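As an illustrative baseline only (not the paper's algorithm), the problem statement can be sketched in code: given sampling access to $n$ distributions, estimate every mean and keep the $m$ largest. The Bernoulli arms and sample budget below are hypothetical.

```python
import random
import statistics

def topm_uniform(arms, m, samples_per_arm, rng=None):
    """Naive one-round baseline for top-m identification: sample every
    arm equally and return the indices of the m largest empirical means.
    This ignores the interaction/round tradeoff the paper studies."""
    rng = rng or random.Random(0)
    means = []
    for i, draw in enumerate(arms):
        empirical = statistics.fmean(draw(rng) for _ in range(samples_per_arm))
        means.append((empirical, i))
    means.sort(reverse=True)
    return sorted(i for _, i in means[:m])

# Hypothetical instance: six Bernoulli arms, identify the top two.
probs = [0.9, 0.8, 0.5, 0.4, 0.3, 0.2]
arms = [lambda rng, p=p: float(rng.random() < p) for p in probs]
print(topm_uniform(arms, m=2, samples_per_arm=2000))
```

The paper's question is how much faster $K$ agents sampling in parallel can solve this while communicating in only a few rounds; the baseline above uses a single fixed sampling plan and no collaboration.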

## Citations

• Computer Science
STOC
• 2021
It is shown that, when the context vectors are adversarially chosen in $d$-dimensional linear contextual bandits, the learner needs $O(d \log d \log T)$ policy switches to achieve the minimax-optimal regret, and this is optimal up to $\mathrm{poly}(\log d, \log\log T)$ factors.
• Yihan Du, Longbo Huang
• Computer Science
ArXiv
• 2021
In this paper, we formulate a Collaborative Pure Exploration in Kernel Bandit problem (CoPE-KB), which provides a novel model for multi-agent multi-task decision making under limited communication.
• Computer Science
ArXiv
• 2022
A general multi-agent bandit model in which each agent is facing a set of arms and may communicate with other agents through a central controller in order to identify its optimal arm is introduced, which provides new lower bounds on the sample complexity of pure exploration and on the regret.
• Computer Science
2022 IEEE 61st Conference on Decision and Control (CDC)
• 2022
We introduce a framework for decentralized online learning for multi-armed bandits (MAB) with multiple cooperative players, where the reward obtained by the players in each round depends on the actions…
• Computer Science
ArXiv
• 2021
A novel landscape-aware algorithm, called Batched Lipschitz Narrowing (BLiN), is introduced that naturally fits into the batched feedback setting and achieves the theoretically optimal regret rate using only $O(\log\log T)$ batches.
• Computer Science
• 2021
A novel landscape-aware algorithm, called Batched Lipschitz Narrowing (BLiN), is introduced, which achieves the theoretically optimal regret rate using minimal communication; a theoretical lower bound implies that $\Omega(\log\log T)$ batches are necessary for any algorithm to achieve the optimal regret.
• Computer Science
NeurIPS
• 2020
This work proposes algorithms and proves impossibility results which together give almost tight tradeoffs between the total number of arm pulls and the number of policy changes in multi-armed bandits (MAB).
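Several of the citing works above study limited adaptivity: the sampling policy may change only at batch boundaries. A minimal sketch of that constraint, assuming hypothetical Bernoulli arms and untuned Hoeffding-style confidence radii:

```python
import math
import random

def batched_elimination(arms, num_batches, pulls_per_batch, delta=0.05, rng=None):
    """Successive elimination with a fixed sampling plan inside each batch,
    so the policy changes at most num_batches times.  Illustrative sketch
    of the batched/limited-adaptivity setting, not any cited algorithm."""
    rng = rng or random.Random(0)
    n = len(arms)
    counts = [0] * n
    sums = [0.0] * n
    alive = set(range(n))
    for _ in range(num_batches):
        for i in alive:  # sampling plan is fixed for the whole batch
            for _ in range(pulls_per_batch):
                sums[i] += arms[i](rng)
            counts[i] += pulls_per_batch
        # Hoeffding confidence radius; union bound over arms and batches
        rad = {i: math.sqrt(math.log(2 * n * num_batches / delta) / (2 * counts[i]))
               for i in alive}
        best_lcb = max(sums[i] / counts[i] - rad[i] for i in alive)
        alive = {i for i in alive if sums[i] / counts[i] + rad[i] >= best_lcb}
    return max(alive, key=lambda i: sums[i] / counts[i])

# Hypothetical instance: three Bernoulli arms, four batches.
probs = [0.9, 0.5, 0.4]
arms = [lambda rng, p=p: float(rng.random() < p) for p in probs]
print(batched_elimination(arms, num_batches=4, pulls_per_batch=500))
```

Arms eliminated at a batch boundary are never sampled again, which is what makes the number of batches, rather than the number of pulls, the measure of interaction.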

## References

SHOWING 1-10 OF 65 REFERENCES

• Computer Science, Mathematics
AISTATS
• 2017
A novel complexity term is obtained to measure the sample complexity that every Best-$k$-Arm instance requires and an elimination-based algorithm is provided that matches the instance-wise lower bound within doubly-logarithmic factors.
• Computer Science, Mathematics
COLT
• 2016
This work studies both the exact and PAC versions of Best-Basis, and provides algorithms with nearly-optimal sample complexities for these versions of the pure exploration problem subject to a matroid constraint in a stochastic multi-armed bandit game.
• Computer Science
NeurIPS
• 2018
New algorithms for both the realizable and the non-realizable setting are designed, having sample complexity only $O(\ln (k))$ times the worst-case sample complexity for learning a single task.
• Computer Science, Mathematics
COLT
• 2017
The gap-entropy conjecture is made, and for any Gaussian Best-$1$-Arm instance with gaps of the form $2^{-k}$, any $\delta$-correct monotone algorithm requires $\Omega\left(H(I)\cdot\left(\ln\delta^{-1} + \mathsf{Ent}(I)\right)\right)$ samples in expectation.
• Computer Science
2019 IEEE 60th Annual Symposium on Foundations of Computer Science (FOCS)
• 2019
This paper studies the distributed version of this problem, where there are multiple agents who want to learn the best arm collaboratively; the running time of a distributed algorithm is measured as the speedup over the best centralized algorithm, where there is only one agent.
• Computer Science
NeurIPS
• 2018
A collaborative learning algorithm is obtained whose overhead improves the one in BHPQ17, and it is shown that an $\Omega(\ln k)$ overhead is inevitable when $k$ is polynomially bounded by the VC dimension of the hypothesis class.
• Computer Science, Mathematics
COLT
• 2016
It is proved that any bandit strategy, for at least one bandit problem characterized by a complexity $H$, will misidentify the best arm with probability lower bounded by $\exp\big(-T/H\big)$, where $H$ is the sum, over all sub-optimal arms, of the inverse of the squared gaps.
• Computer Science
ICML
• 2012
The expected sample complexity bound for LUCB is novel even for single-arm selection, and a lower bound on the worst-case sample complexity of PAC algorithms for Explore-m is given.
• Computer Science
NeurIPS
• 2019
This paper studies both PAC and exact top-$k$ arm identification problems and designs efficient algorithms that account for both round complexity and query complexity, achieving near-optimal query complexity.
• Computer Science
COLT
• 2002
The bandit problem is revisited and considered under the PAC model, and it is shown that given $n$ arms, it suffices to pull the arms $O((n/\epsilon^2)\log(1/\delta))$ times to find an $\epsilon$-optimal arm with probability of at least $1-\delta$.
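The $O((n/\epsilon^2)\log(1/\delta))$ bound in that last entry is achieved by median elimination. A hedged sketch of the idea, with hypothetical Bernoulli arms and illustrative (untuned) constants:

```python
import math
import random
import statistics

def median_elimination(arms, eps, delta, rng=None):
    """Median-elimination sketch: each phase samples every surviving arm
    enough to estimate its mean to within eps_l/2, then drops the worse
    half, while (eps_l, delta_l) shrink geometrically.  The halving keeps
    the total sample count at O((n/eps^2) log(1/delta))."""
    rng = rng or random.Random(0)
    candidates = list(range(len(arms)))
    eps_l, delta_l = eps / 4.0, delta / 2.0
    while len(candidates) > 1:
        t = math.ceil((4.0 / eps_l ** 2) * math.log(3.0 / delta_l))
        means = {i: statistics.fmean(arms[i](rng) for _ in range(t))
                 for i in candidates}
        med = statistics.median(means.values())
        candidates = [i for i in candidates if means[i] >= med]
        eps_l, delta_l = 0.75 * eps_l, delta_l / 2.0
    return candidates[0]

# Hypothetical instance: four Bernoulli arms; any arm within eps = 0.2 of
# the best (here arm 0 or arm 1) is an acceptable answer.
probs = [0.9, 0.8, 0.5, 0.3]
arms = [lambda rng, p=p: float(rng.random() < p) for p in probs]
print(median_elimination(arms, eps=0.2, delta=0.1))
```

The key point, relative to naive uniform sampling, is that eliminated arms stop consuming samples, which removes the extra $\log n$ factor from the per-arm confidence requirement.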