Corpus ID: 215827512

Collaborative Top Distribution Identifications with Limited Interaction

@article{Karpov2020CollaborativeTD,
  title={Collaborative Top Distribution Identifications with Limited Interaction},
  author={Nikolai Karpov and Qin Zhang and Yuanshuo Zhou},
  journal={ArXiv},
  year={2020},
  volume={abs/2004.09454}
}
We consider the following problem in this paper: given a set of $n$ distributions, find the top-$m$ ones with the largest means. This problem is also called top-$m$ arm identification in the literature of reinforcement learning, and has numerous applications. We study the problem in the collaborative learning model where we have multiple agents who can draw samples from the $n$ distributions in parallel. Our goal is to characterize the tradeoffs between the running time of learning… 
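To make the problem concrete, here is a minimal sketch of the centralized baseline the collaborative model is compared against: a single agent pulls every arm equally often and returns the $m$ arms with the largest empirical means. This is an illustration only; the Gaussian arms, the arm count, and the per-arm budget are assumptions, not the paper's setup, and the paper's question is how much faster $K$ agents sampling in parallel can solve the same task.

```python
import numpy as np

def top_m_uniform(sample, n, m, pulls_per_arm, rng):
    """Centralized baseline: pull each of the n arms equally often
    and return the m arms with the largest empirical means."""
    means = np.array([np.mean([sample(i, rng) for _ in range(pulls_per_arm)])
                      for i in range(n)])
    return np.argsort(means)[-m:][::-1]  # indices of top-m empirical means

# Illustrative instance: Gaussian arms with fixed means (an assumption).
true_means = np.linspace(0.0, 1.0, 10)
sample = lambda i, rng: rng.normal(true_means[i], 1.0)
rng = np.random.default_rng(0)
print(top_m_uniform(sample, n=10, m=3, pulls_per_arm=2000, rng=rng))
```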

Citations

Linear bandits with limited adaptivity and learning distributional optimal design

It is shown that, when the context vectors are adversarially chosen in $d$-dimensional linear contextual bandits, the learner needs $O(d \log d \log T)$ policy switches to achieve the minimax-optimal regret, and this is optimal up to $\mathrm{poly}(\log d, \log\log T)$ factors.

Collaborative Pure Exploration in Kernel Bandit

In this paper, we formulate the Collaborative Pure Exploration in Kernel Bandit problem (CoPE-KB), which provides a novel model for multi-agent multi-task decision making under limited communication.

Near-Optimal Collaborative Learning in Bandits

A general multi-agent bandit model is introduced in which each agent faces a set of arms and may communicate with other agents through a central controller in order to identify its optimal arm; the paper provides new lower bounds on the sample complexity of pure exploration and on the regret.

Online Learning for Cooperative Multi-Player Multi-Armed Bandits

We introduce a framework for decentralized online learning for multi-armed bandits (MAB) with multiple cooperative players, where the reward obtained by the players in each round depends on the actions taken by all the players.

Batched Lipschitz Bandits

A novel landscape-aware algorithm, called Batched Lipschitz Narrowing (BLiN), is introduced, which naturally fits into the batched feedback setting and achieves the theoretically optimal regret rate using only $O(\log\log T)$ batches.

Lipschitz Bandits with Batched Feedback

A novel landscape-aware algorithm, called Batched Lipschitz Narrowing (BLiN), is introduced, which achieves the theoretically optimal regret rate using minimal communication; a matching lower bound implies that $\Omega(\log\log T)$ batches are necessary for any algorithm to achieve the optimal regret.

Batched Coarse Ranking in Multi-Armed Bandits

This work proposes algorithms and proves impossibility results which together give almost tight tradeoffs between the total number of arm pulls and the number of policy changes in multi-armed bandits (MAB).
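The three batched entries above share one pattern: the sampling policy may change only at a few predetermined points. As a hedged illustration of that pattern (not BLiN or the coarse-ranking algorithm itself), here is a generic batched successive elimination for standard MAB: arms are pulled uniformly within a batch and eliminated only at batch boundaries. The batch sizes and the Hoeffding-style confidence radius are illustrative choices.

```python
import numpy as np

def batched_successive_elimination(sample, n, batch_sizes, delta, rng):
    """Pull all active arms uniformly within each batch; between batches,
    drop every arm whose upper confidence bound falls below the best
    lower confidence bound. Policy changes only at batch boundaries."""
    active = list(range(n))
    counts = np.zeros(n)
    sums = np.zeros(n)
    for b in batch_sizes:                      # one policy change per batch
        for i in active:
            for _ in range(b):
                sums[i] += sample(i, rng)
                counts[i] += 1
        means = sums[active] / counts[active]
        # Hoeffding-style radius (illustrative; assumes 1-subgaussian rewards)
        rad = np.sqrt(2 * np.log(2 * n * len(batch_sizes) / delta) / counts[active])
        best_lcb = np.max(means - rad)
        active = [a for a, mu, r in zip(active, means, rad) if mu + r >= best_lcb]
        if len(active) == 1:
            break
    return active

rng = np.random.default_rng(1)
true_means = np.array([0.1, 0.5, 0.9, 0.4])
sample = lambda i, rng: rng.normal(true_means[i], 1.0)
print(batched_successive_elimination(sample, 4, [200, 800, 3200], 0.05, rng))
```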

References

SHOWING 1-10 OF 65 REFERENCES

Nearly Instance Optimal Sample Complexity Bounds for Top-k Arm Selection

A novel complexity term is obtained to measure the sample complexity that every Best-$k$-Arm instance requires, and an elimination-based algorithm is provided that matches the instance-wise lower bound within doubly-logarithmic factors.

Pure Exploration of Multi-armed Bandit Under Matroid Constraints

This work studies both the exact and PAC versions of Best-Basis, the pure exploration problem subject to a matroid constraint in a stochastic multi-armed bandit game, and provides algorithms with nearly-optimal sample complexities for both versions.

Improved Algorithms for Collaborative PAC Learning

New algorithms for both the realizable and the non-realizable setting are designed, having sample complexity only $O(\ln (k))$ times the worst-case sample complexity for learning a single task.

Towards Instance Optimal Bounds for Best Arm Identification

The gap-entropy conjecture is made, and for any Gaussian Best-$1$-Arm instance with gaps of the form $2^{-k}$, any $\delta$-correct monotone algorithm requires $\Omega\left(H(I)\cdot\left(\ln\delta^{-1} + \mathsf{Ent}(I)\right)\right)$ samples in expectation.
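For readability, the two instance-dependent quantities above can be written out. The following definitions are a hedged reconstruction from the gap-entropy literature rather than from this page: with gaps $\Delta_i = \mu_1 - \mu_i$, $H(I)$ is the usual hardness sum, and $\mathsf{Ent}(I)$ is an entropy over arms grouped by gap scale.

```latex
% Hedged reconstruction (assumption, not quoted from this page):
% H(I) is the standard hardness measure over sub-optimal arms,
H(I) = \sum_{i \ge 2} \Delta_i^{-2}, \qquad \Delta_i = \mu_1 - \mu_i,
% and Ent(I) is the entropy of the mass contributed by each gap scale,
% grouping arms as G_k = \{\, i \ge 2 : \Delta_i \approx 2^{-k} \,\}:
p_k = \frac{1}{H(I)} \sum_{i \in G_k} \Delta_i^{-2}, \qquad
\mathsf{Ent}(I) = \sum_k p_k \ln \frac{1}{p_k}.
```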

Collaborative Learning with Limited Interaction: Tight Bounds for Distributed Exploration in Multi-armed Bandits

This paper studies the distributed version of the best arm identification problem, in which multiple agents want to learn the best arm collaboratively, and measures the running time of a distributed algorithm as the speedup over the best centralized algorithm, where there is only one agent.
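The collaborative speedup can be illustrated with a simple round-based scheme; this is a hedged sketch of the general pattern, not the algorithm from this reference. In each communication round the $K$ agents split the required pulls for every surviving arm evenly, pool their empirical means, and eliminate the bottom half, so the wall-clock time per round shrinks by a factor of roughly $K$.

```python
import numpy as np

def collaborative_halving(sample, n, K, pulls_per_round, rng):
    """Round-based collaborative elimination (illustrative pattern):
    each round, K agents share the pulls for every surviving arm,
    communicate once, and the bottom half of arms is eliminated."""
    active = list(range(n))
    rounds = 0
    while len(active) > 1:
        rounds += 1
        # Each agent performs ceil(pulls_per_round / K) pulls per arm,
        # so wall-clock time per round drops by a factor of ~K.
        per_agent = -(-pulls_per_round // K)          # ceiling division
        means = [np.mean([sample(i, rng) for _ in range(per_agent * K)])
                 for i in active]
        order = np.argsort(means)[::-1]               # best arms first
        active = [active[j] for j in order[: max(1, len(active) // 2)]]
    return active[0], rounds

rng = np.random.default_rng(2)
true_means = np.linspace(0, 1, 8)
sample = lambda i, rng: rng.normal(true_means[i], 1.0)
print(collaborative_halving(sample, n=8, K=4, pulls_per_round=1000, rng=rng))
```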

Tight Bounds for Collaborative PAC Learning via Multiplicative Weights

A collaborative learning algorithm with overhead $O(\ln k)$ is obtained, improving the one with overhead $O((\ln k)^2)$ in BHPQ17, and it is shown that an $\Omega(\ln k)$ overhead is inevitable when $k$ is polynomially bounded by the VC dimension of the hypothesis class.

Tight (Lower) Bounds for the Fixed Budget Best Arm Identification Bandit Problem

It is proved that any bandit strategy, for at least one bandit problem characterized by a complexity $H$, will misidentify the best arm with probability lower bounded by $\exp(-T/H)$, where $H$ is the sum, over all sub-optimal arms, of the inverse of the squared gaps.
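Written out, the complexity measure in the sentence above is as follows (the gap notation $\Delta_i$ is introduced here for readability only, not taken from the page):

```latex
H = \sum_{i \,:\, \Delta_i > 0} \frac{1}{\Delta_i^{2}},
\qquad \Delta_i = \mu^{*} - \mu_i,
\qquad \Pr[\text{misidentification}] \;\ge\; \exp\!\left(-T/H\right).
```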

PAC Subset Selection in Stochastic Multi-armed Bandits

The expected sample complexity bound for LUCB is novel even for single-arm selection, and a lower bound on the worst-case sample complexity of PAC algorithms for Explore-m is given.
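LUCB is concrete enough to sketch. The following is a hedged Python rendering of the standard LUCB-style loop for Explore-$m$: each round it pulls the weakest arm (by lower confidence bound) inside the current empirical top-$m$ and the strongest arm (by upper confidence bound) outside it, stopping when the two are $\epsilon$-separated. The confidence radius is an illustrative Hoeffding-style choice, not the paper's exact exploration rate.

```python
import numpy as np

def lucb(sample, n, m, eps, delta, rng, max_pulls=10**6):
    """LUCB-style Explore-m: pull the two contenders straddling the
    empirical top-m boundary until their confidence bounds separate."""
    counts = np.ones(n)
    sums = np.array([sample(i, rng) for i in range(n)])  # one pull per arm
    t = n
    while t < max_pulls:
        means = sums / counts
        # Illustrative confidence radius (assumes bounded/subgaussian rewards).
        rad = np.sqrt(np.log(5 * n * t**4 / (4 * delta)) / (2 * counts))
        top = np.argsort(means)[-m:]                  # empirical top-m
        rest = np.argsort(means)[:-m]
        weak = top[np.argmin((means - rad)[top])]     # lowest LCB inside top
        strong = rest[np.argmax((means + rad)[rest])] # highest UCB outside
        if (means[strong] + rad[strong]) - (means[weak] - rad[weak]) <= eps:
            return top                                # eps-optimal top-m set
        for a in (weak, strong):                      # pull both contenders
            sums[a] += sample(a, rng)
            counts[a] += 1
            t += 1
    return np.argsort(sums / counts)[-m:]

rng = np.random.default_rng(3)
true_means = np.array([0.2, 0.3, 0.5, 0.7, 0.8])
sample = lambda i, rng: rng.normal(true_means[i], 0.5)
print(sorted(lucb(sample, n=5, m=2, eps=0.1, delta=0.05, rng=rng)))
```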

Efficient Pure Exploration in Adaptive Round model

This paper studies both PAC and exact top-$k$ arm identification problems, designs efficient algorithms considering both round complexity and query complexity, and achieves near-optimal query complexity.

PAC Bounds for Multi-armed Bandit and Markov Decision Processes

The bandit problem is revisited and considered under the PAC model, and it is shown that given $n$ arms, it suffices to pull the arms $O\left((n/\epsilon^2)\log(1/\delta)\right)$ times to find an $\epsilon$-optimal arm with probability at least $1-\delta$.
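The naive version of this strategy simply pulls every arm equally often; a sketch follows. Note that the plain union-bound calculation used here needs $\lceil (2/\epsilon^2)\ln(2n/\delta) \rceil$ pulls per arm, a $\log n$ factor worse than the median-elimination bound quoted above; the constants and Bernoulli arms are illustrative assumptions.

```python
import math
import numpy as np

def naive_pac_best_arm(sample, n, eps, delta, rng):
    """Uniform sampling: pull each arm t times so that, by Hoeffding plus
    a union bound, every empirical mean is within eps/2 of its true mean
    with probability >= 1 - delta; the empirical best is then eps-optimal.
    Assumes rewards bounded in [0, 1]."""
    t = math.ceil((2 / eps**2) * math.log(2 * n / delta))
    means = [np.mean([sample(i, rng) for _ in range(t)]) for i in range(n)]
    return int(np.argmax(means))

rng = np.random.default_rng(4)
true_means = [0.3, 0.5, 0.6]
sample = lambda i, rng: rng.binomial(1, true_means[i])  # Bernoulli arms
print(naive_pac_best_arm(sample, n=3, eps=0.1, delta=0.05, rng=rng))
```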
...