Max-Min Grouped Bandits

@article{Wang2021MaxMinGB,
  title={Max-Min Grouped Bandits},
  author={Zhenling Wang and Jonathan Scarlett},
  journal={ArXiv},
  year={2021},
  volume={abs/2111.08862}
}
In this paper, we introduce a multi-armed bandit problem termed max-min grouped bandits, in which the arms are arranged in possibly-overlapping groups, and the goal is to find a group whose worst arm has the highest mean reward. This problem is of interest in applications such as recommendation systems, and is also closely related to widely-studied robust optimization problems. We present two algorithms based on successive elimination and robust optimization, and derive upper bounds on the number…
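
A minimal sketch of the objective (what is optimized, not the paper's algorithm): the Python snippet below evaluates the max-min criterion when the true arm means are known, whereas in the bandit problem itself the means must be estimated from samples. The instance is hypothetical.

def max_min_group(groups, means):
    """Return the group whose worst arm has the highest mean reward.

    groups: dict mapping group name -> list of arm indices (may overlap)
    means:  list of mean rewards, indexed by arm
    """
    # The value of a group is the mean reward of its worst arm.
    group_value = {g: min(means[a] for a in arms) for g, arms in groups.items()}
    return max(group_value, key=group_value.get)

means = [0.9, 0.4, 0.7, 0.6, 0.8]
groups = {"A": [0, 1], "B": [2, 3], "C": [0, 2, 4]}
print(max_min_group(groups, means))  # -> "C" (worst-arm mean 0.7)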

Max-Quantile Grouped Infinite-Arm Bandits

The instance-dependent and worst-case regret are characterized, and a matching lower bound for the latter is provided, along with a discussion of the strengths, weaknesses, algorithmic improvements, and potential lower bounds associated with the instance-dependent upper bounds.

Optimal Clustering with Bandit Feedback

By Junwen Yang (Institute of Operations Research and Analytics, National University of Singapore) and Zixin Zhong (Department of Electrical and Computer Engineering).

References

Showing 1-10 of 24 references.

Multi-Bandit Best Arm Identification

This work proposes an algorithm called Gap-based Exploration (GapE) that focuses on the arms whose mean is close to the mean of the best arm in the same bandit (i.e., has a small gap), and introduces a variant, GapE-V, which takes into account the variance of the arms in addition to their gap.
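
A hedged sketch of a gap-based selection index in this spirit (the index form and the exploration parameter `a` follow the description above and are assumptions, not the paper's exact pseudocode):

import math

def empirical_gaps(emp_means):
    # Gap of an arm = distance from the best empirical mean in its bandit;
    # for the empirical best arm itself, the distance to the second-best
    # mean. Assumes at least two arms.
    order = sorted(emp_means, reverse=True)
    best, second = order[0], order[1]
    return [best - m if m < best else best - second for m in emp_means]

def gape_select(emp_means, pulls, a):
    # Favour arms with a small empirical gap and few pulls: the index
    # -gap + sqrt(a / pulls) grows when either quantity is small.
    gaps = empirical_gaps(emp_means)
    return max(
        range(len(emp_means)),
        key=lambda k: -gaps[k] + math.sqrt(a / max(pulls[k], 1)),
    )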

Overlapping Multi-Bandit Best Arm Identification

The total number of arm pulls required for high-probability best-arm identification in every group is bounded, and two algorithms for this problem, based on successive elimination and lower/upper confidence bounds (LUCB), are presented.
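
A generic successive-elimination step of the kind referenced above, as a sketch (the confidence radius is a standard Hoeffding form; the exact constants in the paper may differ):

import math

def conf_radius(pulls, t, delta):
    # Hoeffding-style radius with a crude union-bound term; an assumption
    # standing in for the paper's confidence bound.
    return math.sqrt(math.log(4 * t * t / delta) / (2 * max(pulls, 1)))

def eliminate(active, emp_means, pulls, t, delta):
    """Drop every active arm whose upper confidence bound falls below the
    best lower confidence bound among the active arms (elimination within
    a single group)."""
    lcb = {i: emp_means[i] - conf_radius(pulls[i], t, delta) for i in active}
    ucb = {i: emp_means[i] + conf_radius(pulls[i], t, delta) for i in active}
    best_lcb = max(lcb.values())
    return [i for i in active if ucb[i] >= best_lcb]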

PAC Subset Selection in Stochastic Multi-armed Bandits

The expected sample complexity bound for LUCB is novel even for single-arm selection, and a lower bound on the worst-case sample complexity of PAC algorithms for Explore-m is given.
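
For concreteness, the standard LUCB sampling rule for the single-arm case (m = 1), as a sketch; the confidence-radius function `beta` is an assumption standing in for the paper's choice:

def lucb_pair(emp_means, pulls, beta):
    """Choose the two arms LUCB samples each round: the empirical leader
    and the challenger with the highest upper confidence bound. Sampling
    stops once the leader's lower bound exceeds the challenger's upper
    bound."""
    arms = range(len(emp_means))
    leader = max(arms, key=lambda i: emp_means[i])
    challenger = max(
        (i for i in arms if i != leader),
        key=lambda i: emp_means[i] + beta(pulls[i]),
    )
    return leader, challenger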

Categorized Bandits

Three concepts of ordering between categories, inspired by stochastic dominance between random variables, are introduced; they are progressively weaker, so that more and more bandit scenarios satisfy at least one of them.
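
The classical notion these orderings draw on is first-order stochastic dominance, written here in standard notation (not necessarily the paper's):

X \succeq Y \iff \Pr[X > t] \ge \Pr[Y > t] \quad \text{for all } t \in \mathbb{R}.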

Best-arm identification algorithms for multi-armed bandits in the fixed confidence setting

It is shown that most best-arm identification algorithms can be described as variants of two recently proposed optimal algorithms that achieve the optimal sample complexity for the problem.

Multi-Armed Bandits with Dependent Arms

Learning algorithms based on the UCB principle are developed which appropriately utilize the additional side observations afforded by the dependence between arms while performing the exploration-exploitation trade-off in the classical multi-armed bandit problem.
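
For context, the classical UCB1 index that such algorithms build on (the side-observation handling described above is the paper's addition and is not shown):

import math

def ucb1_index(emp_mean, pulls, t):
    # UCB1: empirical mean plus an exploration bonus that shrinks as the
    # arm is pulled more often (pulls >= 1, t = total pulls so far).
    return emp_mean + math.sqrt(2 * math.log(t) / pulls)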

Information Complexity in Bandit Subset Selection

This work considers the problem of efficiently exploring the arms of a stochastic bandit to identify the best subset of a specified size, and derives improved bounds by using KL-divergence-based confidence intervals.
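
A sketch of how a KL-divergence-based upper confidence bound can be computed by bisection (Bernoulli rewards; the exploration `level` passed in is left abstract and is an assumption):

import math

def bern_kl(p, q):
    # Bernoulli KL divergence kl(p, q), clipped away from 0 and 1.
    eps = 1e-12
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def kl_upper_bound(emp_mean, pulls, level, iters=50):
    # Largest q >= emp_mean with pulls * kl(emp_mean, q) <= level; the KL
    # is increasing in q on [emp_mean, 1], so bisection applies.
    lo, hi = emp_mean, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if pulls * bern_kl(emp_mean, mid) <= level:
            lo = mid
        else:
            hi = mid
    return lo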

Robust Submodular Maximization: A Non-Uniform Partitioning Approach

A new Partitioned Robust (PRo) submodular maximization algorithm is presented that achieves the same guarantee for the more general regime $\tau = o(k)$, and the performance of PRo is demonstrated numerically on data summarization and influence maximization.
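
The robust objective in question, in standard notation ($f$ submodular, $k$ the cardinality budget, $\tau$ the number of adversarially removed elements):

\max_{S : |S| \le k} \; \min_{Z \subseteq S,\, |Z| \le \tau} f(S \setminus Z)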

The Sample Complexity of Exploration in the Multi-Armed Bandit Problem

This work considers the multi-armed bandit problem under the PAC (“probably approximately correct”) model, generalizing the lower bound to a Bayesian setting and to the case where the statistics of the arms are known but the identities of the arms are not.
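
The headline bound from this line of work: identifying an $\epsilon$-optimal arm among $n$ arms with probability at least $1 - \delta$ has sample complexity

\Theta\!\left( \frac{n}{\epsilon^2} \log \frac{1}{\delta} \right),

with the lower bound direction established in this paper.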

Optimal Exploitation of Clustering and History Information in Multi-Armed Bandit

The META algorithm is developed, which effectively hedges between two other algorithms: one which uses both historical observations and clustering, and another which uses only the historical observations.