Best Arm Identification under Additive Transfer Bandits

Ojash Neopane, Aaditya Ramdas, Aarti Singh. 2021 55th Asilomar Conference on Signals, Systems, and Computers.
We consider a variant of the best arm identification (BAI) problem in multi-armed bandits (MAB) in which there are two sets of arms (source and target), and the objective is to determine the best target arm while only pulling source arms. In this paper, we study the setting when, despite the means being unknown, there is a known additive relationship between the source and target MAB instances. We show how our framework covers a range of previously studied pure exploration problems and… 

Max-Quantile Grouped Infinite-Arm Bandits

The instance-dependent and worst-case regret are characterized, and a matching lower bound for the latter is provided; the paper also discusses various strengths, weaknesses, algorithmic improvements, and potential lower bounds associated with the instance-dependent upper bounds.



Best Arm Identification in Generalized Linear Bandits

Structured Best Arm Identification with Fixed Confidence

This paper introduces an abstract setting that clearly describes the essential properties of the minimax game search problem, presents a new algorithm (LUCB-micro) for this setting, and gives lower and upper sample complexity results.

Pure Exploration of Multi-armed Bandit Under Matroid Constraints

This work studies both the exact and PAC versions of Best-Basis, and provides algorithms with nearly-optimal sample complexities for these versions of the pure exploration problem subject to a matroid constraint in a stochastic multi-armed bandit game.

On the Complexity of Best-Arm Identification in Multi-Armed Bandit Models

This work introduces generic notions of complexity for the two dominant frameworks considered in the literature: fixed-budget and fixed-confidence settings, and provides the first known distribution-dependent lower bound on the complexity that involves information-theoretic quantities and holds when m ≥ 1 under general assumptions.

Exploiting Correlation in Finite-Armed Structured Bandits

We consider a correlated multi-armed bandit problem in which rewards of arms are correlated through a hidden parameter. Our approach exploits the correlation among arms to identify some arms as

PAC Subset Selection in Stochastic Multi-armed Bandits

The expected sample complexity bound for LUCB is novel even for single-arm selection, and a lower bound on the worst case sample complexity of PAC algorithms for Explore-m is given.

Combinatorial Pure Exploration of Multi-Armed Bandits

This paper presents general learning algorithms which work for all decision classes that admit offline maximization oracles in both fixed confidence and fixed budget settings and establishes a general problem-dependent lower bound for the CPE problem.

Information Complexity in Bandit Subset Selection

This work considers the problem of efficiently exploring the arms of a stochastic bandit to identify the best subset of a specified size and derives improved bounds by using KL-divergence-based confidence intervals.

Pure Exploration in Multi-armed Bandits Problems

The main result is that the required exploration-exploitation trade-offs are qualitatively different, in view of a general lower bound on the simple regret in terms of the cumulative regret.

Monte-Carlo Tree Search by Best Arm Identification

New algorithms for trees of arbitrary depth are developed that operate by summarizing all deeper levels of the tree into confidence intervals at depth one and applying a best arm identification procedure at the root; new sample complexity guarantees with a refined dependence on the problem instance are proved.