Corpus ID: 16826415

The Simulator: Understanding Adaptive Sampling in the Moderate-Confidence Regime

@article{Simchowitz2017TheSU,
  title={The Simulator: Understanding Adaptive Sampling in the Moderate-Confidence Regime},
  author={Max Simchowitz and Kevin G. Jamieson and Benjamin Recht},
  journal={ArXiv},
  year={2017},
  volume={abs/1702.05186}
}
We propose a novel technique for analyzing adaptive sampling called the Simulator. Our approach differs from existing methods by considering not how much information could be gathered by any fixed sampling strategy, but how difficult it is to distinguish a good sampling strategy from a bad one given the limited amount of data collected up to any given time. This change of perspective allows us to match the strength of both Fano and change-of-measure techniques, without succumbing to…
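For context, the change-of-measure lower bounds mentioned in the abstract are usually stated in the following fixed-confidence form, due to Kaufmann, Cappé, and Garivier; this is a hedged sketch of that standard bound, not the Simulator bound itself:

\[
\mathbb{E}_{\mu}[\tau_\delta] \;\ge\; T^*(\mu)\,\mathrm{kl}(\delta, 1-\delta),
\qquad
T^*(\mu)^{-1} \;=\; \sup_{w \in \Delta_K}\; \inf_{\lambda \in \mathrm{Alt}(\mu)} \sum_{a=1}^{K} w_a\, \mathrm{KL}(\mu_a, \lambda_a),
\]

where $\tau_\delta$ is the stopping time of any $\delta$-correct strategy, $\Delta_K$ is the simplex of sampling proportions, and $\mathrm{Alt}(\mu)$ is the set of instances whose best arm differs from that of $\mu$. Since $\mathrm{kl}(\delta, 1-\delta) \sim \log(1/\delta)$ only as $\delta \to 0$, such bounds are tight asymptotically; the moderate-confidence regime of the title is precisely where the terms they ignore can dominate.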

Citations

On the complexity of All ε-Best Arms Identification
TLDR
The question introduced by [MJTN20] of identifying all the ε-optimal arms in a stochastic multi-armed bandit with Gaussian rewards is considered, and two lower bounds are given on the sample complexity of any algorithm solving the problem with confidence at least 1 − δ.
Structured Best Arm Identification with Fixed Confidence
TLDR
This paper introduces an abstract setting that cleanly describes the essential properties of the minimax game search problem, presents a new algorithm (LUCB-micro) for that setting, and gives lower and upper sample complexity results.
Disagreement-Based Combinatorial Pure Exploration: Sample Complexity Bounds and an Efficient Algorithm
We design new algorithms for the combinatorial pure exploration problem in the multi-armed bandit framework. In this problem, we are given $K$ distributions and a collection of subsets $\mathcal{V}$ …
An Empirical Process Approach to the Union Bound: Practical Algorithms for Combinatorial and Linear Bandits
TLDR
An algorithm is provided whose sample complexity scales with the geometry of the instance and avoids an explicit union bound over the number of arms; in addition, it is computationally efficient for combinatorial classes, e.g., shortest paths, matchings, and matroids.
Learning to Actively Learn: A Robust Approach
TLDR
This work proposes a procedure for designing algorithms for specific adaptive data collection tasks like active learning and pure-exploration multi-armed bandits, and performs synthetic experiments to justify the stability and effectiveness of the training procedure, and then evaluates the method on tasks derived from real data.
Non-Asymptotic Pure Exploration by Solving Games
TLDR
This work casts pure exploration as a zero-sum game, proposes sampling rules based on iterative strategies that estimate and converge to the game's saddle point, and applies no-regret learners to obtain the first finite-confidence guarantees that are adapted to the exponential family and apply to any pure-exploration query and bandit structure.
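Such game-based sampling rules are typically paired with a generalized likelihood ratio (Chernoff) stopping rule; a standard hedged form, with $\hat\mu(t)$ the empirical means, $N_a(t)$ the pull counts, and $\beta(t,\delta)$ a calibration threshold (notation mine, not from the paper), is:

\[
\inf_{\lambda \in \mathrm{Alt}(\hat\mu(t))} \sum_{a=1}^{K} N_a(t)\, \mathrm{KL}\big(\hat\mu_a(t), \lambda_a\big) \;\ge\; \beta(t, \delta).
\]

Stopping when this test fires and recommending the empirical answer is $\delta$-correct for thresholds roughly of order $\log(1/\delta)$ plus lower-order iterated-logarithm terms.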
A Bandit Approach to Multiple Testing with False Discovery Control
TLDR
Inspired by the multi-armed bandit literature, an algorithm that takes as few samples as possible to exceed a target true positive proportion while giving anytime control of the false discovery proportion is provided.
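The two quantities controlled here are standard multiple-testing notions; as a reminder (notation mine, not from the paper), with $R$ rejections of which $V$ are false, and $\mathcal{H}_1$ the set of true alternatives:

\[
\mathrm{FDP} \;=\; \frac{V}{\max(R, 1)},
\qquad
\mathrm{TPP} \;=\; \frac{R - V}{|\mathcal{H}_1|},
\]

and "anytime control" means the false discovery guarantee holds simultaneously over all times at which the algorithm might be stopped or inspected.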
A Non-asymptotic Approach to Best-Arm Identification for Gaussian Bandits
TLDR
A new strategy called Exploration-Biased Sampling is proposed for fixed-confidence best-arm identification over Gaussian arms with bounded means and unit variance; it is not only asymptotically optimal but also satisfies non-asymptotic bounds that hold with high probability.
...

References

Showing 1–10 of 39 references.
On the Complexity of Best-Arm Identification in Multi-Armed Bandit Models
TLDR
This work introduces generic notions of complexity for the two dominant frameworks considered in the literature: fixed-budget and fixed-confidence settings, and provides the first known distribution-dependent lower bound on the complexity that involves information-theoretic quantities and holds when m ≥ 1 under general assumptions.
Eluder Dimension and the Sample Complexity of Optimistic Exploration
TLDR
A regret bound is developed that holds for both classes of algorithms, applies broadly, and can be specialized to many model classes; it depends on a new notion the authors refer to as the eluder dimension, which measures the degree of dependence among action rewards.
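For reference, the eluder dimension of Russo and Van Roy can be sketched as follows (my paraphrase of the standard definition, not text from the paper). An action $a$ is $\varepsilon$-dependent on $a_1, \dots, a_n$ with respect to a function class $\mathcal{F}$ if every pair $f, f' \in \mathcal{F}$ with

\[
\sqrt{\sum_{i=1}^{n} \big(f(a_i) - f'(a_i)\big)^2} \;\le\; \varepsilon
\quad\text{also satisfies}\quad
|f(a) - f'(a)| \;\le\; \varepsilon,
\]

and $\varepsilon$-independent otherwise; $\dim_E(\mathcal{F}, \varepsilon)$ is the length of the longest action sequence in which every element is $\varepsilon'$-independent of its predecessors, for some $\varepsilon' \ge \varepsilon$.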
PAC Subset Selection in Stochastic Multi-armed Bandits
TLDR
The expected sample complexity bound for LUCB is novel even for single-arm selection, and a lower bound is given on the worst-case sample complexity of PAC algorithms for Explore-m.
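As a reminder of how LUCB-style rules operate, here is a hedged sketch of the standard rule specialized to single-arm selection (not copied from the paper): with empirical means $\hat\mu_a(t)$ and confidence radii $c_a(t)$, at each round sample both the empirical leader and its strongest challenger,

\[
h_t = \arg\max_a \hat\mu_a(t),
\qquad
\ell_t = \arg\max_{a \ne h_t} \big(\hat\mu_a(t) + c_a(t)\big),
\]

and stop as soon as $\hat\mu_{h_t}(t) - c_{h_t}(t) \ge \hat\mu_{\ell_t}(t) + c_{\ell_t}(t) - \varepsilon$, returning $h_t$.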
The Sample Complexity of Exploration in the Multi-Armed Bandit Problem
TLDR
This work considers the multi-armed bandit problem under the PAC ("probably approximately correct") model and generalizes the lower bound to a Bayesian setting, and to the case where the statistics of the arms are known but the identities of the arms are not.
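The worst-case lower bound from this line of work is, up to constants (a hedged sketch), that any $(\varepsilon, \delta)$-PAC algorithm must use

\[
\mathbb{E}[\tau] \;=\; \Omega\!\left( \frac{n}{\varepsilon^2} \log \frac{1}{\delta} \right)
\]

samples on some $n$-armed instance, which matches the upper bounds of elimination-style algorithms such as Median Elimination.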
Best-of-K-bandits
TLDR
This paper presents distribution-dependent lower bounds based on a particular construction which force a learner to consider all N-choose-K subsets, and match naive extensions of known upper bounds in the bandit setting obtained by treating each subset as a separate arm.
lil' UCB: An Optimal Exploration Algorithm for Multi-Armed Bandits
TLDR
It is proved that the lil' UCB procedure, which identifies the arm with the largest mean in a multi-armed bandit game in the fixed-confidence setting using a small number of total samples, is optimal up to constants; simulations further show that it outperforms the state of the art.
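The "lil" is the law of the iterated logarithm: $\limsup_{t\to\infty} S_t / \sqrt{2\sigma^2 t \log\log t} = 1$ a.s. for a centered i.i.d. sum $S_t$ with variance $\sigma^2$. Accordingly, the algorithm's anytime confidence width for an arm sampled $t$ times scales, with constants omitted (a hedged sketch, not the paper's exact tuning), as

\[
c(t, \delta) \;\asymp\; \sqrt{\frac{\log\big(\log(t)/\delta\big)}{t}},
\]

which is what allows a union bound over time to cost only iterated-logarithm factors.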
Minimax Bounds for Active Learning
TLDR
The achievable rates of classification error convergence for broad classes of distributions characterized by decision boundary regularity and noise conditions are studied using minimax analysis techniques to indicate the conditions under which one can expect significant gains through active learning.
Nearly Instance Optimal Sample Complexity Bounds for Top-k Arm Selection
TLDR
A novel complexity term is obtained to measure the sample complexity that every Best-k-Arm instance requires, and an elimination-based algorithm is provided that matches the instance-wise lower bound within doubly-logarithmic factors.
Asymptotically Optimal Algorithms for Multiple Play Bandits with Partial Feedback
TLDR
An asymptotic regret lower bound is derived for any uniformly efficient algorithm in a new setting where the number of observations received per round may be less than the number m of arms played, and an algorithm based on upper confidence bounds, KL-CUCB, is proposed and proved asymptotically optimal for single-parameter exponential families and for bounded, finitely supported rewards.
On the Optimal Sample Complexity for Best Arm Identification
TLDR
The first lower bound for BEST-1-ARM is obtained that goes beyond the classic Mannor–Tsitsiklis lower bound, by an interesting reduction from SIGN to BEST-1-ARM.
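The improvement over the Mannor–Tsitsiklis bound is schematically of the following instance-wise form (a hedged recollection; constants and precise conditions omitted):

\[
\mathbb{E}[\tau] \;=\; \Omega\!\left( \sum_{i=2}^{n} \frac{1}{\Delta_i^2} \log\log \frac{1}{\Delta_i} \right),
\]

where $\Delta_i$ is the gap between the best mean and that of arm $i$; the $\log\log$ factor traces back to Farrell's classical lower bound for sequentially testing the sign of a Gaussian mean, which is what the SIGN reduction imports.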
...