Max K-Armed Bandit: On the ExtremeHunter Algorithm and Beyond

@article{Achab2017MaxKB,
  title={Max K-Armed Bandit: On the ExtremeHunter Algorithm and Beyond},
  author={Mastane Achab and St{\'e}phan Cl{\'e}mençon and Aur{\'e}lien Garivier and Anne Sabourin and Claire Vernade},
  journal={ArXiv},
  year={2017},
  volume={abs/1707.08820}
}
This paper is devoted to the study of the max K-armed bandit problem, which consists in sequentially allocating resources in order to detect extreme values. Our contribution is twofold. We first significantly refine the analysis of the ExtremeHunter algorithm carried out in Carpentier and Valko (2014), and next propose an alternative approach, showing that, remarkably, Extreme Bandits can be reduced to a classical version of the bandit problem to a certain extent. Beyond the formal analysis… 
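In the max K-armed (extreme) bandit setting studied here, the learner's objective is the largest reward observed over the horizon rather than the cumulative sum, and the oracle benchmark always samples the arm with the heaviest tail. The following is a minimal, hypothetical Python simulation of that objective, not the paper's algorithm; the Pareto tail indices, horizon, and uniform baseline policy are illustrative assumptions.

```python
import numpy as np

def pareto_reward(alpha, rng):
    # Pareto sample with unit scale: smaller alpha means a heavier tail.
    return (1.0 - rng.random()) ** (-1.0 / alpha)

def run_policy(choose_arm, alphas, n_rounds, rng):
    """Play n_rounds pulls and return the maximal reward observed."""
    best = 0.0
    for t in range(n_rounds):
        k = choose_arm(t)
        best = max(best, pareto_reward(alphas[k], rng))
    return best

alphas = [2.5, 2.0, 1.5]   # arm 2 has the heaviest tail (illustrative values)
n_rounds = 10_000
rng = np.random.default_rng(0)

# Oracle: always pull the heaviest-tailed arm; baseline: round-robin over arms.
oracle_max = run_policy(lambda t: int(np.argmin(alphas)), alphas, n_rounds, rng)
uniform_max = run_policy(lambda t: t % len(alphas), alphas, n_rounds, rng)
print(f"oracle max reward : {oracle_max:.1f}")
print(f"uniform max reward: {uniform_max:.1f}")
```

The gap between the two printed maxima illustrates the extremal regret that ExtremeHunter and the reduction studied in this paper aim to control.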
Extreme Bandits using Robust Statistics
TLDR
This work proposes distribution-free algorithms using robust statistics, characterizes their statistical properties, and shows that the proposed algorithms achieve vanishing extremal regret under weaker conditions than existing algorithms.
Stochastic Linear Bandits with Finitely Many Arms
TLDR
The core idea is to introduce phases of determinism into the algorithm so that within each phase the actions are chosen independently of the rewards.
The Explore-Then-Commit Algorithm
The focus on subgaussian distributions is mainly for simplicity. Many of the techniques in the chapters that follow can be applied to other stochastic bandits such as those listed in Table 4.1…
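As a pointer to what this chapter covers, here is a minimal, hypothetical sketch of the Explore-Then-Commit policy for K arms; the Gaussian reward model, means, and exploration budget m below are illustrative assumptions.

```python
import numpy as np

def explore_then_commit(pull, n_arms, m, horizon):
    """Explore-Then-Commit: pull every arm m times, then commit to the
    empirically best arm for the remaining rounds. pull(k) returns a reward."""
    sums = np.zeros(n_arms)
    total = 0.0
    # Exploration phase: m rounds per arm, in a fixed order.
    for k in range(n_arms):
        for _ in range(m):
            r = pull(k)
            sums[k] += r
            total += r
    # Commitment phase: play the arm with the highest empirical mean.
    best = int(np.argmax(sums / m))
    for _ in range(horizon - n_arms * m):
        total += pull(best)
    return best, total

# Illustrative use with unit-variance Gaussian arms (means are assumptions).
rng = np.random.default_rng(1)
means = [0.2, 0.5, 0.45]
best_arm, total_reward = explore_then_commit(lambda k: rng.normal(means[k], 1.0),
                                             n_arms=3, m=100, horizon=5000)
print(best_arm, round(total_reward, 1))
```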
Optimal Design for Least Squares Estimators
  • Mathematics
  • 2020
In the preceding chapters we introduced the linear bandit and showed how to construct confidence intervals for least squares estimators. We now study the problem of choosing actions for which these…
AutoML with Monte Carlo Tree Search
TLDR
A new approach, called Monte Carlo Tree Search for Algorithm Configuration (Mosaic), is presented, fully exploiting the tree structure of the algorithm portfolio and hyper-parameter search space, and experiments show that Mosaic's performance matches that of Auto-Sklearn.
Partial Monitoring
TLDR
The highlight is that for the non-degenerate locally observable games, the n-round minimax regret is bounded by 6mk√(n log(k)), where m is the number of signals.
Minimax Lower Bounds
Now that we have a good handle on the performance of ERM and its variants, it is time to ask whether we can do better. For example, consider binary classification: we observe n i.i.d. training…
Bandit Algorithms
Here E and Π denote sets of environments and policies respectively, and ℓ : E × Π → [0, 1] is a bounded loss function. Given a policy π, let ℓ(π) = (ℓ(ν1, π), …, ℓ(νN, π)) be the loss vector resulting from policy π.
Concentration of Measure
In mathematics, concentration of measure (e.g. about a median) is a principle that is applied in measure theory, probability and combinatorics, and has consequences for other fields such as Banach space theory.
…

References

A Simple Distribution-Free Approach to the Max k-Armed Bandit Problem
TLDR
The effectiveness of this approach is demonstrated by applying it to the task of selecting among priority dispatching rules for the resource-constrained project scheduling problem with maximal time lags (RCPSP/max).
PAC Lower Bounds and Efficient Algorithms for The Max K-Armed Bandit Problem
TLDR
This work considers the Max K-Armed Bandit problem, where a learning agent is faced with several stochastic arms, each a source of i.i.d. rewards of unknown distribution, and proposes an algorithm that attains the derived PAC lower bound up to logarithmic factors.
Bandits With Heavy Tail
TLDR
This paper examines the bandit problem under the weaker assumption that the distributions have moments of order 1 + ε, and derives matching lower bounds that show that the best achievable regret deteriorates when ε < 1.
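For intuition, one robust mean estimator used in this heavy-tailed line of work is the median of means; the sketch below is a hypothetical illustration (the block count and Pareto data are assumptions), not the paper's exact upper-confidence-bound construction.

```python
import numpy as np

def median_of_means(samples, n_blocks):
    """Split the samples into blocks, average each block, and return the median
    of the block means; robust when only moments of order 1 + eps exist."""
    blocks = np.array_split(np.asarray(samples, dtype=float), n_blocks)
    return float(np.median([b.mean() for b in blocks]))

# Illustrative comparison on heavy-tailed data with finite mean but infinite variance.
rng = np.random.default_rng(0)
data = rng.pareto(1.3, size=2000) + 1.0
print("empirical mean :", round(data.mean(), 2))
print("median of means:", round(median_of_means(data, n_blocks=20), 2))
```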
No Regret Bound for Extreme Bandits
TLDR
A sensible notion of "extreme regret" is defined in the extreme bandit setting, which parallels the concept of regret in the standard bandit setting, and it is proved that no policy can asymptotically achieve no extreme regret.
Simple regret for infinitely many armed bandits
TLDR
This paper proposes an algorithm aiming at minimizing the simple regret, and proves that depending on β, the algorithm is minimax optimal either up to a multiplicative constant or up to a log(n) factor.
The Max K-Armed Bandit: A New Model of Exploration Applied to Search Heuristic Selection
TLDR
An analysis of this max K-armed bandit model shows, under certain assumptions, that the optimal strategy allocates trials to the observed best arm at a rate increasing doubly exponentially relative to the other arms, which motivates an exploration strategy that follows a Boltzmann distribution with an exponentially decaying temperature parameter.
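To make that exploration strategy concrete, below is a minimal, hypothetical sketch of Boltzmann (softmax) arm selection with an exponentially decaying temperature; the per-arm scores, initial temperature, and decay rate are illustrative assumptions rather than the paper's tuned values.

```python
import numpy as np

def boltzmann_choice(scores, temperature, rng):
    """Sample an arm with probability proportional to exp(score / temperature)."""
    scores = np.asarray(scores, dtype=float)
    logits = (scores - scores.max()) / temperature   # shift for numerical stability
    probs = np.exp(logits)
    probs /= probs.sum()
    return int(rng.choice(len(scores), p=probs))

rng = np.random.default_rng(0)
scores = [1.0, 1.5, 0.8]   # e.g. best reward observed so far on each arm (assumption)
t0, decay = 2.0, 0.99      # initial temperature and per-round decay (assumptions)
for t in range(5):
    temperature = t0 * decay ** t            # exponentially decaying temperature
    arm = boltzmann_choice(scores, temperature, rng)
    print(f"round {t}: temperature={temperature:.3f}, arm={arm}")
```

As the temperature decays, the selection probabilities concentrate on the arm with the highest score, so exploration gradually shifts toward exploitation.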
Finite-time Analysis of the Multiarmed Bandit Problem
TLDR
This work shows that the optimal logarithmic regret is also achievable uniformly over time, with simple and efficient policies, and for all reward distributions with bounded support.
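One such simple and efficient policy is the UCB1 index rule from this reference; the sketch below assumes rewards in [0, 1], and the Bernoulli arms are illustrative assumptions.

```python
import math
import numpy as np

def ucb1(pull, n_arms, horizon):
    """UCB1: play the arm maximizing empirical mean + sqrt(2 ln t / n_k),
    assuming rewards lie in [0, 1]."""
    counts = np.zeros(n_arms)
    means = np.zeros(n_arms)
    for t in range(1, horizon + 1):
        if t <= n_arms:
            k = t - 1                                  # pull each arm once first
        else:
            bonus = np.sqrt(2.0 * math.log(t) / counts)
            k = int(np.argmax(means + bonus))
        r = pull(k)
        counts[k] += 1
        means[k] += (r - means[k]) / counts[k]         # incremental mean update
    return means, counts

# Illustrative Bernoulli arms (success probabilities are assumptions).
rng = np.random.default_rng(0)
probs = [0.3, 0.5, 0.6]
means, counts = ucb1(lambda k: float(rng.random() < probs[k]), n_arms=3, horizon=5000)
print("pull counts:", counts)
```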
Extreme bandits
TLDR
This work proposes the EXTREMEHUNTER algorithm, provides its analysis, and evaluates it empirically on synthetic and real-world experiments, measuring the efficiency of an algorithm against the oracle policy that selects the source with the heaviest tail.
Tail index estimation, concentration and adaptivity
This paper presents an adaptive version of the Hill estimator based on Lepski's model selection method. This simple data-driven index selection method is shown to satisfy an oracle inequality and is…
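For reference, the plain (non-adaptive) Hill estimator that this work builds on can be computed as below; the number k of upper order statistics is exactly the quantity that Lepski's method selects adaptively, and the fixed k and Pareto data here are illustrative assumptions.

```python
import numpy as np

def hill_estimator(samples, k):
    """Hill estimator of the tail index alpha from the k largest observations."""
    x = np.sort(np.asarray(samples, dtype=float))[::-1]   # descending order statistics
    gamma = np.mean(np.log(x[:k] / x[k]))                 # estimate of 1 / alpha
    return 1.0 / gamma

# Illustrative check on Pareto data with true tail index alpha = 2.
rng = np.random.default_rng(0)
data = (1.0 - rng.random(5000)) ** (-1.0 / 2.0)
print("estimated tail index:", round(hill_estimator(data, k=200), 2))
```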
Adaptive confidence intervals for the tail coefficient in a wide second order class of Pareto models
We study the problem of constructing honest and adaptive confidence intervals for the tail coefficient in the second order Pareto model, when the second order coefficient is unknown. This problem is…
…