Upper-Confidence-Bound Algorithms for Active Learning in Multi-Armed Bandits

Authors: Alexandra Carpentier, Alessandro Lazaric, Mohammad Ghavamzadeh, Rémi Munos, Peter Auer, András Antos
Published in: International Conference on Algorithmic Learning Theory

In this paper, we study the problem of estimating the mean values of all the arms uniformly well in the multi-armed bandit setting. If the variances of the arms were known, one could design an optimal sampling strategy by pulling the arms proportionally to their variances. However, since the distributions are not known in advance, we need to design adaptive sampling strategies that select an arm at each round based on the previously observed samples. We describe two strategies based on pulling the…
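
The allocation idea in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the paper's algorithms: the function names, the Gaussian arms, and the exploration bonus `bonus / sqrt(T_k)` are all assumptions chosen for readability. The first function computes the known-variance allocation (pulls proportional to variances, which equalizes the per-arm error variance_k / T_k); the second replaces the unknown variances with empirical estimates inflated by a bonus and pulls the arm with the largest resulting index.

```python
import random
import statistics


def proportional_allocation(variances, budget):
    """Known-variance case: split `budget` pulls in proportion to the variances."""
    total = sum(variances)
    return [budget * v / total for v in variances]


def estimation_index(observations, bonus):
    """Optimistic estimate of an arm's error: (empirical variance + bonus) / pulls.

    The bonus term bonus / sqrt(T_k) is an illustrative assumption,
    not the confidence bound used in the paper.
    """
    t = len(observations)
    return (statistics.variance(observations) + bonus / t ** 0.5) / t


def adaptive_allocation(arms, budget, bonus=1.0, seed=0):
    """Unknown-variance case: `arms` is a list of hypothetical (mean, std) Gaussian arms."""
    rng = random.Random(seed)
    # Two initial pulls per arm so that an empirical variance is defined.
    samples = [[rng.gauss(m, s) for _ in range(2)] for m, s in arms]
    for _ in range(budget - 2 * len(arms)):
        # Pull the arm whose mean currently looks the least well estimated.
        k = max(range(len(arms)), key=lambda i: estimation_index(samples[i], bonus))
        mean, std = arms[k]
        samples[k].append(rng.gauss(mean, std))
    return [len(obs) for obs in samples]


if __name__ == "__main__":
    print(proportional_allocation([1.0, 4.0], 100))  # [20.0, 80.0]
    print(adaptive_allocation([(0.0, 0.5), (0.0, 2.0)], 500))
```

With enough pulls, the adaptive counts should drift toward the proportional ones, since the bonus shrinks and the empirical variances approach the true ones; how fast this happens is exactly what the paper's regret analysis quantifies.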

Minimax strategy for Stratified Sampling for Monte Carlo

This work proposes a strategy that samples the arms according to an upper bound on their standard deviations and compares its estimation quality to an ideal allocation that would know the standard deviations of the strata, and provides two pseudo-regret analyses.

UCB with An Optimal Inequality

This inequality explicitly incorporates the value of every past reward into the upper-bound expression that drives the UCB method. The work shows how it fits into the broader scope of other information-theoretic UCB algorithms but, unlike them, is free from assumptions about the distribution of the data.

Adaptive strategy for stratified Monte Carlo sampling

This work proposes a UCB-type strategy that samples the arms according to an upper bound on their estimated standard deviations, and provides bounds on the total regret on a proxy of the regret.

Doubly-Adaptive Thompson Sampling for Multi-Armed and Contextual Bandits

The proposed doubly-adaptive Thompson sampling has superior empirical performance to existing baselines in terms of cumulative regret and statistical power in identifying the best arm.

Bandit Optimization with Upper-Confidence Frank-Wolfe

The Upper-Confidence Frank-Wolfe algorithm, inspired by techniques for bandits and convex optimization, is introduced. The work shows upper bounds on the optimization error of this algorithm over various classes of functions and discusses the optimality of these results.

Finite Time Analysis of Stratified Sampling for Monte Carlo

This work proposes a strategy that samples the arms according to an upper bound on their standard deviations and compares its estimation quality to an ideal allocation that would know the standard deviations of the strata.

Trading off Rewards and Errors in Multi-Armed Bandits

This paper formalizes the tradeoff between rewards and errors and introduces the ForcingBalance algorithm, whose performance is provably close to the best possible tradeoff strategy. It demonstrates on real-world educational data that ForcingBalance returns useful information about the arms without compromising the overall reward.

Online Multi-Armed Bandits with Adaptive Inference

It is demonstrated that using an adaptive inferential scheme (while still retaining the exploration efficacy of TS) provides clear benefits in online decision making: the proposed DATS algorithm has superior empirical performance to existing baselines (UCB and TS) in terms of regret and sample complexity in identifying the best arm.

Active Learning for Accurate Estimation of Linear Models

Trace-UCB is presented, an adaptive allocation algorithm that learns the models' noise levels while balancing contexts accordingly across them, and bounds are proved for its simple regret both in expectation and with high probability.

Adaptive Sampling for Estimating Probability Distributions

The techniques developed in the paper can be easily extended to learn some classes of continuous distributions as well as to the related setting of minimizing the average error (rather than the maximum error) in learning a set of distributions.



Exploration-exploitation tradeoff using variance estimates in multi-armed bandits

Active learning in heteroscedastic noise

Adaptive Optimal Allocation in Stratified Sampling Methods

In this paper, we propose a stratified sampling algorithm in which the random drawings made in the strata to compute the expectation of interest are also used to adaptively modify the proportion of

Faster Rates in Regression via Active Learning

A practical algorithm capable of exploiting the extra flexibility of the active setting and provably improving upon the classical passive techniques is described.

Probability inequalities for sum of bounded random variables

Upper bounds are derived for the probability that the sum S of n independent random variables exceeds its mean ES by a positive number nt. It is assumed that the range of each summand of S

On efficient designing of nonlinear experiments

Adaptive designs that optimize the Fisher information associated with a nonlinear experiment are considered. Asymptotic properties of the maximum likelihood estimate and related statistical inference

Multivariate Statistics: A Vector Space Approach

leading results, especially with respect to measures of fit. However, no clue is given to the reader as to how to diagnose the problem and find possible solutions. -In the conjoint analysis

Active Learning with Statistical Models

This work shows how the same principles may be used to select data for two alternative, statistically-based learning architectures: mixtures of Gaussians and locally weighted regression.

An Introduction to Probabilistic Modeling

Basic concepts and elementary models; discrete probability; probability densities; Gauss and Poisson; convergences; additional exercises; solutions to additional exercises.