Corpus ID: 204904859

Adaptive Sampling for Estimating Multiple Probability Distributions

@article{Shekhar2019AdaptiveSF,
  title={Adaptive Sampling for Estimating Multiple Probability Distributions},
  author={Shubhanshu Shekhar and Mohammad Ghavamzadeh and Tara Javidi},
  journal={ArXiv},
  year={2019},
  volume={abs/1910.12406}
}
We consider the problem of allocating samples to a finite set of discrete distributions in order to learn them uniformly well in terms of four common distance measures: $\ell_2^2$, $\ell_1$, $f$-divergence, and separation distance. To present a unified treatment of these distances, we first propose a general optimistic tracking algorithm and analyze its sample allocation performance w.r.t. an oracle. We then instantiate this algorithm for the four distance measures and derive bounds on the…
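The abstract does not spell out the tracking rule, so the following is only a minimal sketch of the optimistic-allocation idea for the $\ell_1$ case: maintain empirical estimates of each distribution, inflate each one's plug-in estimation-error proxy by an optimism bonus, and sample the distribution that currently looks worst. The error proxy and bonus term below are illustrative assumptions, not the authors' algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

# K unknown discrete distributions over m outcomes; the ground truth is hidden
# from the learner and used only to draw samples and score the final error.
K, m, budget = 3, 5, 3000
true_p = rng.dirichlet(np.ones(m), size=K)

counts = np.ones((K, m))    # one pseudo-count per cell avoids division by zero
pulls = counts.sum(axis=1)  # samples drawn so far from each distribution

for t in range(budget):
    p_hat = counts / pulls[:, None]
    # Plug-in bound on the expected L1 error of the empirical estimate:
    # sum_j sqrt(p_j * (1 - p_j) / n).
    est_err = np.sqrt(p_hat * (1.0 - p_hat)).sum(axis=1) / np.sqrt(pulls)
    # Optimism bonus so under-sampled distributions are preferred.
    bonus = np.sqrt(2.0 * np.log(t + 2.0) / pulls)
    k = int(np.argmax(est_err + bonus))
    x = rng.choice(m, p=true_p[k])  # draw one sample from distribution k
    counts[k, x] += 1
    pulls[k] += 1

for k in range(K):
    err = np.abs(counts[k] / pulls[k] - true_p[k]).sum()
    print(f"distribution {k}: pulls={int(pulls[k])}, L1 error = {err:.3f}")
```

Running this, distributions that are harder to estimate in $\ell_1$ (those with mass spread over many outcomes) receive more samples, which is the qualitative behavior an oracle allocation would exhibit.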
Active Model Estimation in Markov Decision Processes
This paper formalizes the problem of efficient exploration, introduces the first algorithm to learn an $\epsilon$-accurate estimate of the dynamics, and provides its sample complexity analysis.

References

Showing 1-10 of 23 references.
Active Learning for Accurate Estimation of Linear Models
Presents Trace-UCB, an adaptive allocation algorithm that learns the models' noise levels while balancing contexts accordingly across them, and proves bounds on its simple regret both in expectation and with high probability.
Active Learning in Multi-armed Bandits
In this paper we consider the problem of actively learning the mean values of distributions associated with a finite number of options (arms). The algorithms can select which option to generate the next sample from, in order to produce estimates with equally good precision for all the distributions.
Finite Time Analysis of Stratified Sampling for Monte Carlo
This work proposes a strategy that samples the arms according to an upper bound on their standard deviations and compares its estimation quality to that of an ideal allocation that knows the standard deviations of the strata.
Upper-Confidence-Bound Algorithms for Active Learning in Multi-Armed Bandits
This paper describes two strategies based on pulling the arms proportionally to an upper bound on their variance, derives regret bounds for these strategies, and shows that the performance of these allocation strategies depends not only on the variances of the arms but also on the full shape of their distributions.
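A minimal sketch of this variance-proportional allocation idea (not the paper's exact algorithm; the confidence width below is an illustrative assumption): pull the arm whose optimistic variance estimate, divided by its pull count, is largest, since the squared error of arm $k$'s mean estimate decays like $\sigma_k^2/n_k$.

```python
import numpy as np

rng = np.random.default_rng(1)

# Three Gaussian arms with very different noise levels (hypothetical setup).
means, stds = np.array([0.0, 1.0, -0.5]), np.array([0.2, 1.0, 2.0])
K, budget = len(means), 2000

# Two initial samples per arm so that sample variances are defined.
samples = [[rng.normal(means[k], stds[k]) for _ in range(2)] for k in range(K)]

for t in range(budget):
    n = np.array([len(s) for s in samples], dtype=float)
    var_hat = np.array([np.var(s, ddof=1) for s in samples])
    # Optimistic variance estimate (illustrative confidence width).
    var_ucb = var_hat + np.sqrt(2.0 * np.log(t + 2.0) / n)
    # The mean-estimation loss of arm k decays like var_k / n_k,
    # so pull the arm with the largest optimistic loss.
    k = int(np.argmax(var_ucb / n))
    samples[k].append(rng.normal(means[k], stds[k]))

for k in range(K):
    print(f"arm {k}: n={len(samples[k])}, mean estimate = {np.mean(samples[k]):.3f}")
```

Under this rule the pull counts end up roughly proportional to the arms' variances, which matches the ideal static allocation for uniformly accurate mean estimates.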
Empirical Bernstein Bounds and Sample-Variance Penalization
Gives improved constants for data-dependent, variance-sensitive confidence bounds, called empirical Bernstein bounds; extends them to hold uniformly over classes of functions whose growth function is polynomial in the sample size n; and considers sample-variance penalization.
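For context, an empirical Bernstein bound of this type states that for i.i.d. $X_1,\dots,X_n \in [0,1]$ with sample mean $\bar{X}_n$ and sample variance $V_n$, with probability at least $1-\delta$,
$$|\bar{X}_n - \mathbb{E}X_1| \le \sqrt{\frac{2 V_n \ln(2/\delta)}{n}} + \frac{7\ln(2/\delta)}{3(n-1)},$$
so the dominant term scales with the empirical standard deviation rather than the worst-case range; this is what makes such bounds useful for variance-adaptive allocation rules like those above.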
Provably Efficient Maximum Entropy Exploration
This work studies a broad class of objectives defined solely as functions of the state-visitation frequencies induced by the agent's behavior, and provides an efficient algorithm to optimize such intrinsically defined objectives when given access to a black-box planning oracle.
Active Exploration in Markov Decision Processes
Introduces a novel learning algorithm for the active exploration problem in Markov decision processes, showing that active exploration in MDPs may be significantly more difficult than in multi-armed bandits (MAB).
Finite-time Analysis of the Multiarmed Bandit Problem
This work shows that the optimal logarithmic regret is achievable uniformly over time, with simple and efficient policies, for all reward distributions with bounded support.
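The best-known such policy, UCB1, pulls the arm maximizing the index $\bar{x}_j + \sqrt{2\ln n / n_j}$, where $\bar{x}_j$ is the empirical mean of arm $j$, $n_j$ its pull count, and $n$ the total number of pulls; the optimism bonuses in the sketches above follow the same template, applied to estimation error rather than reward.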
Distribution estimation consistent in total variation and in two types of information divergence
Histogram-based estimators of distributions are presented which, under certain conditions, converge in total variation, in information divergence, and in reversed-order information divergence to the unknown probability distribution.
Learning the distribution with largest mean: two bandit frameworks
This paper reviews two sequential learning tasks from the bandit literature; both can be formulated as (sequentially) learning which distribution has the highest mean among a set of distributions, with some constraints on the learning process.