Optimal testing of discrete distributions with high probability

  title={Optimal testing of discrete distributions with high probability},
  author={Ilias Diakonikolas and Themis Gouleakis and Daniel M. Kane and John Peebles and Eric Price},
  journal={Proceedings of the 53rd Annual ACM SIGACT Symposium on Theory of Computing},
We study the problem of testing discrete distributions with a focus on the high probability regime. Specifically, given samples from one or more discrete distributions, a property P, and parameters 0< є, δ <1, we want to distinguish with probability at least 1−δ whether these distributions satisfy P or are є-far from P in total variation distance. Most prior work in distribution testing studied the constant confidence case (corresponding to δ = Ω(1)), and provided sample-optimal testers for a… Expand
The Price of Tolerance in Distribution Testing
We revisit the problem of tolerant distribution testing. That is, given samples from an unknown distribution p over {1, . . . , n}, is it ε1-close to or ε2-far from a reference distribution q (inExpand
The Sample Complexity of Robust Covariance Testing
The main result is that the sample complexity of covariance testing dramatically increases in the contaminated setting, and a sample complexity lower bound of Ω(d) for ǫ an arbitrarily small constant and γ = 1/2 is proved. Expand
Near-optimal learning of tree-structured distributions by Chow-Liu
The upper bound is based on a new conditional independence tester that addresses an open problem posed by Canonne, Diakonikolas, Kane, and Stewart (STOC, 2018): it is proved that for three random variables X,Y,Z each over Σ, testing if I(X; Y ∣ Z) is 0 or ≥ ε is possible with O(|Σ|3/ε) samples. Expand


A New Approach for Testing Properties of Discrete Distributions
  • Ilias Diakonikolas, D. Kane
  • Computer Science, Mathematics
  • 2016 IEEE 57th Annual Symposium on Foundations of Computer Science (FOCS)
  • 2016
The sample complexity of the algorithm depends on the structure of the unknown distributions - as opposed to merely their domain size - and is significantly better compared to the worst-case optimal L1-tester in many natural instances. Expand
Optimal Algorithms for Testing Closeness of Discrete Distributions
This work presents simple testers for both the e1 and e2 settings, with sample complexity that is information-theoretically optimal, to constant factors, and establishes that the sample complexity is Θ(max{n2/3/e4/3, n1/2/&epsilon2}. Expand
Near-Optimal Closeness Testing of Discrete Histogram Distributions
A new algorithm for testing the equivalence between two discrete histograms and a nearly matching information-theoretic lower bound are investigated, improving on previous work by polynomial factors in the relevant parameters. Expand
Optimal Testing for Properties of Distributions
This work provides a general approach via which sample-optimal and computationally efficient testers for discrete log-concave and monotone hazard rate distributions are obtained. Expand
Optimal Algorithms and Lower Bounds for Testing Closeness of Structured Distributions
This work designs a sample optimal and computationally efficient algorithm for testing the equivalence of two unknown univariate distributions under the Ak-distance metric, and yields new, simple L1 closeness testers, in most cases with optimal sample complexity, for broad classes of structured distributions. Expand
Testing k-Modal Distributions: Optimal Algorithms via Reductions
A new reduction-based approach for distribution-testing problems that lets us obtain all the above results in a unified way and enables us to transform various distribution testing problems for k-modal distributions over {1,..., n} to the corresponding distributionTesting problems for unrestricted distributions over a much smaller domain {1..., e} where e = O(k log n). Expand
Testing Ising Models
It is demonstrated that, in this structured setting, this study of distribution testing on structured multivariate distributions can avoid the curse of dimensionality, obtaining sample and time efficient testers for independence and goodness-of-fit. Expand
Sample-Optimal Identity Testing with High Probability
The new upper and lower bounds show that the optimal sample complexity of identity testing is $\Theta\left( \frac{1}{\epsilon^2}\left(\sqrt{n \log(1/\delta)} + \log (1/ \delta) \right)\right) for any $n, \ep silon$, and $\delta$. Expand
Testing Conditional Independence of Discrete Distributions
This work studies the problem of testing conditional independence for discrete distributions and develops a general theory providing tight variance bounds for specific estimators of this form, up to constant factors, for all such estimators. Expand
Testing Conditional Independence of Discrete Distributions
The main algorithmic result of this work is the first conditional independence tester with sublinear sample complexity for discrete distributions over[n], and a general theory providing tight variance bounds for all such estimators is developed. Expand