• Corpus ID: 235652238

The Price of Tolerance in Distribution Testing

@article{Canonne2022ThePO,
  title={The Price of Tolerance in Distribution Testing},
  author={Cl{\'e}ment L. Canonne and Ayush Jain and Gautam Kamath and Jerry Zheng Li},
  journal={ArXiv},
  year={2022},
  volume={abs/2106.13414}
}
We revisit the problem of tolerant distribution testing. That is, given samples from an unknown distribution p over {1, ..., n}, is it ε1-close to or ε2-far from a reference distribution q (in total variation distance)? Despite significant interest over the past decade, this problem is well understood only in the extreme cases. In the noiseless setting (i.e., ε1 = 0) the sample complexity is Θ(√n), strongly sublinear in the domain size. At the other end of the spectrum, when ε1 = ε2/2, the…
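To make the problem statement concrete, here is a minimal Python sketch of total variation distance and of a naive plug-in baseline for the tolerant testing question. It is not the tester from the paper: the function names and the midpoint threshold are assumptions chosen for illustration, and a plug-in approach of this kind generally needs a number of samples on the order of the domain size, which is exactly the cost that sublinear testers aim to avoid.

```python
from collections import Counter


def total_variation(p, q):
    """Total variation distance between two distributions on the same domain,
    each given as a list of probabilities of equal length."""
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))


def empirical_distribution(samples, n):
    """Empirical frequencies of samples drawn from {1, ..., n}."""
    counts = Counter(samples)
    m = len(samples)
    return [counts.get(i, 0) / m for i in range(1, n + 1)]


def naive_tolerant_test(samples, q, eps1, eps2):
    """Hypothetical plug-in baseline (an assumption, not the paper's method):
    accept 'close' iff the empirical TV distance to q is at most the
    midpoint of eps1 and eps2."""
    p_hat = empirical_distribution(samples, len(q))
    return total_variation(p_hat, q) <= (eps1 + eps2) / 2
```

For example, `naive_tolerant_test([1, 2, 1, 2], [0.5, 0.5], 0.1, 0.3)` accepts, since the empirical distribution matches q exactly, while the all-ones sample `[1, 1, 1, 1]` is rejected because its empirical TV distance to q is 0.5.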

Exploring the Gap between Tolerant and Non-tolerant Distribution Testing

TLDR
This work focuses on the connection between the sample complexities of non-tolerant testing of distributions and their tolerant testing counterparts, and proves a close to linear lower bound against their tolerant tests.

Verifying the unseen: interactive proofs for label-invariant distribution properties

TLDR
This work shows that the support size, the entropy, and the distance from the uniform distribution can all be approximately verified via a 2-message interactive proof, where the communication complexity, the verifier’s running time, and the sample complexity are O(√N).

Mathematical Framework for Online Social Media Regulation

TLDR
This paper mathematically formalizes this framework and uses it to construct a data-driven statistical algorithm that regulates the AF so that it does not deflect users’ beliefs over time, along with sample and complexity guarantees, and shows that the algorithm is robust against potential adversarial users.

References

SHOWING 1-10 OF 51 REFERENCES

Optimal testing of discrete distributions with high probability

TLDR
The first algorithms for closeness and independence testing that are sample-optimal, within constant factors, as a function of all relevant parameters are provided.

Optimal Algorithms for Testing Closeness of Discrete Distributions

TLDR
This work presents simple testers for both the ℓ1 and ℓ2 settings, with sample complexity that is information-theoretically optimal, to constant factors, and establishes that the sample complexity is Θ(max{n^{2/3}/ε^{4/3}, n^{1/2}/ε^2}).

Estimating the unseen: an n/log(n)-sample estimator for entropy and support size, shown optimal via new CLTs

We introduce a new approach to characterizing the unobserved portion of a distribution, which provides sublinear-sample estimators achieving arbitrarily small additive constant error for a class of

Estimating the unseen: A sublinear-sample canonical estimator of distributions

TLDR
This paper introduces a new approach to characterizing the unobserved portion of a distribution, which provides sublinear-sample additive estimators for a class of properties that includes entropy and distribution support size, and settles the longstanding question of the sample complexities of these estimation problems.

Modern challenges in distribution testing

TLDR
The goal of this dissertation is to identify and address several contemporary challenges in distribution testing and make progress in answering the following questions.

A New Approach for Testing Properties of Discrete Distributions

  • Ilias Diakonikolas, D. Kane
  • Computer Science, Mathematics
    2016 IEEE 57th Annual Symposium on Foundations of Computer Science (FOCS)
  • 2016
TLDR
The sample complexity of the algorithm depends on the structure of the unknown distributions - as opposed to merely their domain size - and is significantly better compared to the worst-case optimal L1-tester in many natural instances.

Optimal Testing for Properties of Distributions

TLDR
This work provides a general approach via which sample-optimal and computationally efficient testers for discrete log-concave and monotone hazard rate distributions are obtained.

Testing that distributions are close

TLDR
A sublinear algorithm which uses O(n^{2/3} ε^{-4} log n) independent samples from each distribution, runs in time linear in the sample size, makes no assumptions about the structure of the distributions, and distinguishes the cases when the distance between the distributions is small or large.

Sample-Optimal Identity Testing with High Probability

TLDR
The new upper and lower bounds show that the optimal sample complexity of identity testing is $\Theta\left(\frac{1}{\epsilon^2}\left(\sqrt{n \log(1/\delta)} + \log(1/\delta)\right)\right)$ for any $n$, $\epsilon$, and $\delta$.
...