• Corpus ID: 235652238

# The Price of Tolerance in Distribution Testing

@article{Canonne2022ThePO,
title={The Price of Tolerance in Distribution Testing},
author={Cl{\'e}ment L. Canonne and Ayush Jain and Gautam Kamath and Jerry Zheng Li},
journal={ArXiv},
year={2022},
volume={abs/2106.13414}
}
• Published 25 June 2021
• Mathematics, Computer Science
• ArXiv
We revisit the problem of tolerant distribution testing. That is, given samples from an unknown distribution p over {1, …, n}, is it ε₁-close to or ε₂-far from a reference distribution q (in total variation distance)? Despite significant interest over the past decade, this problem is well understood only in the extreme cases. In the noiseless setting (i.e., ε₁ = 0) the sample complexity is Θ(√n), strongly sublinear in the domain size. At the other end of the spectrum, when ε₁ = ε₂/2, the…
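To make the testing question concrete, the Python sketch below computes total variation distance and a naive plug-in tolerant tester. It is an illustrative baseline, not the algorithm from this paper. It labels the domain {0, …, n−1} rather than {1, …, n}, and the function names are hypothetical. Estimating p by its empirical distribution and thresholding at the midpoint (ε₁ + ε₂)/2 needs on the order of n/(ε₂ − ε₁)² samples, which is linear in n. That linear cost is what sublinear tolerant testers aim to beat.

```python
import random
from collections import Counter


def tv_distance(p, q):
    """Total variation distance: half the L1 distance between two probability vectors."""
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))


def empirical_distribution(samples, n):
    """Empirical distribution over the domain {0, ..., n-1}."""
    counts = Counter(samples)
    m = len(samples)
    return [counts[i] / m for i in range(n)]


def naive_tolerant_test(samples, q, eps1, eps2):
    """Hypothetical plug-in tester: True means "eps1-close", False means "eps2-far".

    Thresholds the empirical TV distance at the midpoint of [eps1, eps2].
    Reliable only with roughly n / (eps2 - eps1)**2 samples (linear in n).
    """
    p_hat = empirical_distribution(samples, len(q))
    return tv_distance(p_hat, q) <= (eps1 + eps2) / 2


if __name__ == "__main__":
    random.seed(0)
    n = 10
    q = [1 / n] * n                                        # reference: uniform
    samples = [random.randrange(n) for _ in range(20000)]  # here p equals q
    print(naive_tolerant_test(samples, q, eps1=0.05, eps2=0.3))
```

The midpoint threshold is the simplest symmetric choice. The tester succeeds as long as the empirical TV estimate is within (ε₂ − ε₁)/2 of the true distance.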
3 Citations

## Citations

### Exploring the Gap between Tolerant and Non-tolerant Distribution Testing

• Mathematics, Computer Science
APPROX/RANDOM
• 2022
This work focuses on the connection between the sample complexities of non-tolerant testing of distributions and their tolerant testing counterparts, and proves a close-to-linear lower bound against their tolerant tests.

### Verifying the unseen: interactive proofs for label-invariant distribution properties

• Computer Science
STOC
• 2022
This work shows that the support size, the entropy, and the distance from the uniform distribution can all be approximately verified via a 2-message interactive proof, where the communication complexity, the verifier's running time, and the sample complexity are all O(√N).

### Mathematical Framework for Online Social Media Regulation

• Computer Science
ArXiv
• 2022
This paper mathematically formalizes this framework and utilizes it to construct a data-driven statistical algorithm to regulate the AF from deflecting users' beliefs over time, along with sample and complexity guarantees, and shows that the algorithm is robust against potential adversarial users.

## References

Showing 1-10 of 51 references.

### Optimal testing of discrete distributions with high probability

• Mathematics, Computer Science
Electron. Colloquium Comput. Complex.
• 2020
The first algorithms for closeness and independence testing that are sample-optimal, within constant factors, as a function of all relevant parameters are provided.

### Optimal Algorithms for Testing Closeness of Discrete Distributions

• Computer Science, Mathematics
SODA
• 2014
This work presents simple testers for both the ℓ1 and ℓ2 settings, with sample complexity that is information-theoretically optimal to within constant factors, and establishes that the sample complexity is $\Theta(\max\{n^{2/3}/\varepsilon^{4/3}, n^{1/2}/\varepsilon^2\})$.
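As an illustration of how closeness testers in this line of work operate on sample counts, the sketch below computes a Poissonized chi-square-style statistic. Assume X_i and Y_i are independent Poisson counts of element i with means m·p_i and m·q_i. Then E[(X_i − Y_i)² − X_i − Y_i] = m²(p_i − q_i)², so each numerator is an unbiased estimate of the squared gap. Dividing by X_i + Y_i reweights bins toward an ℓ1-type test, and each term has expected value 0 when p = q. The function name is hypothetical, and the acceptance threshold and constants are deliberately omitted, so this is not the exact tester or analysis from the paper.

```python
def closeness_statistic(x_counts, y_counts):
    """Sum over bins of ((X_i - Y_i)^2 - X_i - Y_i) / (X_i + Y_i).

    Bins where neither sample landed are skipped (their term would be 0/0).
    Large positive values suggest p and q differ; the statistic has
    expected value 0 when p == q. No decision threshold is applied here.
    """
    z = 0.0
    for x, y in zip(x_counts, y_counts):
        s = x + y
        if s > 0:
            z += ((x - y) ** 2 - s) / s
    return z


# Identical count profiles give a small (here negative) value; disjoint ones a large value.
print(closeness_statistic([5, 5], [5, 5]))    # -2.0
print(closeness_statistic([10, 0], [0, 10]))  # 18.0
```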

### Estimating the unseen: an n/log(n)-sample estimator for entropy and support size, shown optimal via new CLTs

• Mathematics, Computer Science
STOC '11
• 2011
We introduce a new approach to characterizing the unobserved portion of a distribution, which provides sublinear-sample estimators achieving arbitrarily small additive constant error for a class of…

### Exploring the Gap between Tolerant and Non-tolerant Distribution Testing

• Mathematics, Computer Science
APPROX/RANDOM
• 2022
This work focuses on the connection between the sample complexities of non-tolerant testing of distributions and their tolerant testing counterparts, and proves a close-to-linear lower bound against their tolerant tests.

### Estimating the unseen: A sublinear-sample canonical estimator of distributions

• Computer Science, Mathematics
Electron. Colloquium Comput. Complex.
• 2010
This paper introduces a new approach to characterizing the unobserved portion of a distribution, which provides sublinear-sample additive estimators for a class of properties that includes entropy and distribution support size, and settles the longstanding question of the sample complexities of these estimation problems.

### Modern challenges in distribution testing

The goal of this dissertation is to identify and address several contemporary challenges in distribution testing and make progress in answering the following questions.

### A New Approach for Testing Properties of Discrete Distributions

• Computer Science, Mathematics
2016 IEEE 57th Annual Symposium on Foundations of Computer Science (FOCS)
• 2016
The sample complexity of the algorithm depends on the structure of the unknown distributions (as opposed to merely their domain size) and is significantly better than that of the worst-case optimal ℓ1-tester in many natural instances.

### Optimal Testing for Properties of Distributions

• Mathematics, Computer Science
NIPS
• 2015
This work provides a general approach via which sample-optimal and computationally efficient testers for discrete log-concave and monotone hazard rate distributions are obtained.

### Testing that distributions are close

• Computer Science
Proceedings 41st Annual Symposium on Foundations of Computer Science
• 2000
A sublinear algorithm which uses $O(n^{2/3}\varepsilon^{-4}\log n)$ independent samples from each distribution, runs in time linear in the sample size, makes no assumptions about the structure of the distributions, and distinguishes the cases when the distance between the distributions is small or large.

### Sample-Optimal Identity Testing with High Probability

• Mathematics, Computer Science
Electron. Colloquium Comput. Complex.
• 2017
The new upper and lower bounds show that the optimal sample complexity of identity testing is $\Theta\left(\frac{1}{\epsilon^2}\left(\sqrt{n \log(1/\delta)} + \log(1/\delta)\right)\right)$ for any $n$, $\epsilon$, and $\delta$.