# The Price of Tolerance in Distribution Testing

@article{Canonne2022ThePO, title={The Price of Tolerance in Distribution Testing}, author={Cl{\'e}ment L. Canonne and Ayush Jain and Gautam Kamath and Jerry Zheng Li}, journal={ArXiv}, year={2022}, volume={abs/2106.13414} }

We revisit the problem of tolerant distribution testing. That is, given samples from an unknown distribution p over {1, …, n}, is it ε1-close to or ε2-far from a reference distribution q (in total variation distance)? Despite significant interest over the past decade, this problem is well understood only in the extreme cases. In the noiseless setting (i.e., ε1 = 0) the sample complexity is Θ(√n), strongly sublinear in the domain size. At the other end of the spectrum, when ε1 = ε2/2, the…
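To make the problem statement concrete, here is a minimal Python sketch of the total variation distance and a naive plug-in tolerant tester. It is not the paper's algorithm: it compares the empirical distribution of the samples with q against the midpoint threshold (ε1 + ε2)/2, an approach that generally needs a number of samples linear in n rather than the sublinear rates mentioned in the abstract. The names `tv_distance` and `naive_tolerant_test`, the 0-based domain, and the midpoint threshold are assumptions made for illustration.

```python
import random
from collections import Counter


def tv_distance(p, q):
    """Total variation distance between two distributions given as probability lists."""
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))


def naive_tolerant_test(samples, q, eps1, eps2):
    """Return True ("close") if the empirical distribution is within (eps1 + eps2) / 2 of q.

    Illustrative baseline only (an assumption, not the paper's tester).
    Samples are integers in {0, ..., n-1}, where n = len(q).
    """
    n = len(q)
    m = len(samples)
    counts = Counter(samples)
    empirical = [counts.get(i, 0) / m for i in range(n)]
    return tv_distance(empirical, q) <= (eps1 + eps2) / 2


if __name__ == "__main__":
    rng = random.Random(0)
    n = 10
    q = [1 / n] * n  # reference distribution: uniform over n elements
    samples = rng.choices(range(n), weights=q, k=20000)
    print(naive_tolerant_test(samples, q, eps1=0.1, eps2=0.3))
```

The plug-in approach is useful mainly as a baseline: it makes the accept/reject semantics explicit, while the interesting question is how many fewer samples a dedicated tester needs.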

## 3 Citations

### Exploring the Gap between Tolerant and Non-tolerant Distribution Testing

- Mathematics, Computer Science
- APPROX/RANDOM
- 2022

This work focuses on the connection between the sample complexities of non-tolerant testing of distributions and their tolerant testing counterparts, and proves a close to linear lower bound against their tolerant tests.

### Verifying the unseen: interactive proofs for label-invariant distribution properties

- Computer Science
- STOC
- 2022

This work shows that the support size, the entropy, and the distance from the uniform distribution can all be approximately verified via a 2-message interactive proof, where the communication complexity, the verifier's running time, and the sample complexity are O(√N).

### Mathematical Framework for Online Social Media Regulation

- Computer Science
- ArXiv
- 2022

This paper mathematically formalizes this framework and utilizes it to construct a data-driven statistical algorithm to regulate the AF from deflecting users' beliefs over time, along with sample and complexity guarantees, and shows that the algorithm is robust against potential adversarial users.

## References

### Optimal testing of discrete distributions with high probability

- Mathematics, Computer Science
- Electron. Colloquium Comput. Complex.
- 2020

The first algorithms for closeness and independence testing that are sample-optimal, within constant factors, as a function of all relevant parameters are provided.

### Optimal Algorithms for Testing Closeness of Discrete Distributions

- Computer Science, Mathematics
- SODA
- 2014

This work presents simple testers for both the ℓ1 and ℓ2 settings, with sample complexity that is information-theoretically optimal, to constant factors, and establishes that the sample complexity is $\Theta(\max\{n^{2/3}/\epsilon^{4/3}, n^{1/2}/\epsilon^{2}\})$.
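As a rough illustration of how the two terms in the bound max{n^(2/3)/ε^(4/3), n^(1/2)/ε^2} trade off, the following Python sketch evaluates the expression with the constant hidden by Θ set to 1 (an assumption made purely for illustration). The function names `closeness_sample_bound` and `dominant_term` are hypothetical. Comparing the two terms algebraically, the first dominates exactly when ε ≥ n^(-1/4).

```python
def closeness_sample_bound(n, eps):
    """Evaluate max(n^(2/3)/eps^(4/3), n^(1/2)/eps^2), ignoring hidden constants."""
    term_a = n ** (2 / 3) / eps ** (4 / 3)
    term_b = n ** 0.5 / eps ** 2
    return max(term_a, term_b)


def dominant_term(n, eps):
    """Report which of the two terms is larger for the given n and eps."""
    term_a = n ** (2 / 3) / eps ** (4 / 3)
    term_b = n ** 0.5 / eps ** 2
    return "n^(2/3)/eps^(4/3)" if term_a >= term_b else "n^(1/2)/eps^2"


if __name__ == "__main__":
    n = 10 ** 4  # crossover is at eps = n ** -0.25 = 0.1
    for eps in (0.5, 0.01):
        print(eps, dominant_term(n, eps), closeness_sample_bound(n, eps))
```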

### Estimating the unseen: an n/log(n)-sample estimator for entropy and support size, shown optimal via new CLTs

- Mathematics, Computer Science
- STOC '11
- 2011

We introduce a new approach to characterizing the unobserved portion of a distribution, which provides sublinear-sample estimators achieving arbitrarily small additive constant error for a class of…

### Exploring the Gap between Tolerant and Non-tolerant Distribution Testing

- Mathematics, Computer Science
- APPROX/RANDOM
- 2022

This work focuses on the connection between the sample complexities of non-tolerant testing of distributions and their tolerant testing counterparts, and proves a close to linear lower bound against their tolerant tests.

### Estimating the unseen: A sublinear-sample canonical estimator of distributions

- Computer Science, Mathematics
- Electron. Colloquium Comput. Complex.
- 2010

This paper introduces a new approach to characterizing the unobserved portion of a distribution, which provides sublinear-sample additive estimators for a class of properties that includes entropy and distribution support size, and settles the longstanding question of the sample complexities of these estimation problems.

### Modern challenges in distribution testing

- Computer Science
- 2018

The goal of this dissertation is to identify and address several contemporary challenges in distribution testing and make progress in answering the following questions.

### A New Approach for Testing Properties of Discrete Distributions

- Computer Science, Mathematics
- 2016 IEEE 57th Annual Symposium on Foundations of Computer Science (FOCS)
- 2016

The sample complexity of the algorithm depends on the structure of the unknown distributions - as opposed to merely their domain size - and is significantly better compared to the worst-case optimal L1-tester in many natural instances.

### Optimal Testing for Properties of Distributions

- Mathematics, Computer Science
- NIPS
- 2015

This work provides a general approach via which sample-optimal and computationally efficient testers for discrete log-concave and monotone hazard rate distributions are obtained.

### Testing that distributions are close

- Computer Science
- Proceedings 41st Annual Symposium on Foundations of Computer Science
- 2000

A sublinear algorithm which uses $O(n^{2/3}\epsilon^{-4}\log n)$ independent samples from each distribution, runs in time linear in the sample size, makes no assumptions about the structure of the distributions, and distinguishes the cases when the distance between the distributions is small or large.

### Sample-Optimal Identity Testing with High Probability

- Mathematics, Computer Science
- Electron. Colloquium Comput. Complex.
- 2017

The new upper and lower bounds show that the optimal sample complexity of identity testing is $\Theta\left(\frac{1}{\epsilon^2}\left(\sqrt{n \log(1/\delta)} + \log(1/\delta)\right)\right)$ for any $n$, $\epsilon$, and $\delta$.
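To see how the failure probability δ enters this bound, the expression (√(n log(1/δ)) + log(1/δ)) / ε² can be evaluated directly. The Python sketch below does so with the constant hidden by Θ set to 1 and with the natural logarithm (both are assumptions for illustration); the function name `identity_testing_bound` is hypothetical.

```python
import math


def identity_testing_bound(n, eps, delta):
    """Evaluate (sqrt(n * log(1/delta)) + log(1/delta)) / eps^2, ignoring hidden constants."""
    log_term = math.log(1 / delta)
    return (math.sqrt(n * log_term) + log_term) / eps ** 2


if __name__ == "__main__":
    n, eps = 10 ** 6, 0.1
    for delta in (0.1, 0.01, 0.001):
        print(delta, round(identity_testing_bound(n, eps, delta)))
```

For large n the square-root term dominates, so shrinking δ costs only a √(log(1/δ)) factor rather than the log(1/δ) factor that repeating a constant-confidence tester and taking a majority vote would incur.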