# A Survey on Distribution Testing: Your Data is Big. But is it Blue?

@article{Canonne2015ASO,
title={A Survey on Distribution Testing: Your Data is Big. But is it Blue?},
author={Cl{\'e}ment L. Canonne},
journal={Electron. Colloquium Comput. Complex.},
year={2015},
volume={22},
pages={63}
}
• Clément L. Canonne
• Published 2015
• Computer Science, Mathematics
• Electron. Colloquium Comput. Complex.
The field of property testing originated in work on program checking, and has evolved into an established and very active research area. In this work, we survey the developments of one of its most recent and prolific offspring, distribution testing. This subfield, at the junction of property testing and statistics, is concerned with studying properties of probability distributions. We cover the current status of distribution testing in several settings, starting with the traditional sampling…
#### Citations (145)

Exploring the Gap between Tolerant and Non-tolerant Distribution Testing
• Sourav Chakraborty, Eldar Fischer, Arijit Ghosh, Gopinath Mishra, Sayantan Sen
• Computer Science
• ArXiv
• 2021
This work focuses on the connection between the sample complexities of non-tolerant ("traditional") and tolerant testing of distributions, and shows that if a property requires the distributions to be non-concentrated, then it cannot be non-tolerantly tested with $o(\sqrt{n})$ samples, where $n$ denotes the universe size.
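Both tolerant and non-tolerant testing are phrased in terms of a distance between distributions. The minimal sketch below assumes the total variation distance (half the $\ell_1$ distance) on a finite domain $\{0, \dots, n-1\}$, the usual choice in this literature but an assumption here; the helper names `tv_distance` and `dist_to_uniform` are hypothetical. A non-tolerant uniformity tester only needs to separate distance $0$ from distance at least $\varepsilon$, whereas a tolerant one must separate distance at most $\varepsilon_1$ from distance at least $\varepsilon_2 > \varepsilon_1$.

```python
from fractions import Fraction


def tv_distance(p, q):
    """Total variation distance between two probability vectors on the same finite domain."""
    if len(p) != len(q):
        raise ValueError("distributions must be over the same domain")
    return sum(abs(a - b) for a, b in zip(p, q)) / 2


def dist_to_uniform(p):
    """Total variation distance from p to the uniform distribution on len(p) elements."""
    n = len(p)
    return tv_distance(p, [Fraction(1, n)] * n)


# Hypothetical example: uniform on half of a domain of size 4.
p = [Fraction(1, 2), Fraction(1, 2), Fraction(0), Fraction(0)]
print(dist_to_uniform(p))  # 1/2
```

Exact `Fraction` arithmetic keeps the toy example free of rounding noise; the same functions work unchanged on lists of floats.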
Anaconda: A Non-Adaptive Conditional Sampling Algorithm for Distribution Testing
• Computer Science, Mathematics
• Electron. Colloquium Comput. Complex.
• 2018
The main result is the first polylogarithmic-query algorithm for equivalence testing, deciding whether two unknown distributions are equal to or far from each other, an exponential improvement over the previous best upper bound.
Proofs of Proximity for Distribution Testing
• Mathematics, Computer Science
• ITCS
• 2018
The main results include showing that MA distribution testers can be quadratically stronger than standard distribution testers, but no stronger than that; in contrast, IP distribution testers are shown to be exponentially stronger than standard distribution testers, but when restricted to public coins they can be at best quadratically stronger.
Testing Distributions of Huge Objects
• Computer Science
• Electron. Colloquium Comput. Complex.
• 2021
A study of a new model of property testing that is a hybrid of testing properties of distributions and testing properties of strings, where the distance between distributions is defined as the earth mover's distance with respect to the relative Hamming distance between strings.
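To make the distance in this model concrete, here is a small sketch restricted, as a simplifying assumption, to uniform distributions over two equal-size lists of equal-length strings. The function names `relative_hamming` and `emd_uniform` are hypothetical, and the brute-force search over matchings is for illustration only; it is not the paper's algorithm.

```python
from fractions import Fraction
from itertools import permutations


def relative_hamming(x, y):
    """Fraction of positions at which the equal-length strings x and y differ."""
    if len(x) != len(y):
        raise ValueError("strings must have equal length")
    return Fraction(sum(a != b for a, b in zip(x, y)), len(x))


def emd_uniform(xs, ys):
    """Earth mover's distance between the uniform distributions over the
    equal-size string lists xs and ys, using relative Hamming distance as the
    ground metric. With uniform weights on equally many points, some optimal
    transport plan is a perfect matching, so trying every matching is exact
    (but takes factorial time: toy inputs only)."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("need two non-empty lists of equal size")
    m = len(xs)
    best = min(
        sum(relative_hamming(x, ys[j]) for x, j in zip(xs, perm))
        for perm in permutations(range(m))
    )
    return best / m


print(emd_uniform(["000", "111"], ["001", "110"]))  # 1/3
```

Matching "000" with "001" and "111" with "110" costs $1/3$ per pair, which beats the crossed matching ($2/3$ per pair).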
A Chasm Between Identity and Equivalence Testing with Conditional Queries
• Mathematics, Computer Science
• Theory Comput.
• 2018
Any testing algorithm for equivalence must make $\Omega(\sqrt{\log\log n})$ queries in the conditional sampling model, showing an intrinsic qualitative gap between identity and equivalence testing, absent in the standard sampling model (where both problems have sampling complexity $n^{\Theta(1)}$).
A New Approach for Testing Properties of Discrete Distributions
• Computer Science, Mathematics
• 2016 IEEE 57th Annual Symposium on Foundations of Computer Science (FOCS)
• 2016
The sample complexity of the algorithm depends on the structure of the unknown distributions, as opposed to merely their domain size, and is significantly better than that of the worst-case optimal $\ell_1$-tester in many natural instances.
Collision-based Testers are Optimal for Uniformity and Closeness
• Computer Science, Mathematics
• Electron. Colloquium Comput. Complex.
• 2016
This work shows that the original collision-based testers proposed for uniformity testing of a discrete distribution and closeness testing between two discrete distributions with bounded $\ell_2$-norm are sample-optimal, up to constant factors.
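The collision-based idea for uniformity can be sketched as follows: count pairs of equal samples to estimate the collision probability $\|p\|_2^2$, which equals $1/n$ for the uniform distribution and is at least $(1+4\varepsilon^2)/n$ when $p$ is $\varepsilon$-far from uniform in total variation (by Cauchy-Schwarz). This is a minimal sketch, not the exact tester or constants from the paper: the function name, the halfway threshold $(1+2\varepsilon^2)/n$, and the constant `20` in the sample budget are illustrative assumptions; only the order $\sqrt{n}/\varepsilon^2$ of the sample count reflects the known optimal bound.

```python
import random
from collections import Counter
from math import comb


def collision_uniformity_test(samples, n, eps):
    """Collision-based uniformity tester (illustrative sketch).

    samples: i.i.d. draws from an unknown distribution p on {0, ..., n-1}.
    Returns True ("accept: looks uniform") or False ("reject: looks eps-far").
    """
    m = len(samples)
    if m < 2:
        raise ValueError("need at least two samples")
    counts = Counter(samples)
    # Number of colliding pairs: sum over elements of C(count, 2).
    collisions = sum(comb(c, 2) for c in counts.values())
    # Unbiased estimate of the collision probability ||p||_2^2.
    estimate = collisions / comb(m, 2)
    # Uniform: ||p||_2^2 = 1/n; eps-far in TV: ||p||_2^2 >= (1 + 4 eps^2)/n.
    # Accept iff the estimate lies below the halfway point.
    return estimate <= (1 + 2 * eps**2) / n


rng = random.Random(0)
n, eps = 1000, 0.25
m = int(20 * n**0.5 / eps**2)  # order sqrt(n)/eps^2 samples; 20 is arbitrary
uniform = [rng.randrange(n) for _ in range(m)]
half = [rng.randrange(n // 2) for _ in range(m)]  # uniform on half the domain: TV distance 1/2
print(collision_uniformity_test(uniform, n, eps))
print(collision_uniformity_test(half, n, eps))
```

With these parameters the estimate for the uniform sample concentrates far below the threshold and the half-domain sample (collision probability $2/n$) lands far above it, so the first call should accept and the second reject.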
A Chasm Between Identity and Equivalence Testing with Conditional Queries
• Computer Science, Mathematics
• APPROX-RANDOM
• 2014
It is shown that any testing algorithm for equivalence must make $\Omega(\sqrt{\log\log n})$ queries in the conditional sampling model, demonstrating a gap between identity and equivalence testing, absent in the standard sampling model (where both problems have sampling complexity $n^{\Theta(1)}$).
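In the conditional sampling model, a tester may specify a subset $S$ of the domain and receive a sample from $p$ conditioned on $S$. The sketch below simulates such an oracle from a known probability vector, purely to illustrate the access model; the class `CondOracle` is hypothetical, and conventions for sets with $p(S) = 0$ differ across papers (this sketch simply raises an error). Conditioning on a pair $\{x, y\}$ reveals the ratio $p(x)/(p(x)+p(y))$, a primitive that standard sampling does not offer.

```python
import random


class CondOracle:
    """Simulated conditional sampling oracle for a known distribution p on
    {0, ..., n-1}. A real tester would only see the samples, never p."""

    def __init__(self, p, rng=None):
        self.p = list(p)
        self.rng = rng or random.Random()
        self.queries = 0  # number of successful conditional queries

    def sample(self, subset):
        """Return x in subset with probability p(x) / p(subset)."""
        items = sorted(set(subset))
        weights = [self.p[x] for x in items]
        if not items or sum(weights) == 0:
            raise ValueError("conditioning set must have positive probability")
        self.queries += 1
        return self.rng.choices(items, weights=weights, k=1)[0]


oracle = CondOracle([0.5, 0.25, 0.25, 0.0], random.Random(1))
draws = [oracle.sample({0, 1}) for _ in range(3000)]
# Empirical estimate of p(0) / (p(0) + p(1)), whose true value is 2/3.
print(draws.count(0) / len(draws))
```

Querying with the full domain as `subset` recovers ordinary sampling, so this model is at least as powerful as the standard one.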
Tolerant Distribution Testing in the Conditional Sampling Model
It is proved that tolerant uniformity testing in the conditional sampling model can be solved using $\tilde{O}(\varepsilon^{-2})$ queries, which is known to be optimal and improves upon the $O$-query algorithm of [CRS15].
An Adaptivity Hierarchy Theorem for Property Testing ( Fifty Shades of Adaptivity )
Adaptivity is known to play a crucial role in property testing. In particular, there exist properties for which there is an exponential gap between the power of adaptive testing algorithms, wherein…

#### References

Showing references 1-10 of 143.
Testing Properties of Collections of Distributions
• Computer Science, Mathematics
• Theory Comput.
• 2010
This work proposes a framework for studying property testing of collections of distributions, where the number of distributions in the collection is a parameter of the problem, and suggests two models that differ in the way the algorithm is given access to samples from the distributions.
A New Approach for Testing Properties of Discrete Distributions
• Computer Science, Mathematics
• 2016 IEEE 57th Annual Symposium on Foundations of Computer Science (FOCS)
• 2016
The sample complexity of the algorithm depends on the structure of the unknown distributions, as opposed to merely their domain size, and is significantly better than that of the worst-case optimal $\ell_1$-tester in many natural instances.
Property Testing - Current Research and Surveys
Algorithmic Aspects of Property Testing in the Dense Graphs Model and some Recent Results on Local Testing of Sparse Linear Codes are presented.
Collision-based Testers are Optimal for Uniformity and Closeness
• Computer Science, Mathematics
• Electron. Colloquium Comput. Complex.
• 2016
This work shows that the original collision-based testers proposed for uniformity testing of a discrete distribution and closeness testing between two discrete distributions with bounded $\ell_2$-norm are sample-optimal, up to constant factors.
Testing equivalence between distributions using conditional samples
• Computer Science, Mathematics
• SODA
• 2014
This paper focuses on algorithms for two fundamental distribution testing problems: testing whether $D = D^*$ for an explicitly provided $D^*$, and testing whether two unknown distributions $D_1$ and $D_2$ are equivalent, and gives an algorithm whose complexity is $\mathrm{poly}(\log N, 1/\varepsilon)$.
A Chasm Between Identity and Equivalence Testing with Conditional Queries
• Computer Science, Mathematics
• APPROX-RANDOM
• 2014
It is shown that any testing algorithm for equivalence must make $\Omega(\sqrt{\log\log n})$ queries in the conditional sampling model, demonstrating a gap between identity and equivalence testing, absent in the standard sampling model (where both problems have sampling complexity $n^{\Theta(1)}$).
Algorithmic and Analysis Techniques in Property Testing
• D. Ron
• Computer Science
• Found. Trends Theor. Comput. Sci.
• 2009
This monograph surveys results in property testing, where the emphasis is on common analysis and algorithmic techniques.
A Survey of Quantum Property Testing
• Mathematics, Computer Science
• Theory Comput.
• 2016
This survey describes recent results obtained for quantum property testing and surveys known bounds on testing various natural properties, such as whether two states are equal, whether a state is separable, whether two operations commute, etc.
Testing Similar Means
• Computer Science, Mathematics
• SIAM J. Discret. Math.
• 2014
We consider the problem of testing a basic property of collections of distributions: having similar means. Namely, the algorithm should accept collections of distributions in which all distributions…
Testing Monotone Continuous Distributions on High-Dimensional Real Cubes
• Mathematics, Computer Science
• Property Testing
• 2010
It is shown that if a distribution $D$ on $[0,1]^n$ is monotone, then one can test whether $D$ is uniform with sample complexity $O(n/\varepsilon^2)$, which is optimal up to a polylogarithmic factor.