A Survey on Distribution Testing: Your Data is Big. But is it Blue?

@article{Canonne2015ASO,
  title={A Survey on Distribution Testing: Your Data is Big. But is it Blue?},
  author={Cl{\'e}ment L. Canonne},
  journal={Electron. Colloquium Comput. Complex.},
  year={2015},
  volume={22},
  pages={63}
}
  • C. Canonne
  • Published 15 August 2020
  • Computer Science, Mathematics
  • Electron. Colloquium Comput. Complex.
The field of property testing originated in work on program checking, and has evolved into an established and very active research area. In this work, we survey the developments of one of its most recent and prolific offspring, distribution testing. This subfield, at the junction of property testing and Statistics, is concerned with studying properties of probability distributions. We cover the current status of distribution testing in several settings, starting with the traditional sampling…
Exploring the Gap between Tolerant and Non-tolerant Distribution Testing
TLDR
This work focuses on the connection between the sample complexities of non-tolerant ("traditional") testing of distributions and tolerant testing thereof, and shows that if a property requires the distributions to be non-concentrated, then it cannot be non-tolerantly tested with $o(\sqrt{n})$ samples, where $n$ denotes the universe size.
Anaconda: A Non-Adaptive Conditional Sampling Algorithm for Distribution Testing
TLDR
The main result is the first polylogarithmic-query algorithm for equivalence testing, deciding whether two unknown distributions are equal to or far from each other, an exponential improvement over the previous best upper bound.
Proofs of Proximity for Distribution Testing
TLDR
The main results include showing that MA distribution testers can be quadratically stronger than standard distribution testers, but no stronger than that; in contrast, IP distribution testers are shown to be exponentially stronger than standard distribution testers, but when restricted to public coins they can be at best quadratically stronger.
Testing Distributions of Huge Objects
TLDR
A study of a new model of property testing that is a hybrid of testing properties of distributions and testing properties of strings, where the distance between distributions is defined as the earth mover's distance with respect to the relative Hamming distance between strings.
A Chasm Between Identity and Equivalence Testing with Conditional Queries
TLDR
Any testing algorithm for equivalence must make $\Omega(\sqrt{\log\log n})$ queries in the conditional sampling model, showing an intrinsic qualitative gap between identity and equivalence testing, absent in the standard sampling model (where both problems have sampling complexity $n^{\Theta(1)}$).
A New Approach for Testing Properties of Discrete Distributions
  • Ilias Diakonikolas, D. Kane
  • Computer Science, Mathematics
  • 2016 IEEE 57th Annual Symposium on Foundations of Computer Science (FOCS)
  • 2016
TLDR
The sample complexity of the algorithm depends on the structure of the unknown distributions - as opposed to merely their domain size - and is significantly better compared to the worst-case optimal L1-tester in many natural instances.
Collision-based Testers are Optimal for Uniformity and Closeness
TLDR
This work shows that the original collision-based testers proposed for uniformity testing of a discrete distribution and closeness testing between two discrete distributions with bounded $\ell_2$-norm are sample-optimal, up to constant factors.
Tolerant Distribution Testing in the Conditional Sampling Model
TLDR
It is proved that tolerant uniformity testing in the conditional sampling model can be solved using $\tilde{O}(\varepsilon^{-2})$ queries, which is known to be optimal and improves upon the $O$-query algorithm of [CRS15].
An Adaptivity Hierarchy Theorem for Property Testing (Fifty Shades of Adaptivity)
Adaptivity is known to play a crucial role in property testing. In particular, there exist properties for which there is an exponential gap between the power of adaptive testing algorithms, wherein…
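Several entries in this list concern collision-based uniformity testing in the standard sampling model. As a rough illustration only, the idea can be sketched as follows: the fraction of colliding sample pairs estimates $\sum_i p_i^2$, which equals $1/n$ for the uniform distribution on $n$ elements and is at least $(1 + 4\varepsilon^2)/n$ when the distribution is $\varepsilon$-far from uniform in total variation distance. The function names `collision_statistic` and `looks_uniform`, and the midpoint threshold $(1 + 2\varepsilon^2)/n$, are assumptions made for this sketch; they are not taken from any of the listed papers, and the sketch says nothing about how many samples are needed for a given error probability.

```python
import math
from collections import Counter


def collision_statistic(samples):
    """Fraction of unordered sample pairs that collide.

    This is an unbiased estimate of sum_i p_i^2 for i.i.d. samples from p.
    """
    m = len(samples)
    if m < 2:
        raise ValueError("need at least two samples")
    counts = Counter(samples)
    # An element seen c times contributes c choose 2 colliding pairs.
    colliding_pairs = sum(c * (c - 1) // 2 for c in counts.values())
    return colliding_pairs / math.comb(m, 2)


def looks_uniform(samples, n, eps):
    """Accept if the collision statistic is below an assumed midpoint threshold.

    Uniform on n elements gives 1/n; a distribution eps-far in total
    variation gives at least (1 + 4*eps**2)/n. The threshold
    (1 + 2*eps**2)/n is a hypothetical choice for illustration.
    """
    if n < 1 or not 0 < eps <= 1:
        raise ValueError("need n >= 1 and 0 < eps <= 1")
    return collision_statistic(samples) <= (1 + 2 * eps**2) / n
```

A caller would draw samples from the unknown distribution over a domain of size `n` and pass them in together with the proximity parameter `eps`; with too few samples the answer is unreliable in either direction.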

References

SHOWING 1-10 OF 143 REFERENCES
Testing Properties of Collections of Distributions
TLDR
This work proposes a framework for studying property testing of collections of distributions, where the number of distributions in the collection is a parameter of the problem, and suggests two models that differ in the way the algorithm is given access to samples from the distributions.
A New Approach for Testing Properties of Discrete Distributions
  • Ilias Diakonikolas, D. Kane
  • Computer Science, Mathematics
  • 2016 IEEE 57th Annual Symposium on Foundations of Computer Science (FOCS)
  • 2016
TLDR
The sample complexity of the algorithm depends on the structure of the unknown distributions - as opposed to merely their domain size - and is significantly better compared to the worst-case optimal L1-tester in many natural instances.
Property Testing - Current Research and Surveys
TLDR
Algorithmic Aspects of Property Testing in the Dense Graphs Model and some Recent Results on Local Testing of Sparse Linear Codes are presented.
Collision-based Testers are Optimal for Uniformity and Closeness
TLDR
This work shows that the original collision-based testers proposed for uniformity testing of a discrete distribution and closeness testing between two discrete distributions with bounded $\ell_2$-norm are sample-optimal, up to constant factors.
Testing equivalence between distributions using conditional samples
TLDR
This paper focuses on algorithms for two fundamental distribution testing problems: testing whether $D = D^*$ for an explicitly provided $D^*$, and testing whether two unknown distributions $D_1$ and $D_2$ are equivalent, and gives an algorithm whose complexity is $\mathrm{poly}(\log N, 1/\varepsilon)$.
A Chasm Between Identity and Equivalence Testing with Conditional Queries
TLDR
It is shown that any testing algorithm for equivalence must make $\Omega(\sqrt{\log\log n})$ queries in the conditional sampling model, demonstrating a gap between identity and equivalence testing, absent in the standard sampling model (where both problems have sampling complexity $n^{\Theta(1)}$).
Algorithmic and Analysis Techniques in Property Testing
  • D. Ron
  • Computer Science
  • Found. Trends Theor. Comput. Sci.
  • 2009
TLDR
This monograph surveys results in property testing, where the emphasis is on common analysis and algorithmic techniques.
A Survey of Quantum Property Testing
TLDR
This survey describes recent results obtained for quantum property testing and surveys known bounds on testing various natural properties, such as whether two states are equal, whether a state is separable, whether two operations commute, etc.
Testing Similar Means
We consider the problem of testing a basic property of collections of distributions: having similar means. Namely, the algorithm should accept collections of distributions in which all distributions…
Testing Monotone Continuous Distributions on High-Dimensional Real Cubes
TLDR
It is shown that if a distribution $D$ on $[0, 1]^n$ is monotone, then one can test if $D$ is uniform with sample complexity $O(n/\varepsilon^2)$, which is optimal up to a polylogarithmic factor.