• Corpus ID: 7964701

Fourier-Based Testing for Families of Distributions

  • Clément L. Canonne, Ilias Diakonikolas, Alistair Stewart
  • Electron. Colloquium Comput. Complex.
We study the general problem of testing whether an unknown distribution belongs to a specified family of distributions. More specifically, given a distribution family $\mathcal{P}$ and sample access to an unknown discrete distribution $\mathbf{P}$, we want to distinguish (with high probability) between the case that $\mathbf{P} \in \mathcal{P}$ and the case that $\mathbf{P}$ is $\epsilon$-far, in total variation distance, from every distribution in $\mathcal{P}$. This is the prototypical… 
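To make the distance notion in the abstract concrete, here is a minimal Python sketch of total variation distance between two discrete distributions, together with a toy "far from the family" check. The function names and the representation of a family as a finite list of candidates are assumptions made for illustration only; the families studied in the paper are infinite, and a real tester sees only samples, not the probabilities themselves.

```python
def total_variation(p, q):
    """Total variation distance between two discrete distributions.

    p and q are dicts mapping outcomes to probabilities (hypothetical
    representation chosen for this sketch).
    """
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(x, 0.0) - q.get(x, 0.0)) for x in support)


def is_eps_far(p, family, eps):
    """True if p is at distance greater than eps from every candidate.

    `family` is a finite list of distributions, a toy stand-in for the
    (generally infinite) family in the testing problem.
    """
    return all(total_variation(p, q) > eps for q in family)


fair_coin = {0: 0.5, 1: 0.5}
point_mass = {0: 1.0}
print(total_variation(fair_coin, point_mass))
print(is_eps_far(fair_coin, [point_mass], 0.25))
```

The strict inequality in `is_eps_far` is one common convention; some texts use "at least eps" instead.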

Testing for Families of Distributions via the Fourier Transform

This work applies its Fourier-based framework to obtain near sample-optimal and computationally efficient testers for the following fundamental distribution families: Sums of Independent Integer Random Variables, Poisson Multinomial Distributions, and Discrete Log-Concave Distributions.

Sharp Bounds for Generalized Uniformity Testing

This work presents a computationally efficient tester whose sample complexity is optimal, up to constant factors, and a matching information-theoretic lower bound on the sample complexity of generalized uniformity testing.

Testing Conditional Independence of Discrete Distributions

This work studies the problem of testing conditional independence for discrete distributions and develops a general theory providing variance bounds that are tight, up to constant factors, for all estimators of a specific form.

Sample-Optimal Identity Testing with High Probability

The new upper and lower bounds show that the optimal sample complexity of identity testing is $\Theta\left( \frac{1}{\epsilon^2}\left(\sqrt{n \log(1/\delta)} + \log (1/\delta) \right)\right)$ for any $n$, $\epsilon$, and $\delta$.
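As a rough illustration of how this bound scales, the sketch below evaluates the expression $\frac{1}{\epsilon^2}\left(\sqrt{n \log(1/\delta)} + \log(1/\delta)\right)$ with the hidden constant taken to be 1. The function name and the choice of natural logarithm are assumptions for this example; the result is a growth rate, not an actual sample count.

```python
import math


def identity_testing_rate(n, eps, delta):
    """Evaluate (1/eps^2) * (sqrt(n * log(1/delta)) + log(1/delta)).

    Only the growth rate is meaningful: the Theta(.) in the stated bound
    hides constant factors, which are set to 1 here by assumption.
    """
    if n < 1 or not (0 < eps <= 1) or not (0 < delta < 1):
        raise ValueError("need n >= 1, 0 < eps <= 1, 0 < delta < 1")
    log_term = math.log(1.0 / delta)
    return (math.sqrt(n * log_term) + log_term) / eps ** 2


# Shrinking the failure probability delta raises the rate only mildly,
# while halving eps multiplies it by four.
for delta in (0.1, 0.01, 0.001):
    print(delta, round(identity_testing_rate(10_000, 0.1, delta)))
```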

Testing Identity of Multidimensional Histograms

An algorithm for identity testing of multidimensional histogram distributions with sample complexity $O(k/\epsilon)$ that runs in sample-polynomial time and is robust to model misspecification, i.e., succeeds even if $q$ is only promised to be close to a $k$-histogram.

Private Testing of Distributions via Sample Permutations

The framework of property testing is used to design algorithms to test the properties of the distribution that the data is drawn from with respect to differential privacy, which indicates that differential privacy can be obtained in most regimes of parameters for free.

Modern challenges in distribution testing

The goal of this dissertation is to identify and address several contemporary challenges in distribution testing and make progress in answering the following questions.

Property Testing and Probability Distributions: New Techniques, New Models, and New Goals

Clément L. Canonne. Recently there has been a lot of glorious hullabaloo about Big Data and how it is going to…



Optimal Testing for Properties of Distributions

This work provides a general approach via which sample-optimal and computationally efficient testers for discrete log-concave and monotone hazard rate distributions are obtained.

Testing Shape Restrictions of Discrete Distributions

A general algorithm is developed that applies to a large range of “shape-constrained” properties, including monotone, log-concave, t-modal, piecewise-polynomial, and Poisson Binomial distributions, and is computationally efficient.

Properly Learning Poisson Binomial Distributions in Almost Polynomial Time

This work gives an algorithm for properly learning Poisson binomial distributions (PBDs) and provides a novel structural characterization of PBDs, which allows the corresponding fitting problem to be reduced to a collection of systems of low-degree polynomial inequalities.

Testing Identity of Structured Distributions

A unified approach is presented that yields new, simple testers, with sample complexity that is information-theoretically optimal, for broad classes of structured distributions, including $t$-flat distributions, $t$-modal distributions, log-concave distributions, monotone hazard rate (MHR) distributions, and mixtures thereof.

Efficient Robust Proper Learning of Log-concave Distributions

This work gives the first computationally efficient algorithm for the robust proper learning of univariate log-concave distributions, which achieves the information-theoretically optimal sample size, runs in polynomial time, and is robust to model misspecification with nearly-optimal error guarantees.

Learning Poisson Binomial Distributions

This work considers a basic problem in unsupervised learning: learning an unknown Poisson binomial distribution, and gives a highly efficient algorithm which learns to $\epsilon$-accuracy (with respect to the total variation distance) using $\tilde{O}(1/\epsilon^{3})$ samples, independent of $n$.

Optimal Learning via the Fourier Transform for Sums of Independent Integer Random Variables

A computationally efficient algorithm is designed that uses $\widetilde{O}(k/\epsilon^2)$ samples and learns an arbitrary $k$-SIIRV within error $\epsilon$ in total variation distance. The work also proves a tight lower bound on the size of $\epsilon$-covers for ${\cal S}_{n,k}$, which is the key ingredient in the authors' tight sample complexity lower bound.

A New Approach for Testing Properties of Discrete Distributions

  • Ilias Diakonikolas, D. Kane
  • Computer Science, Mathematics
    2016 IEEE 57th Annual Symposium on Foundations of Computer Science (FOCS)
  • 2016
The sample complexity of the algorithm depends on the structure of the unknown distributions - as opposed to merely their domain size - and is significantly better compared to the worst-case optimal L1-tester in many natural instances.

Testing Poisson Binomial Distributions

The sample complexity of this algorithm improves quadratically upon that of the naive "learn followed by tolerant-test" approach, while instance optimal identity testing [VV14] is not applicable since it is looking to simultaneously test against a whole family of distributions.

Optimal Algorithms for Testing Closeness of Discrete Distributions

This work presents simple testers for both the $\ell_1$ and $\ell_2$ settings, with sample complexity that is information-theoretically optimal, up to constant factors, and establishes that the sample complexity is $\Theta(\max\{n^{2/3}/\epsilon^{4/3}, n^{1/2}/\epsilon^2\})$.
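The bound above, read as $\Theta(\max\{n^{2/3}/\epsilon^{4/3}, n^{1/2}/\epsilon^2\})$, has two regimes: comparing the two terms shows the first dominates once $n \ge \epsilon^{-4}$. The Python sketch below evaluates both terms with the hidden constant set to 1; the function name and return format are assumptions for illustration, and the values are growth rates rather than sample counts.

```python
def closeness_testing_rate(n, eps):
    """Return (rate, dominant) for max(n^(2/3)/eps^(4/3), n^(1/2)/eps^2).

    Constants hidden by the Theta(.) are taken to be 1 (an assumption),
    so only the relative growth of the two terms is meaningful.
    """
    if n < 1 or not (0 < eps <= 1):
        raise ValueError("need n >= 1 and 0 < eps <= 1")
    first = n ** (2 / 3) / eps ** (4 / 3)
    second = n ** 0.5 / eps ** 2
    if first >= second:
        return first, "n^(2/3)/eps^(4/3)"
    return second, "n^(1/2)/eps^2"


# Large domain relative to 1/eps^4: the first term dominates.
print(closeness_testing_rate(10 ** 6, 0.5))
# Small domain relative to 1/eps^4: the second term dominates.
print(closeness_testing_rate(4, 0.1))
```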