# Near-Optimal Bounds for Testing Histogram Distributions

@article{Canonne2022NearOptimalBF, title={Near-Optimal Bounds for Testing Histogram Distributions}, author={Cl{\'e}ment L. Canonne and Ilias Diakonikolas and Daniel M. Kane and Sihan Liu}, journal={ArXiv}, year={2022}, volume={abs/2207.06596} }

We investigate the problem of testing whether a discrete probability distribution over an ordered domain is a histogram on a specified number of bins. One of the most common tools for the succinct approximation of data, $k$-histograms over $[n]$ are probability distributions that are piecewise constant over a set of $k$ intervals. The histogram testing problem is the following: Given samples from an unknown distribution $p$ on $[n]$, we want to distinguish between the cases that $p$ is a $k$…
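The definition of a $k$-histogram can be made concrete with a membership check. This is a minimal Python sketch, assuming $p$ is given as a list of probabilities indexed $0, \dots, n-1$ and that exact equality between adjacent values is acceptable (floating-point data may need a tolerance). The function names are hypothetical and not from the paper. The check reads the full probability vector, whereas a tester only sees samples from $p$.

```python
from typing import Sequence


def num_pieces(p: Sequence[float]) -> int:
    """Count the maximal intervals of [n] (indexed 0..n-1) on which p is constant."""
    if len(p) == 0:
        return 0
    # A new constant piece begins at every index where the value changes.
    return 1 + sum(1 for i in range(1, len(p)) if p[i] != p[i - 1])


def is_k_histogram(p: Sequence[float], k: int) -> bool:
    """True if p is piecewise constant over at most k intervals (exact comparison)."""
    return num_pieces(p) <= k


# Example: three constant pieces, so p is a 3-histogram but not a 2-histogram.
p = [0.1, 0.1, 0.3, 0.3, 0.2]
print(num_pieces(p), is_k_histogram(p, 3), is_k_histogram(p, 2))  # prints: 3 True False
```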

## References

Showing 10 of 51 references.

### Near-Optimal Closeness Testing of Discrete Histogram Distributions

- Computer Science, Mathematics · ICALP
- 2017

A new algorithm for testing the equivalence between two discrete histograms is given, together with a nearly matching information-theoretic lower bound, improving on previous work by polynomial factors in the relevant parameters.

### Are Few Bins Enough: Testing Histogram Distributions

- Mathematics, Computer Science · PODS
- 2015

A sample- and time-efficient algorithm for testing histogram distributions is obtained, complemented by a nearly matching information-theoretic lower bound on the number of samples required for this task.

### Fast and Near-Optimal Algorithms for Approximating Distributions by Histograms

- Computer Science · PODS
- 2015

This work designs a fast and near-optimal algorithm for approximating arbitrary one-dimensional data distributions by histograms. The algorithm uses the information-theoretically minimal sample size $m$, runs in sample-linear time $O(m)$, and outputs an $O(k)$-histogram whose $\ell_2$-distance from $p$ is at most $O(\mathrm{opt}_k) + \varepsilon$, where $\mathrm{opt}_k$ is the minimum $\ell_2$-distance between $p$ and any $k$-histogram.
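For a fixed set of bucket boundaries, the $\ell_2$-closest function that is constant on each bucket replaces $p$ by its average on that bucket. The Python sketch below (hypothetical helper names; intervals given as half-open index pairs that partition $[0, n)$) builds that candidate histogram and measures its $\ell_2$-distance from $p$. It illustrates the quantity being bounded, not the sample-linear algorithm of the cited work, and it does not search over boundaries, so it yields an upper bound on $\mathrm{opt}_k$ rather than $\mathrm{opt}_k$ itself.

```python
import math
from typing import List, Sequence, Tuple


def flatten(p: Sequence[float], intervals: Sequence[Tuple[int, int]]) -> List[float]:
    """Replace p on each half-open interval [a, b) by its average there.

    `intervals` must partition [0, n) in increasing order. Averaging keeps the
    mass of each interval, so a probability vector stays a probability vector.
    """
    expected_start = 0
    for a, b in intervals:
        if a != expected_start or b <= a:
            raise ValueError("intervals must partition [0, n) in order")
        expected_start = b
    if expected_start != len(p):
        raise ValueError("intervals must cover all of [0, n)")

    h = [0.0] * len(p)
    for a, b in intervals:
        avg = sum(p[a:b]) / (b - a)  # the mean minimizes squared error on [a, b)
        for i in range(a, b):
            h[i] = avg
    return h


def l2_distance(p: Sequence[float], q: Sequence[float]) -> float:
    """Euclidean (l2) distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))


p = [0.4, 0.2, 0.1, 0.3]
h = flatten(p, [(0, 2), (2, 4)])  # a 2-histogram, approximately [0.3, 0.3, 0.2, 0.2]
print(l2_distance(p, h))          # approximately 0.2
```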

### Testing for Families of Distributions via the Fourier Transform

- Mathematics · NeurIPS
- 2018

This work applies its Fourier-based framework to obtain near sample-optimal and computationally efficient testers for the following fundamental distribution families: Sums of Independent Integer Random Variables, Poisson Multinomial Distributions, and Discrete Log-Concave Distributions.

### Optimal Algorithms for Testing Closeness of Discrete Distributions

- Computer Science, Mathematics · SODA
- 2014

This work presents simple testers for both the $\ell_1$ and $\ell_2$ settings, with sample complexity that is information-theoretically optimal up to constant factors, and establishes that the sample complexity of $\ell_1$ closeness testing is $\Theta(\max\{n^{2/3}/\varepsilon^{4/3},\, n^{1/2}/\varepsilon^{2}\})$.
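As an illustration of the $\ell_2$ setting, assume Poissonized sampling, with counts $X_i \sim \mathrm{Poi}(m p_i)$ and $Y_i \sim \mathrm{Poi}(m q_i)$ independent. Then the statistic $Z = \sum_i \left[(X_i - Y_i)^2 - X_i - Y_i\right]$ satisfies $\mathbb{E}[Z] = m^2 \lVert p - q \rVert_2^2$. The Python sketch below computes $Z$ from count vectors and evaluates the expression inside the $\Theta$ bound with all hidden constants set to 1. The function names and that constant choice are assumptions for illustration; the complete testers in the cited work also involve thresholds and further details not shown here.

```python
from typing import Sequence


def l2_closeness_statistic(x: Sequence[int], y: Sequence[int]) -> int:
    """Z = sum_i (X_i - Y_i)^2 - X_i - Y_i over per-element sample counts.

    If X_i ~ Poi(m p_i) and Y_i ~ Poi(m q_i) are independent, then
    E[(X_i - Y_i)^2] = m p_i + m q_i + m^2 (p_i - q_i)^2, so subtracting
    X_i + Y_i makes E[Z] = m^2 * ||p - q||_2^2 (an unbiased estimate).
    """
    if len(x) != len(y):
        raise ValueError("count vectors must have the same length")
    return sum((a - b) ** 2 - a - b for a, b in zip(x, y))


def estimate_l2_squared(x: Sequence[int], y: Sequence[int], m: float) -> float:
    """Unbiased estimate of ||p - q||_2^2 under the Poissonization assumption."""
    return l2_closeness_statistic(x, y) / m ** 2


def l1_sample_bound(n: int, eps: float) -> float:
    """max(n^(2/3) / eps^(4/3), n^(1/2) / eps^2) with all constants set to 1."""
    return max(n ** (2 / 3) / eps ** (4 / 3), n ** 0.5 / eps ** 2)


print(l2_closeness_statistic([3, 0], [1, 2]))  # (4 - 4) + (4 - 2) = 2
print(round(l1_sample_bound(10**6, 0.1)))      # the n^(2/3) term dominates here
```

Setting the two terms of the bound equal gives $n = \varepsilon^{-4}$; for larger $n$ the $n^{2/3}/\varepsilon^{4/3}$ term dominates, and for smaller $n$ the $n^{1/2}/\varepsilon^{2}$ term does.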

### Testing Shape Restrictions of Discrete Distributions

- Mathematics, Computer Science · Theory of Computing Systems
- 2017

A general algorithm is developed that applies to a large range of “shape-constrained” properties, including monotone, log-concave, t-modal, piecewise-polynomial, and Poisson Binomial distributions, and is computationally efficient.

### Optimal Algorithms and Lower Bounds for Testing Closeness of Structured Distributions

- Computer Science · 2015 IEEE 56th Annual Symposium on Foundations of Computer Science (FOCS)
- 2015

This work designs a sample-optimal and computationally efficient algorithm for testing the equivalence of two unknown univariate distributions under the $\mathcal{A}_k$-distance metric, and yields new, simple $\ell_1$ closeness testers, in most cases with optimal sample complexity, for broad classes of structured distributions.
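To make the $\mathcal{A}_k$-distance concrete, the Python sketch below assumes the convention $\lVert p - q \rVert_{\mathcal{A}_k} = \max \sum_j |p(I_j) - q(I_j)|$ over at most $k$ disjoint intervals $I_j$. Under this convention, $k = n$ recovers the $\ell_1$ distance. Some papers instead take the supremum of $|p(A) - q(A)|$ over unions $A$ of at most $k$ intervals, which differs by at most a factor of 2. The $O(nk)$ dynamic program and the name `ak_distance` are our own illustration, not code from the cited work.

```python
from typing import Sequence

NEG_INF = float("-inf")


def ak_distance(p: Sequence[float], q: Sequence[float], k: int) -> float:
    """max over at most k disjoint intervals I_1..I_j of sum_j |p(I_j) - q(I_j)|.

    O(n k) dynamic program over d = p - q. State s while scanning:
    0 = current element is outside every chosen interval,
    1 = inside an interval counted with sign +1, 2 = inside one with sign -1.
    Letting each interval pick its better sign turns |.| into a max.
    """
    d = [a - b for a, b in zip(p, q)]
    # best[j][s]: best total over the prefix scanned so far using exactly j intervals.
    best = [[NEG_INF] * 3 for _ in range(k + 1)]
    best[0][0] = 0.0
    for x in d:
        new = [[NEG_INF] * 3 for _ in range(k + 1)]
        for j in range(k + 1):
            # x lies outside: any interval open before x is closed first.
            new[j][0] = max(best[j])
            # x extends the currently open interval with the same sign.
            new[j][1] = best[j][1] + x
            new[j][2] = best[j][2] - x
            if j >= 1:
                # x starts the j-th interval, right after any previous state.
                prev = max(best[j - 1])
                new[j][1] = max(new[j][1], prev + x)
                new[j][2] = max(new[j][2], prev - x)
        best = new
    return max(max(row) for row in best)


print(ak_distance([0.5, 0.5], [1.0, 0.0], 1))  # 0.5: one interval captures one deviation
print(ak_distance([0.5, 0.5], [1.0, 0.0], 2))  # 1.0: equals the l1 distance when k = n
```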

### A New Approach for Testing Properties of Discrete Distributions

- Computer Science, Mathematics · 2016 IEEE 57th Annual Symposium on Foundations of Computer Science (FOCS)
- 2016

The sample complexity of the algorithm depends on the structure of the unknown distributions, as opposed to merely their domain size, and is significantly better than that of the worst-case optimal $\ell_1$-tester in many natural instances.

### Optimal Histograms with Quality Guarantees

- Computer Science · VLDB
- 1998

Algorithms are presented for computing optimal bucket boundaries, for a broad class of optimality metrics, in time proportional to the square of the number of distinct data values, together with an enhancement to traditional histograms that provides quality guarantees on individual selectivity estimates.
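One metric in this class is the sum of squared errors (the "V-optimal" histogram). For that metric, the textbook dynamic program below finds optimal bucket boundaries in $O(n^2 k)$ time for $n$ values and $k$ buckets. It is a Python sketch under that assumption, with a hypothetical function name, and is not claimed to match the paper's exact algorithms or its quality-guarantee extension. Prefix sums make the error of any bucket an $O(1)$ computation. For a distribution $p$, the square root of the returned value is the minimum $\ell_2$-distance between $p$ and any $k$-histogram.

```python
import math
from typing import Sequence


def v_optimal_error(v: Sequence[float], k: int) -> float:
    """Minimum sum of squared errors when approximating v by a function that is
    constant on at most k contiguous buckets (each bucket uses its mean)."""
    n = len(v)
    s = [0.0] * (n + 1)   # prefix sums of v
    s2 = [0.0] * (n + 1)  # prefix sums of v squared
    for i, x in enumerate(v):
        s[i + 1] = s[i] + x
        s2[i + 1] = s2[i] + x * x

    def sse(a: int, b: int) -> float:
        """Squared error of bucket v[a:b] around its own mean."""
        total = s[b] - s[a]
        return (s2[b] - s2[a]) - total * total / (b - a)

    # err[i]: best error for prefix v[:i] using exactly j buckets (j = loop count).
    err = [math.inf] * (n + 1)
    err[0] = 0.0
    best = 0.0 if n == 0 else math.inf
    for _ in range(k):
        new = [math.inf] * (n + 1)
        for i in range(1, n + 1):
            # The last bucket is v[t:i]; the first t values use one fewer bucket.
            new[i] = min(err[t] + sse(t, i) for t in range(i))
        err = new
        best = min(best, err[n])  # "at most k": keep the best over all bucket counts
    return best


print(v_optimal_error([1, 1, 5, 5], 1))  # 16.0: one bucket with mean 3
print(v_optimal_error([1, 1, 5, 5], 2))  # 0.0: buckets [1, 1] and [5, 5]
```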

### Approximation and streaming algorithms for histogram construction problems

- Computer Science · TODS
- 2006

The first linear-time $(1+\varepsilon)$-factor approximation algorithms (for any $\varepsilon > 0$) are given for a large number of histogram construction problems, including the use of piecewise small-degree polynomials to approximate data, workloads, etc.