Near-Optimal Bounds for Testing Histogram Distributions

Clément L. Canonne, Ilias Diakonikolas, Daniel M. Kane, Sihan Liu
We investigate the problem of testing whether a discrete probability distribution over an ordered domain is a histogram on a specified number of bins. One of the most common tools for the succinct approximation of data, k-histograms over [n] are probability distributions that are piecewise constant over a set of k intervals. The histogram testing problem is the following: given samples from an unknown distribution p on [n], we want to distinguish between the cases that p is a k…
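To make the definition concrete, the following minimal sketch checks whether a fully known probability vector is piecewise constant on at most k intervals. It illustrates the definition only: the testing problem in the abstract works from samples of an unknown p, which this code does not attempt. The function names and the tolerance parameter are assumptions chosen for illustration.

```python
from typing import Sequence


def count_pieces(p: Sequence[float], tol: float = 1e-12) -> int:
    """Return the smallest number of contiguous intervals on which p is constant.

    `tol` is a hypothetical tolerance for comparing floating-point masses.
    """
    if len(p) == 0:
        return 0
    pieces = 1
    # A new piece starts wherever two neighbouring masses differ.
    for prev, cur in zip(p, p[1:]):
        if abs(cur - prev) > tol:
            pieces += 1
    return pieces


def is_k_histogram(p: Sequence[float], k: int, tol: float = 1e-12) -> bool:
    """True if the known distribution p is piecewise constant on at most k intervals."""
    return count_pieces(p, tol) <= k


# Example distribution on a domain of size 7 with three constant pieces.
p = [0.1, 0.1, 0.1, 0.2, 0.2, 0.15, 0.15]
```

Here `is_k_histogram(p, 3)` holds while `is_k_histogram(p, 2)` does not, since p changes value twice.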


Near-Optimal Closeness Testing of Discrete Histogram Distributions

This work gives a new algorithm for testing the equivalence of two discrete histograms, together with a nearly matching information-theoretic lower bound, improving on previous work by polynomial factors in the relevant parameters.

Are Few Bins Enough: Testing Histogram Distributions

This work obtains a sample- and time-efficient algorithm for testing histogram distributions, complemented by a nearly matching information-theoretic lower bound on the number of samples required for this task.

Fast and Near-Optimal Algorithms for Approximating Distributions by Histograms

This work designs a fast and near-optimal algorithm for approximating arbitrary one-dimensional data distributions by histograms that uses the information-theoretically minimal sample size of m, runs in sample-linear time O(m), and outputs an O(k)-histogram whose ℓ2-distance from p is at most O(opt_k) + ε, where opt_k is the minimum ℓ2 distance between p and any k-histogram.
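The benchmark quantity in this summary, the minimum ℓ2 distance between p and any k-histogram, can be computed exactly for a fully known p with a classical dynamic program. The sketch below is that textbook approach, not the sample-based algorithm the summary describes: it assumes p is given explicitly and runs in roughly O(k·n²) time rather than sample-linear time. The function name `opt_k_l2` is hypothetical.

```python
import math
from itertools import accumulate
from typing import Sequence


def opt_k_l2(p: Sequence[float], k: int) -> float:
    """Minimum l2 distance between a known vector p and any k-piece histogram."""
    n = len(p)
    # Prefix sums of p and p^2 give the cost of any interval in O(1).
    s1 = [0.0] + list(accumulate(p))
    s2 = [0.0] + list(accumulate(x * x for x in p))

    def sse(a: int, b: int) -> float:
        # Squared error of the best constant (the mean) on p[a:b].
        total = s1[b] - s1[a]
        return max(0.0, (s2[b] - s2[a]) - total * total / (b - a))

    inf = float("inf")
    # best[i]: least squared error covering p[:i] with exactly j pieces (j = 0 here).
    best = [0.0] + [inf] * n
    for _ in range(min(k, n)):
        new = [inf] * (n + 1)
        for i in range(1, n + 1):
            # The last piece is p[t:i]; the first t entries use one piece fewer.
            new[i] = min(best[t] + sse(t, i) for t in range(i))
        best = new
    return math.sqrt(best[n])


q = [0.1, 0.1, 0.1, 0.2, 0.2, 0.15, 0.15]
```

Because each piece is replaced by its mean, the total mass is preserved, so the minimizer is itself a probability distribution. For the example `q`, three pieces fit exactly, while two pieces leave a distance of about 0.05.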

Testing for Families of Distributions via the Fourier Transform

This work applies its Fourier-based framework to obtain near sample-optimal and computationally efficient testers for the following fundamental distribution families: Sums of Independent Integer Random Variables, Poisson Multinomial Distributions, and Discrete Log-Concave Distributions.

Optimal Algorithms for Testing Closeness of Discrete Distributions

This work presents simple testers for both the ℓ1 and ℓ2 settings, with sample complexity that is information-theoretically optimal, to constant factors, and establishes that the sample complexity is Θ(max{n^(2/3)/ε^(4/3), n^(1/2)/ε^2}).
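As a rough illustration of how the two terms of this bound trade off, the sketch below evaluates max{n^(2/3)/ε^(4/3), n^(1/2)/ε^2} with all constant factors dropped. The function name is hypothetical, and the returned number indicates only the order of growth, not an actual sample count.

```python
def closeness_sample_bound(n: int, eps: float) -> float:
    """Evaluate max(n^(2/3)/eps^(4/3), n^(1/2)/eps^2), ignoring constant factors."""
    term_large_domain = n ** (2 / 3) / eps ** (4 / 3)
    term_small_domain = n ** 0.5 / eps ** 2
    return max(term_large_domain, term_small_domain)
```

Setting the two terms equal shows that the first one dominates once n exceeds ε^(-4). For ε = 0.1 that threshold is 10^4, so the second term governs at n = 100 and the first at n = 10^6.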

Testing Shape Restrictions of Discrete Distributions

A general algorithm is developed that applies to a large range of “shape-constrained” properties, including monotone, log-concave, t-modal, piecewise-polynomial, and Poisson Binomial distributions, and is computationally efficient.

Optimal Algorithms and Lower Bounds for Testing Closeness of Structured Distributions

This work designs a sample-optimal and computationally efficient algorithm for testing the equivalence of two unknown univariate distributions under the A_k-distance metric, and yields new, simple L1 closeness testers, in most cases with optimal sample complexity, for broad classes of structured distributions.

A New Approach for Testing Properties of Discrete Distributions

  • Ilias Diakonikolas, D. Kane
  • Computer Science, Mathematics
    2016 IEEE 57th Annual Symposium on Foundations of Computer Science (FOCS)
  • 2016
The sample complexity of the algorithm depends on the structure of the unknown distributions (as opposed to merely their domain size) and is significantly better than that of the worst-case optimal L1-tester in many natural instances.

Optimal Histograms with Quality Guarantees

This work presents algorithms for computing optimal bucket boundaries in time proportional to the square of the number of distinct data values, for a broad class of optimality metrics, together with an enhancement to traditional histograms that provides quality guarantees on individual selectivity estimates.

Approximation and streaming algorithms for histogram construction problems

The first linear-time (1+ε)-factor approximation algorithms (for any ε > 0) are given for a large number of histogram construction problems, including the use of piecewise small-degree polynomials to approximate data, workloads, etc.