Corpus ID: 209436501

A General Framework for Symmetric Property Estimation

Moses Charikar, Kirankumar Shiragur, Aaron Sidford
In this paper we provide a general framework for estimating symmetric properties of distributions from i.i.d. samples. For a broad class of symmetric properties we identify the easy region where empirical estimation works and the difficult region where more complex estimators are required. We show that by approximately computing the profile maximum likelihood (PML) distribution [ADOS16] in this difficult region we obtain a symmetric property estimation framework that is sample…
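
As a small illustration of the objects named in the abstract, the following sketch computes the profile of a sample (how many distinct symbols appear exactly j times) and the empirical plug-in estimate of entropy, one example of a symmetric property. The function names are hypothetical and chosen only for this example; the sketch does not implement the paper's easy/difficult split or any PML computation. It only shows that an empirical estimate of a symmetric property depends on the sample through its profile alone.

```python
from collections import Counter
import math


def profile(sample):
    """Map each multiplicity j to the number of distinct symbols seen exactly j times."""
    multiplicities = Counter(sample)               # symbol -> number of occurrences
    return dict(Counter(multiplicities.values()))  # multiplicity -> number of symbols


def empirical_entropy(sample):
    """Plug-in (empirical) entropy estimate in nats, computed from symbol frequencies."""
    n = len(sample)
    return -sum((c / n) * math.log(c / n) for c in Counter(sample).values())


def entropy_from_profile(prof):
    """Same plug-in estimate, computed from the profile only (symbol labels are not needed)."""
    n = sum(j * m for j, m in prof.items())        # sample size recovered from the profile
    return -sum(m * (j / n) * math.log(j / n) for j, m in prof.items())


if __name__ == "__main__":
    sample = list("abracadabra")
    prof = profile(sample)
    print(prof)
    print(empirical_entropy(sample), entropy_from_profile(prof))
```

Relabeling the symbols of the sample leaves the profile, and therefore the estimate, unchanged, which is the sense in which the property is symmetric.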

Instance Based Approximations to Profile Maximum Likelihood
A new efficient algorithm for approximately computing the profile maximum likelihood (PML) distribution, a prominent quantity in symmetric property estimation, and the first provable computationally efficient implementation of PseudoPML, a general framework for estimating a broad class of symmetric properties.
The Optimality of Profile Maximum Likelihood in Estimating Sorted Discrete Distributions
This paper strengthens the above result and shows that using a careful chaining argument, the error probability can be reduced to $\delta^{1-c}\exp(c'n^{1/3+c})$ for arbitrarily small constants $c>0$ and some constant $c'>0$.
Compressed Maximum Likelihood
This work shows that CML is sample-efficient for several fundamental learning tasks over both discrete and continuous domains, including learning structural densities, estimating probability multisets, and inferring symmetric distribution functionals.
On the Competitive Analysis and High Accuracy Optimality of Profile Maximum Likelihood
This paper strengthens the above result and shows that using a careful chaining argument, the error probability can be reduced to $\delta^{1-c}\cdot \exp(c'n^{1/3+c})$ for arbitrarily small constants $c>0$ and some constant $c'>0$.
On the High Accuracy Limitation of Adaptive Property Estimation
  • Yanjun Han
  • Mathematics, Computer Science
  • 2021
It is shown that under a mild assumption that the distribution estimator is close to the true sorted distribution in expectation, any adaptive approach cannot achieve the optimal sample complexity for every $1$-Lipschitz property within accuracy $\varepsilon \ll n^{-1/3}$.
Minimax Estimation of Divergences Between Discrete Distributions
The first minimax rate-optimal estimator which does not require any Poissonization, sample splitting, or explicit construction of approximating polynomials is constructed.


Efficient profile maximum likelihood for universal symmetric property estimation
An algorithm is provided that, given $n$ samples from a distribution, computes an approximate PML distribution up to a multiplicative error of $\exp(n^{2/3}\,\mathrm{poly}\log(n))$ in nearly linear time and yields a universal plug-in estimator that is competitive with a broad range of estimators up to accuracy.
A Unified Maximum Likelihood Approach for Optimal Distribution Property Estimation
It is proved that for all these properties, a single, simple plug-in estimator, profile maximum likelihood (PML), performs as well as the best specialized techniques, raising the possibility that PML may optimally estimate many other symmetric properties.
Data Amplification: A Unified and Competitive Approach to Property Estimation
This work designs the first unified, linear-time, competitive property estimator that, for a wide class of properties and for all underlying distributions, uses just $2n$ samples to achieve the performance attained by the empirical estimator with $n\sqrt{\log n}$ samples.
Local moment matching: A unified methodology for symmetric functional estimation and distribution estimation under Wasserstein distance
An efficiently computable estimator is constructed that achieves the minimax rates in estimating the distribution up to permutation, and it is shown that the plug-in approach of the authors' unlabeled distribution estimators is "universal" in estimating symmetric functionals of discrete distributions.
Minimax Estimation of Functionals of Discrete Distributions
The minimax rate-optimal mutual information estimator yielded by the framework leads to significant performance boosts over the Chow-Liu algorithm in learning graphical models, and the schemes show practical advantages for the estimation of entropy and mutual information.
Approximate Profile Maximum Likelihood
An efficient algorithm is proposed for approximate computation of the profile maximum likelihood (PML), a variant of maximum likelihood that maximizes the probability of observing a sufficient statistic rather than the empirical sample. Its empirical performance is competitive with, and sometimes significantly better than, the state of the art for various estimation problems.
The Power of Linear Estimators
  • G. Valiant, Paul Valiant
  • Mathematics, Computer Science
  • 2011 IEEE 52nd Annual Symposium on Foundations of Computer Science
  • 2011
The main result is that for any property in this broad class of practically relevant distribution properties, there exists a near-optimal linear estimator, and a practical and polynomial-time algorithm for constructing such estimators for any given parameters.
The Broad Optimality of Profile Maximum Likelihood
The profile maximum likelihood estimator is established as the first unified sample-optimal approach to a wide range of learning tasks and achieves the optimal sample complexity up to logarithmic factors of $k$.
Data Amplification: Instance-Optimal Property Estimation
Novel linear-time-computable estimators are presented that significantly "amplify" the effective amount of data available and outperform the previous state-of-the-art estimators designed for each specific property.
Algorithms for modeling distributions over large alphabets
Simulations show that the computed distribution models the data well and yields general estimators that evaluate various data attributes as well as specific estimators designed especially for these tasks.