Ananda Theertha Suresh

It was recently shown that estimating the Shannon entropy H(p) of a discrete k-symbol distribution p requires Θ(k/log k) samples, a number that grows near-linearly in the support size. In many applications H(p) can be replaced by the more general Rényi entropy of order α, H_α(p). We determine the number of samples needed to estimate H_α(p) for all α, showing …
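For reference, the Rényi entropy of order α is H_α(p) = (1/(1−α)) log Σ_i p_i^α, with the Shannon entropy recovered in the limit α → 1. As a minimal illustration (a naive plug-in estimate, not the sample-optimal estimator the paper analyzes):

```python
import numpy as np

def renyi_entropy_plugin(samples, alpha):
    """Naive plug-in estimate of the Renyi entropy H_alpha(p):
    H_alpha(p) = log(sum_i p_i^alpha) / (1 - alpha), with the
    limit alpha -> 1 recovering the Shannon entropy H(p)."""
    _, counts = np.unique(samples, return_counts=True)
    p_hat = counts / counts.sum()              # empirical distribution
    if np.isclose(alpha, 1.0):                 # Shannon limit
        return -np.sum(p_hat * np.log(p_hat))
    return np.log(np.sum(p_hat ** alpha)) / (1.0 - alpha)

# Uniform over k = 100 symbols: H_alpha = log(100) ~ 4.605 for every alpha.
rng = np.random.default_rng(0)
samples = rng.integers(0, 100, size=10_000)
print(renyi_entropy_plugin(samples, alpha=2.0))
```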
Statistical and machine-learning algorithms are frequently applied to high-dimensional data. In many of these applications data is scarce, and often much more costly than computation time. We provide the first sample-efficient polynomial-time estimator for high-dimensional spherical Gaussian mixtures. For mixtures of any k d-dimensional spherical Gaussians, …
We present an intriguing discovery related to Random Fourier Features: in Gaussian kernel approximation, replacing the random Gaussian matrix by a properly scaled random orthogonal matrix significantly decreases kernel approximation error. We call this technique Orthogonal Random Features (ORF), and provide theoretical and empirical justification for this …
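A rough sketch of the comparison, assuming the standard Rahimi-Recht random Fourier feature map z(x) = sqrt(2/D)·cos(Wx + b) for the Gaussian kernel; the row-rescaling details below follow our reading of the construction, and the demo parameters (d, D, sigma) are arbitrary:

```python
import numpy as np

def rff_features(X, W, b):
    # z(x) = sqrt(2/D) cos(Wx + b); then z(x)@z(y) ~ exp(-||x-y||^2 / (2 sigma^2))
    D = W.shape[0]
    return np.sqrt(2.0 / D) * np.cos(X @ W.T + b)

def sample_rff_matrix(d, D, sigma, rng):
    # Classic RFF: i.i.d. Gaussian rows w_i ~ N(0, I / sigma^2).
    return rng.normal(0.0, 1.0 / sigma, size=(D, d))

def sample_orf_matrix(d, D, sigma, rng):
    # Orthogonal Random Features (sketch, assuming D <= d): a uniformly
    # random orthogonal Q, each row rescaled by a chi(d)-distributed norm
    # so the row marginals match those of the Gaussian case.
    G = rng.normal(size=(d, d))
    Q, R = np.linalg.qr(G)
    Q *= np.sign(np.diag(R))                   # make Q Haar-distributed
    norms = np.sqrt(rng.chisquare(d, size=d))  # ||g|| for g ~ N(0, I_d)
    return (norms[:, None] * Q / sigma)[:D]

rng = np.random.default_rng(0)
d, D, sigma = 16, 16, 1.0
x, y = rng.normal(size=d), rng.normal(size=d)
b = rng.uniform(0, 2 * np.pi, size=D)
true_kernel = np.exp(-np.sum((x - y) ** 2) / (2 * sigma ** 2))
for name, sampler in [("RFF", sample_rff_matrix), ("ORF", sample_orf_matrix)]:
    W = sampler(d, D, sigma, rng)
    zx, zy = rff_features(x, W, b), rff_features(y, W, b)
    print(f"{name}: {zx @ zy:.4f}  (true kernel {true_kernel:.4f})")
```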
Estimating distributions over large alphabets is a fundamental machine-learning tenet. Yet no method is known to estimate all distributions well. For example, add-constant estimators are nearly min-max optimal but often perform poorly in practice, and practical estimators such as absolute discounting, Jelinek-Mercer, and Good-Turing are not known to be near …
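Two of the estimators named above are easy to state; the sketch below (illustrative, not the paper's analysis) contrasts the add-constant rule with the Good-Turing estimate of the probability mass on unseen symbols:

```python
import numpy as np
from collections import Counter

def add_constant(counts, k, beta=1.0):
    """Add-beta estimator over a known alphabet of size k:
    p_hat(x) = (N_x + beta) / (n + beta * k)."""
    n = sum(counts.values())
    return {x: (counts.get(x, 0) + beta) / (n + beta * k) for x in range(k)}

def good_turing_missing_mass(counts):
    """Good-Turing estimate of the total unseen probability: N_1 / n,
    where N_1 is the number of symbols observed exactly once."""
    n = sum(counts.values())
    return sum(1 for c in counts.values() if c == 1) / n

rng = np.random.default_rng(0)
k = 1000
p = rng.dirichlet(0.1 * np.ones(k))            # a skewed distribution
counts = Counter(rng.choice(k, size=500, p=p).tolist())
unseen = [x for x in range(k) if x not in counts]
p_add1 = add_constant(counts, k)
print("True missing mass:   ", p[unseen].sum())
print("Good-Turing estimate:", good_turing_missing_mass(counts))
print("Add-1 mass on unseen:", sum(p_add1[x] for x in unseen))
```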
One of the most natural and important questions in statistical learning is: how well can a distribution be approximated from its samples? Surprisingly, this question has so far been resolved for only one loss, the KL-divergence, and even in this case the estimator used is ad hoc and not well understood. We study distribution approximations for general loss …
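For concreteness, the KL loss in question is the expected divergence between the true distribution and the estimate formed from n samples (notation ours):

```latex
r_n(\hat{q}, p) \;=\; \mathbb{E}_{X^n \sim p^n}\!\left[\, D\big(p \,\|\, \hat{q}(X^n)\big) \right],
\qquad
D(p \,\|\, q) \;=\; \sum_{i} p_i \log \frac{p_i}{q_i}.
```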
We consider the problems of sorting and maximum-selection of n elements using adversarial comparators. We derive a maximum-selection algorithm that uses 8n comparisons in expectation, and a sorting algorithm that uses 4n log₂ n comparisons in expectation. Both are tight up to a constant factor. Our adversarial-comparator model was motivated by …
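As a minimal sketch of the model (not the paper's expected-8n-comparison algorithm): the comparator answers reliably only when the two values are well separated, and even a plain single-elimination tournament then returns a winner within about log₂ n comparator resolutions of the true maximum. The `resolution` parameter and the random tie-breaking below are our illustration of the adversary:

```python
import random

def adversarial_compare(a, b, resolution=1.0):
    """The answer is reliable only when |a - b| > resolution; otherwise
    an adversary (modeled here as random) may answer either way."""
    if abs(a - b) > resolution:
        return a if a > b else b
    return random.choice([a, b])               # adversarial region

def knockout_max(values):
    """Single-elimination tournament: n - 1 comparisons. Each round the
    surviving 'champion' can drop by at most `resolution`, so the winner
    is within log2(n) * resolution of the true maximum."""
    pool = list(values)
    while len(pool) > 1:
        nxt = [adversarial_compare(pool[i], pool[i + 1])
               for i in range(0, len(pool) - 1, 2)]
        if len(pool) % 2:                      # odd element gets a bye
            nxt.append(pool[-1])
        pool = nxt
    return pool[0]

random.seed(0)
vals = [random.uniform(0, 100) for _ in range(64)]
print(knockout_max(vals), "vs true max", max(vals))
```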
Estimating the number of unseen species is an important problem in many scientific endeavors. Its most popular formulation, introduced by Fisher et al. [Fisher RA, Corbet AS, Williams CB (1943) J Animal Ecol 12(1):42-58], uses n samples to predict the number U of hitherto unseen species that would be observed if t·n new samples were …
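The classical starting point here is the Good-Toulmin estimator, which predicts the number of new species from the prevalences Φ_i (the number of species seen exactly i times); it is known to be accurate for t ≤ 1, and to our understanding the paper's smoothed variant extends the valid range to t growing like log n. A sketch of the textbook formula, not the paper's estimator:

```python
from collections import Counter

def good_toulmin(counts, t):
    """Classical Good-Toulmin estimate of the number of new species that
    would appear in t*n additional samples:
        U(t) ~= -sum_{i>=1} (-t)^i * Phi_i,
    where Phi_i is the number of species observed exactly i times."""
    phi = Counter(counts.values())             # prevalences Phi_i
    return -sum((-t) ** i * phi_i for i, phi_i in phi.items())

# Example: 3 species seen once, 2 seen twice (n = 7 samples so far).
counts = {"a": 1, "b": 1, "c": 1, "d": 2, "e": 2}
print(good_toulmin(counts, t=0.5))             # -> 1.0 predicted new species
```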
Motivated by the need for distributed learning and optimization algorithms with low communication cost, we study communication-efficient algorithms for distributed mean estimation. Unlike previous works, we make no probabilistic assumptions on the data. We first show that for d-dimensional data with n clients, a naive stochastic rounding approach yields a …
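A minimal sketch of what such a naive stochastic-rounding baseline could look like: each client sends one bit per coordinate (plus two floats), chosen so its quantized vector is unbiased, and the server averages. To our understanding the paper shows this baseline's mean squared error grows with the dimension d and then improves on it; the names and parameters below are ours:

```python
import numpy as np

def stochastic_binary_quantize(x, rng):
    """One bit per coordinate: send x_j as either min(x) or max(x), with
    probabilities chosen so the quantized vector is unbiased for x.
    Total communication: d bits plus the two floats lo and hi."""
    lo, hi = x.min(), x.max()
    if hi == lo:
        return x.copy()
    p_hi = (x - lo) / (hi - lo)                # P[send hi] => unbiasedness
    return np.where(rng.random(x.shape) < p_hi, hi, lo)

rng = np.random.default_rng(0)
n, d = 100, 1000                               # n clients, d dimensions
X = rng.normal(size=(n, d))                    # one data vector per client
Q = np.stack([stochastic_binary_quantize(x, rng) for x in X])
print("MSE of quantized mean:", np.mean((Q.mean(0) - X.mean(0)) ** 2))
```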
The advent of data science has spurred interest in estimating properties of distributions over large alphabets. Fundamental symmetric properties, such as support size, support coverage, entropy, and proximity to uniformity, have received the most attention, with each property estimated using a different technique and often intricate analysis tools. We prove that for …