Corpus ID: 237304128

Heavy-tailed Streaming Statistical Estimation

Che-Ping Tsai, Adarsh Prasad, Sivaraman Balakrishnan, Pradeep Ravikumar
We consider the task of heavy-tailed statistical estimation given streaming p-dimensional samples. This could also be viewed as stochastic optimization under heavy-tailed distributions, with an additional O(p) space complexity constraint. We design a clipped stochastic gradient descent algorithm and provide an improved analysis, under a more nuanced condition on the noise of the stochastic gradients, which we show is critical when analyzing stochastic optimization problems arising from…
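As a rough illustration of the setting, here is a minimal clipped-SGD sketch applied to streaming mean estimation under heavy-tailed noise. The function names, step size, and clipping radius are illustrative choices, not the authors' exact algorithm; note the O(p) memory footprint, since only the current iterate is stored.

```python
import numpy as np

def clipped_sgd(grad_fn, w0, steps, lr, clip_radius):
    """Streaming clipped SGD: each stochastic gradient is projected
    onto a ball of radius `clip_radius` before the update."""
    w = np.asarray(w0, dtype=float).copy()
    for _ in range(steps):
        g = grad_fn(w)                      # one fresh streaming sample
        norm = np.linalg.norm(g)
        if norm > clip_radius:              # clip heavy-tailed gradients
            g = g * (clip_radius / norm)
        w -= lr * g
    return w

# Illustrative use: estimate the mean of a heavy-tailed distribution.
# The gradient of 0.5*||w - x||^2 at a fresh sample x is w - x.
rng = np.random.default_rng(0)
true_mean = np.array([3.0, -1.0])

def grad(w):
    x = true_mean + rng.standard_t(df=2.5, size=2)  # heavy-tailed sample
    return w - x

est = clipped_sgd(grad, np.zeros(2), steps=20000, lr=0.01, clip_radius=10.0)
```

With a Student-t(2.5) noise distribution the variance is finite but higher moments are not, which is the regime where clipping matters.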


Streaming Algorithms for High-Dimensional Robust Statistics

The main result is an efficient single-pass streaming algorithm for high-dimensional robust mean estimation in (a strengthening of) Huber’s contamination model, with near-optimal error guarantees and space complexity nearly linear in the dimension.

Mirror Descent Strikes Again: Optimal Stochastic Convex Optimization under Infinite Noise Variance

This work quantifies the convergence rate of the Stochastic Mirror Descent algorithm with a particular class of uniformly convex mirror maps, in terms of the number of iterations, dimensionality and related geometric parameters of the optimization problem.

Analyzing and Improving the Optimization Landscape of Noise-Contrastive Estimation

A variant of NCE called eNCE is introduced, which uses an exponential loss and for which normalized gradient descent provably resolves the landscape issues when the target and noise distributions lie in a given exponential family.

Mean Estimation and Regression Under Heavy-Tailed Distributions: A Survey

This work describes sub-Gaussian mean estimators for possibly heavy-tailed data in both the univariate and multivariate settings and focuses on estimators based on median-of-means techniques, but other methods such as the trimmed-mean and Catoni's estimators are also reviewed.
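The median-of-means construction central to this survey can be sketched in a few lines. This is the univariate version; the block count `k` is a tuning parameter (roughly log(1/δ) for confidence δ), and the block layout below is one simple choice.

```python
import numpy as np

def median_of_means(x, k):
    """Split n samples into k equal-size blocks, average each block,
    and return the median of the block means."""
    x = np.asarray(x, dtype=float)
    n = (len(x) // k) * k            # drop the remainder samples
    blocks = x[:n].reshape(k, -1)
    return float(np.median(blocks.mean(axis=1)))

# A single gross outlier corrupts at most one block mean,
# so the median of the block means ignores it:
mom = median_of_means(np.array([1.0, 2.0, 3.0, 4.0, 5.0, 1000.0]), k=3)
# block means: [1.5, 3.5, 502.5] -> median 3.5
```

The empirical mean of the same sample would be about 169, while the median-of-means stays near the bulk of the data.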

Stochastic Optimization with Heavy-Tailed Noise via Accelerated Gradient Clipping

The first non-trivial high-probability complexity bounds for SGD with clipping are derived without a light-tails assumption on the noise, closing a gap in the theory of stochastic optimization with heavy-tailed noise.

Simple and optimal high-probability bounds for strongly-convex stochastic gradient descent

A simple, non-uniform averaging strategy of Lacoste-Julien et al. (2011) is considered and it is proved that it achieves the optimal $O(1/T)$ convergence rate with high probability.
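The non-uniform averaging in question weights later iterates more heavily, with weight proportional to t. A sketch for a strongly convex objective follows; the weighting rule matches the Lacoste-Julien et al. scheme, while the test problem and step-size choice 1/(μt) are illustrative.

```python
import numpy as np

def sgd_weighted_average(grad_fn, w0, steps, mu):
    """SGD for a mu-strongly-convex objective with step size 1/(mu*t),
    returning the non-uniform average w_bar = sum_t t*w_t / sum_t t."""
    w = np.asarray(w0, dtype=float).copy()
    w_bar = np.zeros_like(w)
    weight_sum = 0.0
    for t in range(1, steps + 1):
        w = w - (1.0 / (mu * t)) * grad_fn(w)
        # running weighted average with weight t on the current iterate
        w_bar = (weight_sum * w_bar + t * w) / (weight_sum + t)
        weight_sum += t
    return w_bar

# Illustrative test: minimize 0.5*(w - 2)^2 from noisy gradients.
rng = np.random.default_rng(1)
grad = lambda w: (w - 2.0) + rng.normal(size=1)
est = sgd_weighted_average(grad, np.array([0.0]), steps=4000, mu=1.0)
```

Uniform averaging would put equal weight on the noisy early iterates; the t-weighted average discounts them, which is what recovers the O(1/T) rate.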

Robust multivariate mean estimation: The optimality of trimmed mean

A multivariate extension of the trimmed-mean estimator is introduced and its optimal performance under minimal conditions is shown.
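For intuition, the classical univariate trimmed mean that the paper extends to the multivariate setting looks like the following sketch (the truncation fraction `eps` is a tuning parameter tied to the contamination level).

```python
import numpy as np

def trimmed_mean(x, eps):
    """Discard the eps-fraction of smallest and largest observations,
    then average what remains (univariate sketch)."""
    x = np.sort(np.asarray(x, dtype=float))
    k = int(eps * len(x))
    if len(x) <= 2 * k:
        return float(x.mean())
    return float(x[k:len(x) - k].mean())

# One outlier at 100 is removed along with the smallest point:
tm = trimmed_mean(np.array([0.0, 1.0, 2.0, 3.0, 100.0]), eps=0.2)
# averages [1, 2, 3] -> 2.0
```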

Quantum Entropy Scoring for Fast Robust Mean Estimation and Improved Outlier Detection

QUE-scoring, a new outlier scoring method based on quantum entropy regularization, is evaluated via extensive experiments on synthetic and real data, and it is demonstrated that it often performs better than previously proposed algorithms.

Robust sub-Gaussian estimation of a mean vector in nearly linear time

The algorithm is fully data-dependent: its construction uses neither the proportion of outliers nor the target rate. It combines recently developed tools for median-of-means estimators with covering semidefinite programs.

Tight Analyses for Non-Smooth Stochastic Gradient Descent

It is proved that after $T$ steps of stochastic gradient descent, the error of the final iterate is $O(\log(T)/T)$ with high probability, and that there exists a function in this class for which the error of the final iterate of deterministic gradient descent is $\Omega(\log(T)/\sqrt{T})$.

Risk minimization by median-of-means tournaments

A new procedure is introduced, the so-called median-of-means tournament, that achieves the optimal tradeoff between accuracy and confidence under minimal assumptions, and in particular outperforms classical methods based on empirical risk minimization.

Geometric median and robust estimation in Banach spaces

In many real-world applications, collected data are contaminated by noise with heavy-tailed distribution and might contain outliers of large magnitude. In this situation, it is necessary to apply…
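The geometric median referenced in this title can be computed in finite dimensions by the classical Weiszfeld iteration. Below is a minimal NumPy sketch; it is illustrative only, as the paper's setting is general Banach spaces.

```python
import numpy as np

def geometric_median(X, iters=200, tol=1e-7):
    """Weiszfeld iteration: the geometric median minimizes the sum of
    Euclidean distances to the sample points, making it robust to a
    minority of gross outliers."""
    X = np.asarray(X, dtype=float)
    m = X.mean(axis=0)                    # start at the coordinate mean
    for _ in range(iters):
        d = np.linalg.norm(X - m, axis=1)
        d = np.maximum(d, tol)            # guard against division by zero
        w = 1.0 / d
        m_new = (w[:, None] * X).sum(axis=0) / w.sum()
        if np.linalg.norm(m_new - m) < tol:
            return m_new
        m = m_new
    return m

# Four points near the unit square plus one gross outlier; the mean is
# dragged to ~(20, 20), while the geometric median stays near the cluster.
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [100.0, 100.0]])
m = geometric_median(X)
```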

Making Gradient Descent Optimal for Strongly Convex Stochastic Optimization

This paper investigates the optimality of SGD in a stochastic setting and shows that for smooth problems the algorithm attains the optimal O(1/T) rate; however, for non-smooth problems the convergence rate with averaging might really be Ω(log(T)/T), and this is not just an artifact of the analysis.