# A Fast Spectral Algorithm for Mean Estimation with Sub-Gaussian Rates

@article{Lei2019AFS, title={A Fast Spectral Algorithm for Mean Estimation with Sub-Gaussian Rates}, author={Zhixian Lei and Kyle Luh and Prayaag Venkat and Fred Zhang}, journal={ArXiv}, year={2019}, volume={abs/1908.04468} }

We study the algorithmic problem of estimating the mean of a heavy-tailed random vector in $\mathbb{R}^d$, given $n$ i.i.d. samples. The goal is to design an efficient estimator that attains the optimal sub-gaussian error bound, only assuming that the random vector has bounded mean and covariance. Polynomial-time solutions to this problem are known but have high runtime due to their use of semidefinite programming (SDP). Conceptually, it remains open whether convex relaxation is truly necessary…
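To make the target concrete, the optimal sub-gaussian error bound is usually stated (following Lugosi and Mendelson) as below. Here $\mu$ and $\Sigma$ denote the mean and covariance of the random vector, $\hat{\mu}$ is the estimate computed from the $n$ samples, $\delta$ is the allowed failure probability, and $C$ is an unspecified absolute constant. This notation is ours and may differ from the paper's:

$$
\Pr\left[\, \|\hat{\mu} - \mu\|_2 \le C\left(\sqrt{\frac{\operatorname{Tr}(\Sigma)}{n}} + \sqrt{\frac{\|\Sigma\|_{\mathrm{op}} \log(1/\delta)}{n}}\right) \right] \ge 1 - \delta .
$$

Under the same assumptions, Chebyshev's inequality guarantees the empirical mean only an error of order $\sqrt{\operatorname{Tr}(\Sigma)/(n\delta)}$. That bound depends polynomially on $1/\delta$, whereas the sub-gaussian bound depends only logarithmically on $1/\delta$.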

#### 17 Citations

Optimal Sub-Gaussian Mean Estimation in $\mathbb{R}$

- Mathematics, Computer Science
- 2020

This work revisits the problem of estimating the mean of a real-valued distribution, presenting a novel estimator with sub-Gaussian convergence that is intuitively as accurate as the sample mean is for the Gaussian distribution of matching variance.
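The estimator of that paper is not reproduced here. As a point of comparison, the sketch below implements the classical median-of-means estimator for real-valued data. It attains a sub-Gaussian-type deviation bound only up to a constant factor, which is the gap that sharper one-dimensional estimators aim to close. The block count $k = \lceil 8 \ln(1/\delta) \rceil$ is an illustrative assumption, not a tuned value.

```python
import math
import random
import statistics


def median_of_means(samples, delta):
    """Median-of-means estimate of the mean of real-valued samples.

    Uses k = ceil(8 * ln(1/delta)) blocks (capped at len(samples)); the
    constant 8 is an illustrative choice, not a tuned one.
    """
    n = len(samples)
    k = min(n, max(1, math.ceil(8 * math.log(1 / delta))))
    block_size = n // k  # leftover samples beyond k * block_size are dropped
    block_means = [
        statistics.fmean(samples[i * block_size:(i + 1) * block_size])
        for i in range(k)
    ]
    return statistics.median(block_means)


# Usage: Pareto(shape 2.5) has finite variance but heavy tails; its mean is 2.5 / 1.5.
rng = random.Random(0)
data = [rng.paretovariate(2.5) for _ in range(10_000)]
print(median_of_means(data, delta=0.01), "vs true mean", 2.5 / 1.5)
```

The median step is what gives the logarithmic dependence on $1/\delta$. A single extreme sample can corrupt only one block mean, and the median ignores a minority of corrupted blocks.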

A spectral algorithm for robust regression with subgaussian rates

- Mathematics, Computer Science
- ArXiv
- 2020

A new algorithm for linear regression, running in linear to quadratic time, that requires no strong assumptions on the underlying distribution of the samples, tolerates outliers, and attains the optimal sub-gaussian error bound even though the data have only finite moments.

Algorithms for heavy-tailed statistics: regression, covariance estimation, and beyond

- Mathematics, Computer Science
- STOC
- 2020

This work narrows the gap between the Gaussian and heavy-tailed settings for polynomial-time estimators, introduces new techniques for high-probability estimation, and suggests numerous new algorithmic questions in this vein.

Multivariate mean estimation with direction-dependent accuracy

- Mathematics
- 2020

We consider the problem of estimating the mean of a random vector based on $N$ independent, identically distributed observations. We prove the existence of an estimator that has a near-optimal error…

Robust subgaussian estimation of a mean vector in nearly linear time

- Mathematics
- 2019

We construct an algorithm, running in time $\tilde{\mathcal O}(N d + uK d)$, which is robust to outliers and heavy-tailed data and which achieves the subgaussian rate from [Lugosi, Mendelson]…

Optimal Mean Estimation without a Variance

- Mathematics, Computer Science
- ArXiv
- 2020

This work studies the problem of heavy-tailed mean estimation in settings where the variance of the data-generating distribution does not exist, and establishes an information-theoretic lower bound on the optimal attainable confidence interval.

Optimal robust mean and location estimation via convex programs with respect to any pseudo-norms

- Mathematics
- 2021

…where $\ell(\Sigma S)$ is the Gaussian mean width of $\Sigma S$ and $\Sigma$ is the covariance of the data (in the benchmark i.i.d. Gaussian case). This improves the entropic minimax lower bound from [30] and closes the gap…

Heavy-tailed Streaming Statistical Estimation

- Computer Science, Mathematics
- ArXiv
- 2021

A clipped stochastic gradient descent algorithm is designed and an improved analysis is provided under a more nuanced condition on the noise of the stochastic gradients, which is critical when analyzing stochastic optimization problems arising from general statistical estimation problems.
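The general idea of clipped SGD can be sketched for the simplest streaming task, mean estimation. The sketch minimizes $f(\theta) = \mathbb{E}\|\theta - X\|^2 / 2$ and clips each stochastic gradient to a maximum norm. It is not the cited paper's algorithm or tuning. The step-size schedule and the clipping level are hypothetical parameters chosen for illustration.

```python
import numpy as np


def clipped_sgd_mean(stream, dim, step_size, clip_level):
    """Streaming mean estimate by clipped SGD on f(theta) = E||theta - X||^2 / 2.

    `step_size` maps the 1-based step t to a learning rate; both it and
    `clip_level` are hypothetical tuning choices for illustration.
    """
    theta = np.zeros(dim)
    for t, x in enumerate(stream, start=1):
        grad = theta - np.asarray(x, dtype=float)  # stochastic gradient at theta
        norm = np.linalg.norm(grad)
        if norm > clip_level:
            grad = grad * (clip_level / norm)  # shrink to length clip_level
        theta = theta - step_size(t) * grad
    return theta


# Usage: Student-t samples (df=3) have finite variance but heavy tails.
rng = np.random.default_rng(0)
xs = rng.standard_t(df=3, size=(2000, 5)) + 1.0  # true mean is 1.0 in each coordinate
est = clipped_sgd_mean(xs, dim=5, step_size=lambda t: 1.0 / t, clip_level=10.0)
print(est)
```

With `step_size=lambda t: 1.0 / t` and an infinite clipping level, the recursion reduces exactly to the running sample mean. Clipping is what bounds the influence that any single heavy-tailed sample can have on the iterate.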

Efficient Estimators for Heavy-Tailed Machine Learning

- 2020

A dramatic improvement in data collection technologies has aided in procuring massive amounts of unstructured and heterogeneous datasets. This has consequently led to a prevalence of heavy-tailed…

#### References

Showing 10 of 35 references.

Fast Mean Estimation with Sub-Gaussian Rates

- Mathematics, Computer Science
- COLT
- 2019

Like the polynomial-time estimator introduced by Hopkins (2018), which is based on the sum-of-squares hierarchy, this estimator achieves optimal statistical efficiency in this challenging setting, but it has a significantly faster runtime and a simpler analysis.

Mean estimation with sub-Gaussian rates in polynomial time

- Mathematics, Computer Science
- 2018

This work offers the first polynomial-time algorithm to estimate the mean with sub-Gaussian-size confidence intervals under such mild assumptions, based on a new semidefinite programming relaxation of a high-dimensional median.
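That work relaxes a "high-dimensional median" of bucket means using an SDP. The SDP is not reproduced here. A much simpler relative is Minsker's geometric median-of-means, sketched below under stated assumptions. It averages disjoint blocks and returns the geometric median of the block means, computed approximately with Weiszfeld iterations. Its guarantee is of order $\sqrt{\operatorname{Tr}(\Sigma)\log(1/\delta)/n}$, which falls short of the sub-gaussian rate. The iteration count, the tolerance, and the choice of `num_blocks` (typically of order $\log(1/\delta)$) are illustrative assumptions.

```python
import numpy as np


def geometric_median(points, iters=500, eps=1e-10):
    """Approximate the geometric median of the rows of `points` with Weiszfeld iterations."""
    z = points.mean(axis=0)  # start from the coordinate-wise mean
    for _ in range(iters):
        dists = np.maximum(np.linalg.norm(points - z, axis=1), eps)  # guard against /0
        weights = 1.0 / dists
        z_new = weights @ points / weights.sum()  # inverse-distance weighted average
        if np.linalg.norm(z_new - z) < eps:
            return z_new
        z = z_new
    return z


def geometric_median_of_means(samples, num_blocks):
    """Average `num_blocks` equal-size blocks of rows, then take their geometric median."""
    block_size = samples.shape[0] // num_blocks
    trimmed = samples[: block_size * num_blocks]  # drop leftover rows
    block_means = trimmed.reshape(num_blocks, block_size, -1).mean(axis=1)
    return geometric_median(block_means)


# Usage: heavy-tailed Student-t data in R^10 with true mean 2.0 in every coordinate.
rng = np.random.default_rng(1)
xs = rng.standard_t(df=3, size=(6000, 10)) + 2.0
print(geometric_median_of_means(xs, num_blocks=30))
```

The geometric median is cheap to approximate, but it aggregates the block means in the Euclidean norm only. This is why its error couples $\operatorname{Tr}(\Sigma)$ with $\log(1/\delta)$ instead of separating them as the sub-gaussian bound does.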

High-Dimensional Robust Mean Estimation in Nearly-Linear Time

- Mathematics, Computer Science
- SODA
- 2019

This work gives the first nearly-linear time algorithms for high-dimensional robust mean estimation, both for distributions with known covariance and sub-gaussian tails and for distributions with unknown bounded covariance. It exploits the special structure of the corresponding SDPs to show that they can be solved approximately in nearly-linear time.

Sub-Gaussian Mean Estimation in Polynomial Time

- Mathematics, Computer Science
- ArXiv
- 2018

This work offers the first polynomial-time algorithm to estimate the mean with sub-Gaussian confidence intervals under such mild assumptions, based on a new semidefinite programming relaxation of a high-dimensional median.

Algorithms for heavy-tailed statistics: regression, covariance estimation, and beyond

- Mathematics, Computer Science
- STOC
- 2020

This work narrows the gap between the Gaussian and heavy-tailed settings for polynomial-time estimators, introduces new techniques for high-probability estimation, and suggests numerous new algorithmic questions in this vein.

Near-optimal mean estimators with respect to general norms

- Mathematics
- Probability Theory and Related Fields
- 2019

We study the problem of estimating the mean of a random vector in $\mathbb{R}^d$ based on an i.i.d. sample, when the accuracy of the estimator is measured by a general norm on …

Robust subgaussian estimation of a mean vector in nearly linear time

- Mathematics
- 2019

We construct an algorithm, running in time $\tilde{\mathcal O}(N d + uK d)$, which is robust to outliers and heavy-tailed data and which achieves the subgaussian rate from [Lugosi, Mendelson]…

Quantum Entropy Scoring for Fast Robust Mean Estimation and Improved Outlier Detection

- Computer Science, Mathematics
- NeurIPS
- 2019

QUE-scoring, a new outlier scoring method based on quantum entropy regularization, is evaluated via extensive experiments on synthetic and real data, which demonstrate that it often performs better than previously proposed algorithms.

Recent Advances in Algorithmic High-Dimensional Robust Statistics

- Computer Science, Mathematics
- ArXiv
- 2019

The core ideas and algorithmic techniques in the emerging area of algorithmic high-dimensional robust statistics, with a focus on robust mean estimation, are introduced, and an overview is provided of the approaches that have led to computationally efficient robust estimators for a range of broader statistical tasks.

Challenging the empirical mean and empirical variance: a deviation study

- Mathematics
- 2010

We present new M-estimators of the mean and variance of real valued random variables, based on PAC-Bayes bounds. We analyze the non-asymptotic minimax properties of the deviations of those estimators…
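A mean M-estimator of this kind can be sketched as follows. It uses Catoni's influence function $\psi(x) = \operatorname{sign}(x)\log(1 + |x| + x^2/2)$ and solves $\sum_i \psi(\alpha(X_i - \theta)) = 0$ for $\theta$. The bisection solver and the simplified scale $\alpha = \sqrt{2\log(1/\delta)/(n v)}$ are our assumptions, where $v$ is an assumed upper bound on the variance. The exact tuning in the paper may differ.

```python
import math
import random


def catoni_psi(x):
    """Catoni's influence function: odd, nondecreasing, grows only logarithmically."""
    if x >= 0:
        return math.log1p(x + x * x / 2)
    return -math.log1p(-x + x * x / 2)


def catoni_mean(samples, alpha, iters=80):
    """Find theta with sum_i psi(alpha * (x_i - theta)) = 0 by bisection.

    The score is nonincreasing in theta, nonnegative at min(samples) and
    nonpositive at max(samples), so the root lies in that interval.
    """
    def score(theta):
        return sum(catoni_psi(alpha * (x - theta)) for x in samples)

    lo, hi = min(samples), max(samples)
    for _ in range(iters):
        mid = (lo + hi) / 2
        if score(mid) > 0:
            lo = mid  # score still positive: root is above mid
        else:
            hi = mid
    return (lo + hi) / 2


# Usage with the simplified scale choice (an assumption, not necessarily the paper's).
rng = random.Random(0)
data = [rng.paretovariate(2.5) for _ in range(5000)]
n, delta, var_bound = len(data), 0.01, 2.5  # var_bound: assumed upper bound on Var(X)
alpha = math.sqrt(2 * math.log(1 / delta) / (n * var_bound))
print(catoni_mean(data, alpha), "vs true mean", 2.5 / 1.5)
```

Because $\psi$ grows only logarithmically, a single extreme sample shifts the estimating equation by a bounded-growth amount instead of linearly, as it would for the empirical mean.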