Corpus ID: 199551917

A Fast Spectral Algorithm for Mean Estimation with Sub-Gaussian Rates

@article{Lei2019AFS,
  title={A Fast Spectral Algorithm for Mean Estimation with Sub-Gaussian Rates},
  author={Zhixian Lei and Kyle Luh and Prayaag Venkat and Fred Zhang},
  journal={ArXiv},
  year={2019},
  volume={abs/1908.04468}
}
We study the algorithmic problem of estimating the mean of a heavy-tailed random vector in $\mathbb{R}^d$, given $n$ i.i.d. samples. The goal is to design an efficient estimator that attains the optimal sub-Gaussian error bound, assuming only that the random vector has bounded mean and covariance. Polynomial-time solutions to this problem are known but have high runtime due to their use of semi-definite programming (SDP). Conceptually, it remains open whether convex relaxation is truly necessary…
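For intuition about the problem the abstract describes, the classical one-dimensional baseline with sub-Gaussian deviations under only a finite-variance assumption is the median-of-means estimator. The sketch below is this standard baseline, not the paper's spectral algorithm; the function name and block count `k` are illustrative choices.

```python
import numpy as np

def median_of_means(samples, k):
    """Classical median-of-means estimator (1-D): split the n samples
    into k disjoint blocks, average each block, and return the median
    of the block means. With k on the order of log(1/delta) blocks,
    this attains a sub-Gaussian deviation bound at confidence 1-delta,
    assuming only finite variance."""
    blocks = np.array_split(np.asarray(samples, dtype=float), k)
    block_means = np.array([b.mean() for b in blocks])
    return float(np.median(block_means))

# Heavy-tailed data: Student's t with 3 degrees of freedom (mean 0).
rng = np.random.default_rng(0)
data = rng.standard_t(df=3, size=10_000)
estimate = median_of_means(data, k=20)
```

The high-dimensional analogue (replacing the median of block means with a geometric median or, as in this line of work, a spectral or SDP-based aggregation step) is where the algorithmic difficulty discussed in the paper arises.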
Optimal Sub-Gaussian Mean Estimation in $\mathbb{R}$
TLDR
This work revisits the problem of estimating the mean of a real-valued distribution, presenting a novel estimator with sub-Gaussian convergence that is intuitively as accurate as the sample mean is for the Gaussian distribution of matching variance.
Optimal Sub-Gaussian Mean Estimation in R
  • 2020
A spectral algorithm for robust regression with subgaussian rates
TLDR
A new linear-to-quadratic time algorithm for linear regression that, in the absence of strong assumptions on the underlying distributions of the samples and in the presence of outliers, attains the optimal sub-Gaussian error bound even though the data have only finite moments.
Algorithms for heavy-tailed statistics: regression, covariance estimation, and beyond
TLDR
This work narrows the gap between the Gaussian and heavy-tailed settings for polynomial-time estimators, introduces new techniques to high-probability estimation, and suggests numerous new algorithmic questions in the following vein.
Multivariate mean estimation with direction-dependent accuracy
We consider the problem of estimating the mean of a random vector based on $N$ independent, identically distributed observations. We prove the existence of an estimator that has a near-optimal error…
Robust subgaussian estimation of a mean vector in nearly linear time
We construct an algorithm, running in time $\tilde{\mathcal O}(N d + uK d)$, which is robust to outliers and heavy-tailed data and which achieves the subgaussian rate from [Lugosi, Mendelson]…
Optimal Mean Estimation without a Variance
TLDR
This work studies the problem of heavy-tailed mean estimation in settings where the variance of the data-generating distribution does not exist, and establishes an information-theoretic lower bound on the optimal attainable confidence interval.
Optimal robust mean and location estimation via convex programs with respect to any pseudo-norms
…where $l(\Sigma S)$ is the Gaussian mean width of $\Sigma S$ and $\Sigma$ the covariance of the data (in the benchmark i.i.d. Gaussian case). This improves the entropic minimax lower bound from [30] and closes the gap…
Heavy-tailed Streaming Statistical Estimation
TLDR
A clipped stochastic gradient descent algorithm is designed and an improved analysis is provided, under a more nuanced condition on the noise of the stochastic gradients, which is critical when analyzing stochastic optimization problems arising from general statistical estimation problems.
Efficient Estimators for Heavy-Tailed Machine Learning
  • 2020
A dramatic improvement in data collection technologies has aided in procuring massive amounts of unstructured and heterogeneous datasets. This has consequently led to a prevalence of heavy-tailed…

References

SHOWING 1-10 OF 35 REFERENCES
Fast Mean Estimation with Sub-Gaussian Rates
TLDR
Like the polynomial time estimator introduced by Hopkins, 2018, which is based on the sum-of-squares hierarchy, this estimator achieves optimal statistical efficiency in this challenging setting, but it has a significantly faster runtime and a simpler analysis.
Mean estimation with sub-Gaussian rates in polynomial time
TLDR
This work offers the first polynomial time algorithm to estimate the mean with sub-Gaussian-size confidence intervals under such mild assumptions, based on a new semidefinite programming relaxation of a high-dimensional median.
High-Dimensional Robust Mean Estimation in Nearly-Linear Time
TLDR
This work gives the first nearly-linear time algorithms for high-dimensional robust mean estimation on distributions with known covariance and sub-Gaussian tails, and with unknown bounded covariance, and exploits the special structure of the corresponding SDPs to show that they are approximately solvable in nearly-linear time.
Sub-Gaussian Mean Estimation in Polynomial Time
TLDR
This work offers the first polynomial time algorithm to estimate the mean with sub-Gaussian confidence intervals under such mild assumptions, based on a new semidefinite programming relaxation of a high-dimensional median.
Algorithms for heavy-tailed statistics: regression, covariance estimation, and beyond
TLDR
This work narrows the gap between the Gaussian and heavy-tailed settings for polynomial-time estimators, introduces new techniques to high-probability estimation, and suggests numerous new algorithmic questions in the following vein.
Near-optimal mean estimators with respect to general norms
We study the problem of estimating the mean of a random vector in $\mathbb{R}^d$ based on an i.i.d. sample, when the accuracy of the estimator is measured by a general norm on $\mathbb{R}^d$…
Robust subgaussian estimation of a mean vector in nearly linear time
We construct an algorithm, running in time $\tilde{\mathcal O}(N d + uK d)$, which is robust to outliers and heavy-tailed data and which achieves the subgaussian rate from [Lugosi, Mendelson]…
Quantum Entropy Scoring for Fast Robust Mean Estimation and Improved Outlier Detection
TLDR
QUE-scoring, a new outlier scoring method based on quantum entropy regularization, is evaluated via extensive experiments on synthetic and real data, and it is demonstrated that it often performs better than previously proposed algorithms.
Recent Advances in Algorithmic High-Dimensional Robust Statistics
TLDR
The core ideas and algorithmic techniques in the emerging area of algorithmic high-dimensional robust statistics are introduced, with a focus on robust mean estimation, and an overview is provided of the approaches that have led to computationally efficient robust estimators for a range of broader statistical tasks.
Challenging the empirical mean and empirical variance: a deviation study
We present new M-estimators of the mean and variance of real valued random variables, based on PAC-Bayes bounds. We analyze the non-asymptotic minimax properties of the deviations of those estimators…