Robust and efficient mean estimation: an approach based on the properties of self-normalized sums

Authors: Stanislav Minsker and Mohamed Ndaoud
Journal: Electronic Journal of Statistics

Abstract: Let $X$ be a random variable with unknown mean and finite variance. We present a new estimator of the mean of $X$ that is robust to the possible presence of outliers in the sample, provides tight sub-Gaussian deviation guarantees without any additional assumptions on the shape or tails of the distribution, and is moreover asymptotically efficient. This is the first estimator that provably combines all of these qualities. Our construction is inspired by robustness…
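For context, a classical route to sub-Gaussian deviation guarantees for finite-variance data (the benchmark this line of work improves upon in terms of constants and efficiency) is the median-of-means estimator. The sketch below is illustrative only, not the paper's self-normalized construction; the function name and block-count parameter `k` are our own:

```python
import numpy as np

def median_of_means(x, k):
    """Median-of-means: split the sample into k blocks, average each
    block, and return the median of the block means. For a suitable
    choice of k this yields sub-Gaussian-style deviation bounds
    assuming only finite variance."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    if not 1 <= k <= n:
        raise ValueError("number of blocks k must be in [1, n]")
    blocks = np.array_split(x, k)
    return float(np.median([b.mean() for b in blocks]))
```

A single wild outlier can corrupt at most one block mean, so the median over blocks is unaffected, which is exactly the robustness property the abstract refers to.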


U-statistics of growing order and sub-Gaussian mean estimators with sharp constants
This paper addresses the following question: given a sample of i.i.d. random variables with finite variance, can one construct an estimator of the unknown mean that performs nearly as well as if the…
Multivariate mean estimation with direction-dependent accuracy
We consider the problem of estimating the mean of a random vector based on $N$ independent, identically distributed observations. We prove the existence of an estimator that has a near-optimal error…
Bandits Corrupted by Nature: Lower Bounds on Regret and Robust Optimistic Algorithm
A lower bound on the regret is proved, indicating that corrupted and heavy-tailed bandits are strictly harder than uncorrupted or light-tailed bandits, and a UCB-type algorithm is designed that leverages Huber’s estimator for robust mean estimation.
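As a rough illustration of the Huber-type mean estimation mentioned above, here is a minimal location-only sketch. Fixing the scale to the MAD and using the classical tuning constant c = 1.345 are illustrative choices, not details taken from that paper:

```python
import numpy as np

def huber_mean(x, c=1.345, tol=1e-8, max_iter=100):
    """Huber M-estimator of location: find mu solving
    sum_i psi((x_i - mu) / s) = 0, where psi clips residuals at c.
    The scale s is fixed to the MAD here (an illustrative choice);
    c = 1.345 is the classical 95%-efficiency constant for Gaussian data."""
    x = np.asarray(x, dtype=float)
    mu = np.median(x)
    s = np.median(np.abs(x - mu)) or 1.0  # guard against zero scale
    for _ in range(max_iter):
        r = np.clip((x - mu) / s, -c, c)  # psi applied to scaled residuals
        mu_new = mu + s * r.mean()        # monotone fixed-point step
        if abs(mu_new - mu) < tol:
            break
        mu = mu_new
    return float(mu)
```

Because psi is bounded, a single large outlier contributes at most c to the estimating equation, which is why Huber-type estimators suit heavy-tailed or corrupted rewards.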

This paper proposes a slightly more complicated construction of robust M-estimation, shows that this strategy can be used when the data are only assumed to be mixing, and applies the general approach to least-squares density estimation.
Uniform bounds for robust mean estimators
The main contribution of the paper is the proof of uniform bounds for the deviations of the stochastic process defined by proposed estimators of the mean.
Resilience: A Criterion for Learning in the Presence of Arbitrary Outliers
This work introduces a criterion, resilience, which allows properties of a dataset to be robustly computed even in the presence of a large fraction of arbitrary additional data, and provides new information-theoretic results on robust distribution learning, robust estimation of stochastic block models, and robust mean estimation under bounded $k$th moments.
Challenging the empirical mean and empirical variance: a deviation study
We present new M-estimators of the mean and variance of real valued random variables, based on PAC-Bayes bounds. We analyze the non-asymptotic minimax properties of the deviations of those estimators…
Mean estimation with sub-Gaussian rates in polynomial time
This work offers the first polynomial time algorithm to estimate the mean with sub-Gaussian-size confidence intervals under such mild assumptions, based on a new semidefinite programming relaxation of a high-dimensional median.
All-In-One Robust Estimator of the Gaussian Mean.
It is shown that a single robust estimator of the mean of a multivariate Gaussian distribution can enjoy five desirable properties and can be extended to sub-Gaussian distributions, as well as to the cases of unknown rate of contamination or unknown covariance matrix.
Distributed Statistical Estimation and Rates of Convergence in Normal Approximation
It is shown that one of the key benefits of the divide-and-conquer strategy is robustness, an important characteristic for large distributed systems, and connections between performance of these distributed algorithms and the rates of convergence in normal approximation are established.
Mean Estimation and Regression Under Heavy-Tailed Distributions: A Survey
This work describes sub-Gaussian mean estimators for possibly heavy-tailed data in both the univariate and multivariate settings and focuses on estimators based on median-of-means techniques, but other methods such as the trimmed-mean and Catoni's estimators are also reviewed.
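As an illustration of the trimmed-mean technique the survey reviews, here is a minimal sketch; the symmetric trimming fraction `eps` and the function name are our own illustrative choices:

```python
import numpy as np

def trimmed_mean(x, eps):
    """Trimmed mean: discard the eps-fraction of smallest and largest
    observations, then average the rest. A simple alternative to
    median-of-means that, for a suitable eps, also enjoys
    sub-Gaussian-type deviation guarantees for heavy-tailed data."""
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    k = int(np.floor(eps * n))  # number of points trimmed from each tail
    if 2 * k >= n:
        raise ValueError("trimming fraction too large for sample size")
    return float(x[k:n - k].mean())
```

Trimming removes at most a 2·eps fraction of the sample, so any outliers concentrated in the tails cannot drag the average, unlike the empirical mean.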
CLT For U-statistics With Growing Dimension
The purpose of this paper is to present a general triangular array Central Limit Theorem for U-statistics, where the kernel $h_k(x_1, \dots, x_k)$ and its dimension $k$ may increase with the sample size.
Sub-Gaussian mean estimators
Estimators with a sub-Gaussian behavior even for certain heavy-tailed distributions are defined and various impossibility results for mean estimators are proved.