Covariance Estimation: Optimal Dimension-free Guarantees for Adversarial Corruption and Heavy Tails

@article{Abdalla2022CovarianceEO,
  title={Covariance Estimation: Optimal Dimension-free Guarantees for Adversarial Corruption and Heavy Tails},
  author={Pedro Abdalla and Nikita Zhivotovskiy},
  journal={ArXiv},
  year={2022},
  volume={abs/2205.08494}
}
We provide an estimator of the covariance matrix that achieves the optimal rate of convergence (up to constant factors) in the operator norm under two standard notions of data contamination: we allow the adversary to corrupt an η-fraction of the sample arbitrarily, while the distribution of the remaining data points only satisfies that the $L_p$ marginal moment, for some $p > 4$, is equivalent to the corresponding $L_2$ marginal moment. Despite requiring the existence of only a few moments, our…
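For clarity, the moment-equivalence condition in the abstract can be written out explicitly. The display below is a standard formulation of $L_p$-$L_2$ marginal norm equivalence; the constant $\kappa$ does not appear in the truncated abstract above and is introduced here only for illustration.

% L_p-L_2 marginal norm equivalence (illustrative formulation): every
% one-dimensional marginal <X, v> has its p-th moment controlled by its
% second moment, uniformly over directions v on the unit sphere.
\[
  \bigl(\mathbb{E}\,|\langle X, v\rangle|^{p}\bigr)^{1/p}
  \;\le\; \kappa\,
  \bigl(\mathbb{E}\,|\langle X, v\rangle|^{2}\bigr)^{1/2}
  \qquad \text{for all } v \in S^{d-1},
\]
where $p > 4$ is fixed and $\kappa \ge 1$ is a constant that does not depend on the dimension $d$.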
1 Citation
Exact spectral norm error of sample covariance
Gaussian widths over spherical slices of the standardized ellipsoid play the role of a first-order analogue to the zeroth-order characteristic r(Σ). As an immediate application of the first-order…
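Here r(Σ) presumably denotes the effective rank of the covariance matrix, the standard dimension-free complexity parameter in this literature:

% Effective rank: trace divided by operator norm; always between 1 and the
% ambient dimension d, and much smaller than d when the spectrum decays.
\[
  r(\Sigma) = \frac{\operatorname{tr}(\Sigma)}{\|\Sigma\|}.
\]
It satisfies $1 \le r(\Sigma) \le d$ and can be much smaller than $d$ when the spectrum of $\Sigma$ decays quickly.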
