Corpus ID: 211069175

List Decodable Subspace Recovery

@article{Raghavendra2020ListDS,
  title={List Decodable Subspace Recovery},
  author={Prasad Raghavendra and Morris Yau},
  journal={ArXiv},
  year={2020},
  volume={abs/2002.03004}
}
Learning from data in the presence of outliers is a fundamental problem in statistics. In this work, we study robust statistics in the presence of overwhelming outliers for the fundamental problem of subspace recovery. Given a dataset in which an $\alpha$ fraction (less than half) of the data is distributed uniformly in an unknown $k$-dimensional subspace in $d$ dimensions, and with no additional assumptions on the remaining data, the goal is to recover a succinct list of $O(\frac{1}{\alpha})$ subspaces …
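As a concrete illustration of this input model, the following minimal sketch (hypothetical parameter values; Gaussian coordinates on the subspace are used purely for simplicity, whereas the abstract assumes a uniform distribution) builds a dataset in which an $\alpha$ fraction of points lies in an unknown $k$-dimensional subspace and the remaining points are arbitrary; the algorithmic goal is to return a short list of candidate subspaces, one of which is close to the planted one.

# Illustrative sketch of the list-decodable subspace recovery input model
# (hypothetical parameter values; not code from the paper).
import numpy as np

rng = np.random.default_rng(0)
d, k, n, alpha = 20, 3, 1000, 0.1  # ambient dim, subspace dim, sample size, inlier fraction

# Unknown k-dimensional subspace, represented by an orthonormal basis B of shape (d, k).
B, _ = np.linalg.qr(rng.standard_normal((d, k)))

n_in = int(alpha * n)
inliers = rng.standard_normal((n_in, k)) @ B.T       # points lying exactly in the subspace
outliers = rng.standard_normal((n - n_in, d)) * 5.0  # remaining points: arbitrary (adversarial in general)

X = np.vstack([inliers, outliers])
rng.shuffle(X)  # shuffle rows in place

# Goal: from X alone, output a list of O(1/alpha) candidate subspaces (projection
# matrices) such that at least one is close to the true projector B @ B.T.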

List Decodable Mean Estimation in Nearly Linear Time

TLDR
This paper considers robust statistics in the presence of overwhelming outliers, where the majority of the dataset is introduced adversarially, and develops an algorithm for list-decodable mean estimation in this setting that achieves, up to constants, the information-theoretically optimal recovery error and optimal sample complexity, and runs in nearly linear time up to polylogarithmic factors in the dimension.
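For context, in the bounded-covariance inlier model typically assumed for this problem (an assumption stated here for illustration, not taken from the summary above), the information-theoretically optimal guarantee has the form

\[
|\mathcal{L}| = O(1/\alpha), \qquad \min_{\hat{\mu} \in \mathcal{L}} \|\hat{\mu} - \mu\|_2 = O\!\left(\alpha^{-1/2}\right),
\]

where $\mu$ is the mean of the inlier distribution and $\mathcal{L}$ is the returned list of candidate means.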

List-Decodable Sparse Mean Estimation

TLDR
The main contribution is the first polynomial-time algorithm whose sample complexity is poly-logarithmic in the dimension; it uses low-degree sparse polynomials to filter outliers, which may be of independent interest.

List-Decodable Subspace Recovery: Dimension Independent Error in Polynomial Time

TLDR
A $\mathrm{poly}(1/\alpha)\, d^{O(1)}$-time algorithm is given that outputs a list of candidate covariances containing a $\hat{\Pi}$ with dimension-independent error; moreover, for any arbitrary $\eta > 0$, a list containing a $\hat{\Pi}$ satisfying $\|\hat{\Pi} - \Pi_*\|_F \leq \eta$ can be obtained in $d^{O(1/\alpha + \log(1/\eta))}$ time.

List-Decodable Mean Estimation via Iterative Multi-Filtering

TLDR
The main technical innovation is the design of a soft outlier removal procedure for high-dimensional heavy-tailed datasets with a majority of outliers, achieving information-theoretically near-optimal error.

List-Decodable Sparse Mean Estimation via Difference-of-Pairs Filtering

TLDR
This work develops a novel, conceptually simpler technique for list-decodable mean estimation that achieves the optimal error guarantee of $\Theta(\sqrt{\log(1/\alpha)})$ with quasi-polynomial sample and computational complexity, and complements the upper bounds with nearly-matching statistical query and low-degree polynomial testing lower bounds.
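To make the stated guarantee concrete: for samples of which an $\alpha$ fraction is drawn from $\mathcal{N}(\mu, I_d)$ with a sparse mean $\mu$ (the standard inlier model for this problem; stated here for illustration rather than quoted from the summary), the algorithm returns a list $\mathcal{L}$ with

\[
\min_{\hat{\mu} \in \mathcal{L}} \|\hat{\mu} - \mu\|_2 \le O\!\left(\sqrt{\log(1/\alpha)}\right),
\]

matching the information-theoretic lower bound of $\Omega(\sqrt{\log(1/\alpha)})$ up to constants.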

Learning a mixture of two subspaces over finite fields

TLDR
These algorithms imply computational tractability of the problem of learning mixtures of two subspaces, except in the degenerate setting captured by learning parities with noise.

Statistical Query Lower Bounds for List-Decodable Linear Regression

TLDR
The main result is a Statistical Query (SQ) lower bound of $d^{\mathrm{poly}(1/\alpha)}$, which qualitatively matches the performance of previously developed algorithms, providing evidence that current upper bounds for this task are nearly best possible.

List-decodable covariance estimation

TLDR
The first polynomial time algorithm for list-decodable covariance estimation is given, and this algorithm works more generally for any distribution D that possesses low-degree sum-of-squares certificates of two natural analytic properties: 1) anti-concentration of one-dimensional marginals and 2) hypercontractivity of degree 2 polynomials.
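For reference, one common way to state these two analytic properties (for a mean-zero distribution $D$; the constants $C$ are illustrative, and this formulation is an assumption here rather than a quotation from the paper) is

\[
\Pr_{x \sim D}\bigl[\,|\langle x, v\rangle| \le \delta \|v\|_2\,\bigr] \le C\delta \quad \text{for all } v \ne 0,\ \delta > 0 \quad \text{(anti-concentration of one-dimensional marginals)},
\]
\[
\mathbb{E}_{x \sim D}\bigl[p(x)^4\bigr] \le C\,\bigl(\mathbb{E}_{x \sim D}\bigl[p(x)^2\bigr]\bigr)^2 \quad \text{for every degree-2 polynomial } p \quad \text{(hypercontractivity)},
\]

with the additional requirement that both inequalities admit low-degree sum-of-squares certificates.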

Privately Learning Mixtures of Axis-Aligned Gaussians

TLDR
It is proved that $\tilde{O}(k d \log(1/\delta)/\alpha\varepsilon)$ samples are sufficient to learn a mixture of $k$ axis-aligned Gaussians in $\mathbb{R}^d$ to within total variation distance $\alpha$ while satisfying $(\varepsilon, \delta)$-differential privacy; this is the first result for privately learning mixtures of unbounded axis-aligned (or even unbounded univariate) Gaussians.

Polynomial-Time Sum-of-Squares Can Robustly Estimate Mean and Covariance of Gaussians Optimally

TLDR
This work revisits the problem of estimating the mean and covariance of an unknown $d$-dimensional Gaussian distribution in the presence of an $\varepsilon$-fraction of adversarial outliers, gives a new, simple analysis of the same canonical sum-of-squares relaxation used in Kothari and Steurer (2017) and Bakshi and Kothari (2020), and shows that the algorithm achieves the same error, sample complexity, and running time guarantees.

References

Showing 1–10 of 50 references

Algorithms and Hardness for Robust Subspace Recovery

TLDR
It is proved that it is Small Set Expansion-hard to find $T$ when the fraction of errors is any larger, thus giving evidence that the estimator is an optimal compromise between efficiency and robustness.

List-decodable robust mean estimation and learning mixtures of spherical Gaussians

TLDR
The problem of list-decodable (robust) Gaussian mean estimation and the related problem of learning mixtures of separated spherical Gaussians are studied and a set of techniques that yield new efficient algorithms with significantly improved guarantees are developed.

Robust Subspace Recovery with Adversarial Outliers

TLDR
This work examines a theoretical estimator that is intractable to compute and uses it to derive information-theoretic bounds on exact recovery, and proposes two tractable estimators: a variant of RANSAC and a simple relaxation of the theoretical estimator.

List Decodable Learning via Sum of Squares

TLDR
A framework for list-decodable learning via the Sum-of-Squares SDP hierarchy is developed, along with an algorithm that outputs a list $\mathcal{L}$ of linear functions such that there exists some $\hat{\ell} \in \mathcal{L}$ that is close to the true $\ell$.

Learning from untrusted data

TLDR
An algorithm for robust learning in a very general stochastic optimization setting is provided that has immediate implications for robustly estimating the mean of distributions with bounded second moments, robustly learning mixtures of such distributions, and robustly finding planted partitions in random graphs.

List-Decodable Subspace Recovery via Sum-of-Squares

TLDR
A new method is given that allows error reduction "within SoS" with only a logarithmic cost in the exponent of the running time (in contrast to a polynomial cost in [KKK'19, RY'20]).

Outlier-robust moment-estimation via sum-of-squares

TLDR
Improved algorithms for estimating low-degree moments of unknown distributions in the presence of adversarial outliers are developed and the guarantees of these algorithms match information-theoretic lower-bounds for the class of distributions the authors consider.

A Well-Tempered Landscape for Non-convex Robust Subspace Recovery

TLDR
It is proved that an underlying subspace is the only stationary point and local minimizer in a specified neighborhood under a deterministic condition on a dataset, and it is shown that a geodesic gradient descent method over the Grassmannian manifold can exactly recover the underlying subspace when the method is properly initialized.

Resilience: A Criterion for Learning in the Presence of Arbitrary Outliers

TLDR
This work introduces a criterion, resilience, which allows properties of a dataset to be robustly computed, even in the presence of a large fraction of arbitrary additional data, and provides new information-theoretic results on robust distribution learning, robust estimation of stochastic block models, and robust mean estimation under bounded kth moments.

Smoothed Analysis in Unsupervised Learning via Decoupling

TLDR
This work obtains high-confidence lower bounds on the least singular value of new classes of structured random matrix ensembles of the above kind and uses these bounds to design algorithms with polynomial time smoothed analysis guarantees for the following three important problems in unsupervised learning.