# Sever: A Robust Meta-Algorithm for Stochastic Optimization

@article{Diakonikolas2019SeverAR, title={Sever: A Robust Meta-Algorithm for Stochastic Optimization}, author={Ilias Diakonikolas and Gautam Kamath and Daniel M. Kane and Jerry Li and Jacob Steinhardt and Alistair Stewart}, journal={ArXiv}, year={2019}, volume={abs/1803.02815} }

In high dimensions, most machine learning methods are brittle to even a small fraction of structured outliers. To address this, we introduce a new meta-algorithm that can take in a base learner such as least squares or stochastic gradient descent, and harden the learner to be resistant to outliers. Our method, Sever, possesses strong theoretical guarantees yet is also highly scalable -- beyond running the base learner itself, it only requires computing the top singular vector of a certain $n…
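The abstract is cut off before it names the matrix involved, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes the matrix is the mean-centered matrix of per-sample gradients at the base learner's output, that each point is scored by its squared projection onto the top right singular vector, and that a fixed fraction of the highest-scoring points is dropped each round. The names `least_squares`, `sever_sketch`, `n_rounds`, and `drop_frac` are hypothetical.

```python
import numpy as np


def least_squares(X, y):
    """Base learner: ordinary least squares."""
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w


def sever_sketch(X, y, n_rounds=4, drop_frac=0.05):
    """Sever-style filtering loop (illustrative; details are assumptions).

    Returns the final weights and the indices of the points that were kept.
    """
    active = np.arange(len(y))
    for _ in range(n_rounds):
        # 1. Run the base learner on the currently active points.
        w = least_squares(X[active], y[active])

        # 2. Per-sample gradients of the squared loss 0.5 * (x.w - y)^2.
        residuals = X[active] @ w - y[active]
        grads = residuals[:, None] * X[active]
        centered = grads - grads.mean(axis=0)

        # 3. Top right singular vector of the centered gradient matrix.
        _, _, vt = np.linalg.svd(centered, full_matrices=False)
        v = vt[0]

        # 4. Outlier score: squared projection onto that direction.
        scores = (centered @ v) ** 2

        # 5. Drop the highest-scoring fraction (assumed removal rule).
        n_drop = int(drop_frac * len(active))
        if n_drop == 0:
            break
        keep = np.argsort(scores)[: len(active) - n_drop]
        active = active[keep]

    return least_squares(X[active], y[active]), active
```

Calling `sever_sketch(X, y)` on a design matrix `X` and targets `y` returns the refit weights together with the indices of the retained samples. Beyond the base learner, the only extra work per round is one singular value decomposition, which is the scalability point the abstract makes.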

## 170 Citations

Online Robust Regression via SGD on the l1 loss

- Computer Science, Mathematics
- NeurIPS
- 2020

This work shows that stochastic gradient descent on the $\ell_1$ loss converges to the true parameter vector at a $\tilde{O}\big(1 / ((1 - \eta)^2 n)\big)$ rate, which is independent of the values of the contaminated measurements.

Efficient Algorithms for Outlier-Robust Regression

- Computer Science, Mathematics
- COLT
- 2018

This work gives the first polynomial-time algorithm for performing linear or polynomial regression resilient to adversarial corruptions in both examples and labels, and gives a simple statistical lower bound showing that some distributional assumption is necessary to succeed in this setting.

Principled approaches to robust machine learning and beyond

- Computer Science
- 2018

This thesis devises two novel, but similarly inspired, algorithmic paradigms for estimation in high dimensions in the presence of a small number of adversarially added data points, both of which are the first efficient algorithms that achieve (nearly) optimal error bounds for a number of fundamental statistical tasks such as mean estimation and covariance estimation.

Efficient Algorithms and Lower Bounds for Robust Linear Regression

- Computer Science, Mathematics
- SODA
- 2019

Any polynomial time SQ learning algorithm for robust linear regression (in Huber's contamination model) with estimation complexity, must incur an error of $\Omega(\sqrt{\epsilon} \sigma)$.

Towards More Scalable and Robust Machine Learning

- Computer Science
- 2019

This dissertation studies several topics in the scalability and robustness of large-scale learning, with a focus on establishing solid theoretical foundations for these problems, and demonstrates recent progress towards the ambitious goal of building more scalable and robust machine learning models.

Adaptive Hard Thresholding for Near-optimal Consistent Robust Regression

- Computer Science, Mathematics
- COLT
- 2019

A nearly linear time estimator that consistently estimates the true regression vector, even with a $1-o(1)$ fraction of corruptions, is provided. It is based on a novel variant of outlier removal via hard thresholding in which the threshold is chosen adaptively, and it crucially relies on randomness to escape bad fixed points of the non-convex hard thresholding operation.

Robustness meets algorithms

- Computer Science
- Commun. ACM
- 2021

This work gives the first efficient algorithm for estimating the parameters of a high-dimensional Gaussian that is able to tolerate a constant fraction of corruptions that is independent of the dimension.

Clustering Mixture Models in Almost-Linear Time via List-Decodable Mean Estimation

- Computer Science, Mathematics
- ArXiv
- 2021

A novel and simpler near-linear time robust mean estimation algorithm in the α → 1 regime, based on a one-shot matrix multiplicative weights-inspired potential decrease, is developed, achieving nearly-optimal statistical guarantees.

Closing the BIG-LID: An Effective Local Intrinsic Dimensionality Defense for Nonlinear Regression Poisoning

- Computer Science
- IJCAI
- 2021

A new analysis of local intrinsic dimensionality (LID) of nonlinear regression under such poisoning attacks within a Stackelberg game, leading to a practical defense.

List-Decodable Mean Estimation in Nearly-PCA Time

- Computer Science, Mathematics
- ArXiv
- 2020

A new list-decodable mean estimation algorithm for bounded covariance distributions, with optimal sample complexity and error rate and running in nearly-PCA time, is proposed.

## References

Efficient Algorithms and Lower Bounds for Robust Linear Regression

- Computer Science, Mathematics
- SODA
- 2019

Any polynomial time SQ learning algorithm for robust linear regression (in Huber's contamination model) with estimation complexity, must incur an error of $\Omega(\sqrt{\epsilon} \sigma)$.

Robustly Learning a Gaussian: Getting Optimal Error, Efficiently

- Computer Science, Mathematics
- SODA
- 2018

This work gives robust estimators that achieve estimation error $O(\varepsilon)$ in the total variation distance, which is optimal up to a universal constant that is independent of the dimension.

Being Robust (in High Dimensions) Can Be Practical

- Computer Science, Mathematics
- ICML
- 2017

This work addresses sample complexity bounds that are optimal, up to logarithmic factors, as well as giving various refinements that allow the algorithms to tolerate a much larger fraction of corruptions.

A Data Prism: Semi-Verified Learning in the Small-Alpha Regime

- Mathematics, Computer Science
- COLT
- 2018

We consider a model of unreliable or crowdsourced data where there is an underlying set of $n$ binary variables, each evaluator contributes a (possibly unreliable or adversarial) estimate of the…

Robust Estimators in High Dimensions without the Computational Intractability

- Computer Science, Mathematics
- 2016 IEEE 57th Annual Symposium on Foundations of Computer Science (FOCS)
- 2016

This work obtains the first computationally efficient algorithms for agnostically learning several fundamental classes of high-dimensional distributions: a single Gaussian, a product distribution on the hypercube, mixtures of two product distributions (under a natural balancedness condition), and k Gaussians with identical spherical covariances.

Resilience: A Criterion for Learning in the Presence of Arbitrary Outliers

- Computer Science, Mathematics
- ITCS
- 2018

This work introduces a criterion, resilience, which allows properties of a dataset to be robustly computed, even in the presence of a large fraction of arbitrary additional data, and provides new information-theoretic results on robust distribution learning, robust estimation of stochastic block models, and robust mean estimation under bounded kth moments.

Certified Defenses for Data Poisoning Attacks

- Computer Science, Mathematics
- NIPS
- 2017

This work addresses the worst-case loss of a defense in the face of a determined attacker by constructing approximate upper bounds on the loss across a broad family of attacks, for defenders that first perform outlier removal followed by empirical risk minimization.

Robust Learning of Fixed-Structure Bayesian Networks

- Computer Science, Mathematics
- NeurIPS
- 2018

This work provides the first computationally efficient robust learning algorithm for this problem with dimension-independent error guarantees, which has near-optimal sample complexity, runs in polynomial time, and achieves error that scales nearly-linearly with the fraction of adversarially corrupted samples.

Statistical Query Lower Bounds for Robust Estimation of High-Dimensional Gaussians and Gaussian Mixtures

- Computer Science, Mathematics
- 2017 IEEE 58th Annual Symposium on Foundations of Computer Science (FOCS)
- 2017

A general technique that yields the first Statistical Query lower bounds for a range of fundamental high-dimensional learning problems involving Gaussian distributions is described, which implies that the computational complexity of learning GMMs is inherently exponential in the dimension of the latent space even though there is no such information-theoretic barrier.

Better Agnostic Clustering Via Relaxed Tensor Norms

- Computer Science
- ArXiv
- 2017

An algorithm is given that recovers a faithful approximation to the true means in the given data whenever the low-degree moments of the points in each cluster have bounded sum-of-squares norms.