# Robust Mean Estimation under Coordinate-level Corruption

@inproceedings{Liu2021RobustME, title={Robust Mean Estimation under Coordinate-level Corruption}, author={Zifan Liu and Jongho Park and Nils Palumbo and Theodoros Rekatsinas and Christos Tzamos}, booktitle={ICML}, year={2021} }

We study the problem of robust mean estimation and introduce a novel Hamming distance-based measure of distribution shift for coordinate-level corruptions. We show that this measure yields adversary models that capture more realistic corruptions than those used in prior works, and present an information-theoretic analysis of robust mean estimation in these settings. We show that for structured distributions, methods that leverage the structure yield information theoretically more accurate mean…
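As an informal illustration of the coordinate-level view described in the abstract, the sketch below counts corruption per cell with a Hamming distance between each clean sample and its corrupted copy, and compares that with the number of samples touched. It is not the paper's formal definition or estimator. The helper names, the cell budget of 50, and the corruption value 100.0 are all assumptions made for this example.

```python
import random
import statistics


def corrupt_coordinates(samples, budget, value, rng):
    """Overwrite `budget` randomly chosen cells with `value` (hypothetical adversary)."""
    n, d = len(samples), len(samples[0])
    cells = [(i, j) for i in range(n) for j in range(d)]
    corrupted = [row[:] for row in samples]
    for i, j in rng.sample(cells, budget):
        corrupted[i][j] = value
    return corrupted


def hamming_per_sample(clean, corrupted):
    """Number of coordinates that differ in each sample."""
    return [sum(a != b for a, b in zip(x, y)) for x, y in zip(clean, corrupted)]


rng = random.Random(0)
n, d = 200, 5
clean = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
corrupted = corrupt_coordinates(clean, budget=50, value=100.0, rng=rng)

dists = hamming_per_sample(clean, corrupted)
corrupted_cells = sum(dists)                      # coordinate-level count
corrupted_rows = sum(1 for k in dists if k > 0)   # sample-level count

# Coordinate-wise mean and median of the corrupted data (true mean is 0).
columns = list(zip(*corrupted))
col_means = [statistics.fmean(col) for col in columns]
col_medians = [statistics.median(col) for col in columns]

print(f"cells corrupted: {corrupted_cells} of {n * d}")
print(f"samples touched: {corrupted_rows} of {n}")
print("column means:  ", [round(m, 2) for m in col_means])
print("column medians:", [round(m, 2) for m in col_medians])
```

In this toy setup only 5% of the cells are changed, yet a much larger share of samples can contain at least one corrupted coordinate, which is why counting corrupted samples can overstate how much information was lost. The coordinate-wise median is used here only as a familiar baseline, not as the method the paper proposes.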

## 4 Citations

### Picket: guarding against corrupted data in tabular data during learning and inference

- Computer Science
- VLDB J.
- 2022

Picket consistently safeguards against corrupted data during both training and deployment of various models ranging from SVMs to neural networks, beating a diverse array of competing methods that span from data quality validation models to robust outlier-detection models.

### Picket: Self-supervised Data Diagnostics for ML Pipelines

- Computer Science
- ArXiv
- 2020

Picket, a first-of-its-kind system that enables data diagnostics for machine learning pipelines over tabular data, is presented and shown to offer consistently accurate diagnostics during both training and deployment of various models ranging from SVMs to neural networks, beating competing methods of data quality validation in machine learning pipelines.

### Machine Learning and Data Cleaning: Which Serves the Other?

- Computer Science
- ACM J. Data Inf. Qual.
- 2022

The symbiotic relationship between ML and data cleaning is highlighted, and a few challenges that require collaborative efforts of multiple research communities are discussed.

### Data Collection and Quality Challenges in Deep Learning: A Data-Centric AI Perspective

- Computer Science
- ArXiv
- 2021

This survey studies the research landscape for data collection and data quality primarily for deep learning applications, and studies fairness measures and unfairness mitigation techniques that can be applied before, during, or after model training.

## References

### Generalized Resilience and Robust Statistics

- Mathematics
- The Annals of Statistics
- 2022

This work generalizes the robust statistics approach to consider perturbations under any Wasserstein distance, and shows that robust estimation is possible whenever a distribution's population statistics are robust under a certain family of friendly perturbations.

### High-dimensional robust precision matrix estimation: Cellwise corruption under $\epsilon$-contamination

- Computer Science
- 2015

We analyze the statistical consistency of robust estimators for precision matrices in high dimensions. We focus on a contamination mechanism acting cellwise on the data matrix. The estimators we…

### Efficient Algorithms for Outlier-Robust Regression

- Computer Science, Mathematics
- COLT
- 2018

This work gives the first polynomial-time algorithm for performing linear or polynomial regression resilient to adversarial corruptions in both examples and labels, and gives a simple statistical lower bound showing that some distributional assumption is necessary to succeed in this setting.

### Being Robust (in High Dimensions) Can Be Practical

- Computer Science
- ICML
- 2017

This work addresses sample complexity bounds that are optimal, up to logarithmic factors, as well as giving various refinements that allow the algorithms to tolerate a much larger fraction of corruptions.

### Robust Estimators in High Dimensions without the Computational Intractability

- Computer Science
- 2016 IEEE 57th Annual Symposium on Foundations of Computer Science (FOCS)
- 2016

This work obtains the first computationally efficient algorithms for agnostically learning several fundamental classes of high-dimensional distributions: a single Gaussian, a product distribution on the hypercube, mixtures of two product distributions (under a natural balancedness condition), and k Gaussians with identical spherical covariances.

### Robust estimation via robust gradient estimation

- Computer Science, Mathematics
- Journal of the Royal Statistical Society: Series B (Statistical Methodology)
- 2020

The workhorse is a novel robust variant of gradient descent, and the conditions under which this gradient descent variant provides accurate estimators in a general convex risk minimization problem are provided.

### Recent Advances in Algorithmic High-Dimensional Robust Statistics

- Computer Science
- ArXiv
- 2019

The core ideas and algorithmic techniques in the emerging area of algorithmic high-dimensional robust statistics, with a focus on robust mean estimation, are introduced, and an overview of the approaches that have led to computationally efficient robust estimators for a range of broader statistical tasks is provided.

### Efficient Statistics, in High Dimensions, from Truncated Samples

- Mathematics
- 2018 IEEE 59th Annual Symposium on Foundations of Computer Science (FOCS)
- 2018

It is shown that the mean $\mu$ and covariance matrix $\Sigma$ can be estimated with arbitrary accuracy in polynomial time, as long as oracle access to S is available and S has non-trivial measure under the unknown d-variate normal distribution.

### Agnostic Estimation of Mean and Covariance

- Computer Science, Mathematics
- 2016 IEEE 57th Annual Symposium on Foundations of Computer Science (FOCS)
- 2016

This work presents polynomial-time algorithms to estimate the mean and covariance of a distribution from i.i.d. samples in the presence of a fraction of malicious noise with error guarantees in terms of information-theoretic lower bounds.

### Efficient Truncated Statistics with Unknown Truncation

- Computer Science, Mathematics
- 2019 IEEE 60th Annual Symposium on Foundations of Computer Science (FOCS)
- 2019

The main result is a computationally and sample efficient algorithm for estimating the parameters of the Gaussian under arbitrary unknown truncation sets whose performance decays with a natural measure of complexity of the set, namely its Gaussian surface area.