• Corpus ID: 211075853

Robust Mean Estimation under Coordinate-level Corruption

Zifan Liu, Jongho Park, Nils Palumbo, Theodoros Rekatsinas, Christos Tzamos
We study the problem of robust mean estimation and introduce a novel Hamming distance-based measure of distribution shift for coordinate-level corruptions. We show that this measure yields adversary models that capture more realistic corruptions than those used in prior works, and present an information-theoretic analysis of robust mean estimation in these settings. We show that for structured distributions, methods that leverage the structure yield information theoretically more accurate mean… 
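To make the idea of coordinate-level corruption concrete, the sketch below overwrites a random subset of individual cells, rather than whole samples, and measures the per-sample Hamming distance between the clean and corrupted rows. It is an illustration only, not the paper's method. The helper `corrupt_cells`, the corruption rate, the corrupted value, and the coordinate-wise median (used here as a standard robust baseline) are all assumptions introduced for this example.

```python
import numpy as np


def corrupt_cells(x, frac, value, rng):
    """Hypothetical helper: overwrite a random `frac` of individual cells with `value`."""
    corrupted = x.copy()
    mask = rng.random(x.shape) < frac  # each cell is corrupted independently
    corrupted[mask] = value
    return corrupted, mask


rng = np.random.default_rng(0)
n, d = 5000, 10
true_mean = np.zeros(d)

# Clean samples from a standard Gaussian (an assumed toy distribution).
clean = rng.normal(loc=true_mean, scale=1.0, size=(n, d))
dirty, mask = corrupt_cells(clean, frac=0.05, value=50.0, rng=rng)

# Per-sample Hamming distance: number of coordinates that differ after corruption.
hamming = (clean != dirty).sum(axis=1)

# Compare the plain sample mean with the coordinate-wise median.
mean_err = np.linalg.norm(dirty.mean(axis=0) - true_mean)
median_err = np.linalg.norm(np.median(dirty, axis=0) - true_mean)

print("average corrupted coordinates per sample:", hamming.mean())
print("sample mean error:", mean_err)
print("coordinate-wise median error:", median_err)
```

In this toy setting about half a coordinate per sample is corrupted on average, so a large share of samples contain at least one bad cell even though only 5% of cells are touched. The coordinate-wise median ignores any structure across coordinates; the abstract's claim concerns methods that exploit such structure, which this sketch does not attempt.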


Picket: guarding against corrupted data in tabular data during learning and inference

Picket consistently safeguards against corrupted data during both training and deployment of various models ranging from SVMs to neural networks, beating a diverse array of competing methods that span from data quality validation models to robust outlier-detection models.

Picket: Self-supervised Data Diagnostics for ML Pipelines

This work presents Picket, a first-of-its-kind system that enables data diagnostics for machine learning pipelines over tabular data, and shows that it offers consistently accurate diagnostics during both training and deployment of various models ranging from SVMs to neural networks, beating competing methods of data quality validation in machine learning pipelines.

Machine Learning and Data Cleaning: Which Serves the Other?

The symbiotic relationship between ML and data cleaning is highlighted, and a few challenges that require collaborative efforts across multiple research communities are discussed.

Data Collection and Quality Challenges in Deep Learning: A Data-Centric AI Perspective

This survey studies the research landscape for data collection and data quality primarily for deep learning applications, and studies fairness measures and unfairness mitigation techniques that can be applied before, during, or after model training.



Generalized Resilience and Robust Statistics

This work generalizes the robust statistics approach to consider perturbations under any Wasserstein distance, and shows that robust estimation is possible whenever a distribution's population statistics are robust under a certain family of friendly perturbations.

High-dimensional robust precision matrix estimation: Cellwise corruption under $\epsilon$-contamination

We analyze the statistical consistency of robust estimators for precision matrices in high dimensions. We focus on a contamination mechanism acting cellwise on the data matrix. The estimators we

Efficient Algorithms for Outlier-Robust Regression

This work gives the first polynomial-time algorithm for performing linear or polynomial regression resilient to adversarial corruptions in both examples and labels, and gives a simple statistical lower bound showing that some distributional assumption is necessary to succeed in this setting.

Being Robust (in High Dimensions) Can Be Practical

This work addresses sample complexity bounds that are optimal, up to logarithmic factors, as well as giving various refinements that allow the algorithms to tolerate a much larger fraction of corruptions.

Robust Estimators in High Dimensions without the Computational Intractability

This work obtains the first computationally efficient algorithms for agnostically learning several fundamental classes of high-dimensional distributions: a single Gaussian, a product distribution on the hypercube, mixtures of two product distributions (under a natural balancedness condition), and k Gaussians with identical spherical covariances.

Robust estimation via robust gradient estimation

The workhorse is a novel robust variant of gradient descent, and the conditions under which this gradient descent variant provides accurate estimators in a general convex risk minimization problem are provided.

Recent Advances in Algorithmic High-Dimensional Robust Statistics

The core ideas and algorithmic techniques in the emerging area of algorithmic high-dimensional robust statistics, with a focus on robust mean estimation, are introduced, and an overview of the approaches that have led to computationally efficient robust estimators for a range of broader statistical tasks is provided.

Efficient Statistics, in High Dimensions, from Truncated Samples

It is shown that the mean $\mu$ and covariance matrix $\Sigma$ can be estimated with arbitrary accuracy in polynomial time, as long as oracle access to the set $S$ is available and $S$ has non-trivial measure under the unknown $d$-variate normal distribution.

Agnostic Estimation of Mean and Covariance

This work presents polynomial-time algorithms to estimate the mean and covariance of a distribution from i.i.d. samples in the presence of a fraction of malicious noise with error guarantees in terms of information-theoretic lower bounds.

Efficient Truncated Statistics with Unknown Truncation

The main result is a computationally and sample efficient algorithm for estimating the parameters of the Gaussian under arbitrary unknown truncation sets whose performance decays with a natural measure of complexity of the set, namely its Gaussian surface area.