# Optimal nonparametric testing of Missing Completely At Random, and its connections to compatibility

@inproceedings{Berrett2022OptimalNT, title={Optimal nonparametric testing of Missing Completely At Random, and its connections to compatibility}, author={Thomas B. Berrett and Richard J. Samworth}, year={2022} }

Given a set of incomplete observations, we study the nonparametric problem of testing whether data are Missing Completely At Random (MCAR). Our first contribution is to characterise precisely the set of alternatives that can be distinguished from the MCAR null hypothesis. This reveals interesting and novel links to the theory of Fréchet classes (in particular, compatible distributions) and linear programming, that allow us to propose MCAR tests that are consistent against all detectable…
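To make the notion of compatibility concrete: under MCAR, the distributions of the observed coordinates under each missingness pattern are all marginals of one common joint distribution, so they must be compatible. The sketch below is a toy illustration only, not the test proposed in the paper and not a linear program. It checks a simple necessary condition on the supports of discrete marginals: every support point of every marginal must be the projection of some joint point that is allowed by all the marginals at once. The function names, the binary `levels`, and the two example families are assumptions made for illustration.

```python
from itertools import product


def consistent_points(supports, n_vars, levels=(0, 1)):
    """List the joint points whose projection onto every observation
    pattern lies in that pattern's support.

    `supports` maps a pattern (tuple of coordinate indices) to the set of
    value tuples that have positive probability under that marginal.
    """
    points = []
    for x in product(levels, repeat=n_vars):
        if all(tuple(x[j] for j in pattern) in support
               for pattern, support in supports.items()):
            points.append(x)
    return points


def support_check_fails(supports, n_vars, levels=(0, 1)):
    """Return True when the supports alone rule out compatibility.

    Any joint distribution with the given marginals must put all its mass
    on consistent points, so each marginal support point must be covered
    by the projection of at least one consistent point. A False result
    does NOT prove compatibility; it only means this check is inconclusive.
    """
    points = consistent_points(supports, n_vars, levels)
    for pattern, support in supports.items():
        covered = {tuple(x[j] for j in pattern) for x in points}
        if not support <= covered:
            return True
    return False


# Hypothetical example: three binary variables observed in pairs.
# X1 = X2 and X2 = X3 almost surely, yet X1 != X3 almost surely.
incompatible = {
    (0, 1): {(0, 0), (1, 1)},
    (1, 2): {(0, 0), (1, 1)},
    (0, 2): {(0, 1), (1, 0)},
}

# Same first two marginals, but now X1 = X3 as well.
compatible = {
    (0, 1): {(0, 0), (1, 1)},
    (1, 2): {(0, 0), (1, 1)},
    (0, 2): {(0, 0), (1, 1)},
}

print(support_check_fails(incompatible, 3))
print(support_check_fails(compatible, 3))
```

In the first family no point of {0, 1}^3 satisfies all three constraints, so no joint distribution can have these marginals and the observed data could not have arisen under MCAR. A full compatibility check would also have to match the probabilities, not just the supports, which is where a linear programming formulation becomes natural.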

## References

### A Nonparametric Test of Missing Completely at Random for Incomplete Multivariate Data

- Mathematics
- Psychometrika
- 2015

A nonparametric test of MCAR for incomplete multivariate data which does not require distributional assumptions is proposed and it is proved that the proposed test is consistent against any distributional differences in the observed data.

### A Test of Missing Completely at Random for Multivariate Data with Missing Values

- Mathematics
- 1988

A common concern when faced with multivariate data with missing values is whether the missing data are missing completely at random (MCAR); that is, whether missingness depends on the…

### Tests of Homoscedasticity, Normality, and Missing Completely at Random for Incomplete Multivariate Data

- Mathematics
- Psychometrika
- 2010

A modification of the normal-theory Hawkins test for complete data is proposed to improve its performance, together with its application to testing homoscedasticity and MCAR when data are multivariate normal and incomplete.

### High-dimensional regression with noisy and missing data: Provable guarantees with non-convexity

- Computer Science
- NIPS
- 2011

This work is able to both analyze the statistical error associated with any global optimum, and prove that a simple algorithm based on projected gradient descent will converge in polynomial time to a small neighborhood of the set of all global minimizers.

### The geometry of hypothesis testing over convex cones: Generalized likelihood tests and minimax radii

- Mathematics, Computer Science
- The Annals of Statistics
- 2019

This work provides a sharp characterization of the GLRT testing radius up to a universal multiplicative constant in terms of the geometric structure of the underlying convex cones, and proves information-theoretic lower bounds for the minimax testing radius, again in terms of geometric quantities.

### A test of missing completely at random for generalised estimating equations with missing data

- Mathematics
- 1999

We consider inference from generalised estimating equations when data are incomplete. A test for missing completely at random is proposed to help decide whether or not we should adjust estimating…

### Tests of homogeneity of means and covariance matrices for multivariate incomplete data

- Mathematics
- 2002

Existing test statistics for assessing whether incomplete data represent a missing completely at random sample from a single population are based on a normal likelihood rationale and effectively test…

### Testing composite hypotheses, Hermite polynomials and optimal estimation of a nonsmooth functional

- Mathematics, Computer Science
- 2011

The general techniques and results developed in the present paper can also be used to solve other related problems and are shown to be asymptotically sharp minimax when the means are bounded by a given value $M$.

### High dimensional linear discriminant analysis: optimality, adaptive algorithm and missing data

- Computer Science
- Journal of the Royal Statistical Society: Series B (Statistical Methodology)
- 2019

A data‐driven and tuning‐free classification rule, which is based on an adaptive constrained l1‐minimization approach, is proposed and analysed and it is shown to be simultaneously rate optimal over a collection of parameter spaces.

### High‐dimensional principal component analysis with heterogeneous missingness

- Computer Science
- Journal of the Royal Statistical Society: Series B (Statistical Methodology)
- 2022

It is proved that the error of primePCA converges to zero at a geometric rate in the noiseless case when the signal strength is not too small, and the method shows very encouraging performance across a wide range of scenarios, including settings where the data are not Missing Completely At Random.