• Corpus ID: 248863249

Optimal nonparametric testing of Missing Completely At Random, and its connections to compatibility

@inproceedings{Berrett2022OptimalNT,
  title={Optimal nonparametric testing of Missing Completely At Random, and its connections to compatibility},
  author={Thomas B. Berrett and Richard J. Samworth},
  year={2022}
}
Given a set of incomplete observations, we study the nonparametric problem of testing whether data are Missing Completely At Random (MCAR). Our first contribution is to characterise precisely the set of alternatives that can be distinguished from the MCAR null hypothesis. This reveals interesting and novel links to the theory of Fréchet classes (in particular, compatible distributions) and linear programming, that allow us to propose MCAR tests that are consistent against all detectable…

References

A Nonparametric Test of Missing Completely at Random for Incomplete Multivariate Data

A nonparametric test of MCAR for incomplete multivariate data which does not require distributional assumptions is proposed and it is proved that the proposed test is consistent against any distributional differences in the observed data.

A Test of Missing Completely at Random for Multivariate Data with Missing Values

A common concern when faced with multivariate data with missing values is whether the missing data are missing completely at random (MCAR); that is, whether missingness depends on the

Tests of Homoscedasticity, Normality, and Missing Completely at Random for Incomplete Multivariate Data

A modification of the normal-theory Hawkins test for complete data is proposed to improve its performance, and is applied to testing homoscedasticity and MCAR when data are multivariate normal and incomplete.

High-dimensional regression with noisy and missing data: Provable guarantees with non-convexity

This work is able to both analyze the statistical error associated with any global optimum, and prove that a simple algorithm based on projected gradient descent will converge in polynomial time to a small neighborhood of the set of all global minimizers.

The geometry of hypothesis testing over convex cones: Generalized likelihood tests and minimax radii

This work provides a sharp characterization of the GLRT testing radius, up to a universal multiplicative constant, in terms of the geometric structure of the underlying convex cones, and proves information-theoretic lower bounds for the minimax testing radius, again in terms of geometric quantities.

A test of missing completely at random for generalised estimating equations with missing data

We consider inference from generalised estimating equations when data are incomplete. A test for missing completely at random is proposed to help decide whether or not we should adjust estimating

Tests of homogeneity of means and covariance matrices for multivariate incomplete data

Existing test statistics for assessing whether incomplete data represent a missing completely at random sample from a single population are based on a normal likelihood rationale and effectively test

Testing composite hypotheses, Hermite polynomials and optimal estimation of a nonsmooth functional

The general techniques and results developed in the present paper can also be used to solve other related problems and are shown to be asymptotically sharp minimax when the means are bounded by a given value $M$.

High dimensional linear discriminant analysis: optimality, adaptive algorithm and missing data

  • T. Tony Cai, Linjun Zhang
  • Computer Science
    Journal of the Royal Statistical Society: Series B (Statistical Methodology)
  • 2019
A data‐driven and tuning‐free classification rule, which is based on an adaptive constrained $\ell_1$‐minimization approach, is proposed and analysed, and it is shown to be simultaneously rate optimal over a collection of parameter spaces.

High‐dimensional principal component analysis with heterogeneous missingness

It is proved that the error of primePCA converges to zero at a geometric rate in the noiseless case, and when the signal strength is not too small, which is very encouraging performance across a wide range of scenarios, including settings where the data are not Missing Completely At Random.
...