Testing goodness-of-fit and conditional independence with approximate co-sufficient sampling

Rina Foygel Barber and Lucas Janson

The Annals of Statistics
Goodness-of-fit (GoF) testing is ubiquitous in statistics, with direct ties to model selection, confidence interval construction, conditional independence testing, and multiple testing, just to name a few applications. While testing the GoF of a simple (point) null hypothesis provides an analyst great flexibility in the choice of test statistic while still ensuring validity, most GoF tests for composite null hypotheses are far more constrained, as the test statistic must have a tractable… 

One Step to Efficient Synthetic Data

The approach allows for the construction of both partially synthetic datasets, which preserve the summary statistics without formal privacy methods, as well as fully synthetic data which satisfy the strong guarantee of differential privacy (DP), both with asymptotically efficient summary statistics.

Randomization Tests for Adaptively Collected Data

This paper presents a general framework for randomization testing on adaptively collected data, encompassing (and in some cases improving) the few existing results on randomization testing and conformal inference for adaptively collected data, as well as many other important settings.

Rank-transformed subsampling: inference for multiple data splitting and exchangeable p-values

This work introduces rank-transformed subsampling as a general method for delivering large-sample inference about the combined statistic or p-value under mild assumptions and applies it to a range of problems, including testing unimodality in high-dimensional data, testing goodness-of-fit of parametric quantile regression models, testing no direct effect in a sequentially randomised trial, and calibrating cross-fit double machine learning confidence intervals.

Correcting Confounding via Random Selection of Background Variables

This work proposes a method to distinguish causal influence from hidden confounding in the following scenario: given a target variable Y, potential causal drivers X, and a large number of background features, it uses a novel criterion based on the stability of the regression coefficients of X on Y with respect to selecting different background features.

On the power of conditional independence testing under model-X

For testing conditional independence (CI) of a response Y and a predictor X given covariates Z, the recently introduced model-X (MX) framework has been the subject of active methodological research,

Conditional Monte Carlo revisited

Conditional Monte Carlo refers to sampling from the conditional distribution of a random vector X given the value T(X)=t for a function T(X). Classical conditional Monte Carlo methods were designed
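
As a concrete illustration of sampling X given T(X)=t, here is a minimal sketch for one textbook case. It is not taken from the paper above. It assumes X_1, ..., X_n are i.i.d. N(mu, 1) with unknown mu and that T(X) is the sample mean. In that model, if Z is a vector of i.i.d. standard normals, then t + Z - mean(Z) has the conditional distribution of X given T(X)=t, whatever the value of mu. The function name `sample_given_mean` is hypothetical.

```python
import numpy as np

def sample_given_mean(t, n, n_draws, seed=0):
    """Draw n_draws samples of size n from X | mean(X) = t.

    Assumption (illustrative only): X_i are i.i.d. N(mu, 1) with unknown mu,
    so the sample mean is sufficient for mu.
    """
    rng = np.random.default_rng(seed)
    z = rng.standard_normal((n_draws, n))
    # Centre each row, then shift it so that its mean equals t.
    return t + z - z.mean(axis=1, keepdims=True)

draws = sample_given_mean(t=2.5, n=8, n_draws=100)
```

Every row of `draws` has sample mean 2.5 (up to floating-point error), and mu never enters the sampler. That is the property that makes conditioning on a sufficient statistic useful for eliminating nuisance parameters.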



On the Conditional Distribution of Goodness-of-Fit Tests

This manuscript advocates the use of the conditional distribution of the goodness-of-fit test, given the value of the minimal sufficient statistic for the parameters, in the problem of

On the use of priors in goodness‐of‐fit tests

Priors are introduced into goodness‐of‐fit tests, both for unknown parameters in the tested distribution and on the alternative density. Neyman–Pearson theory leads to the test with the highest

Monte Carlo exact goodness-of-fit tests for nonhomogeneous Poisson processes

Nonhomogeneous Poisson processes (NHPPs) are often used to model failure data from repairable systems, and there is thus a need to check model fit for such models. We study the problem of obtaining

On goodness of fit tests for the Poisson, negative binomial and binomial distributions

In this paper, we address the problem of testing the fit of three discrete distributions, giving a brief account of existing tests and proposing two new tests. One of the new tests is for any

Conditional limit laws for goodness-of-fit tests

We study the conditional distribution of goodness of fit statistics of the Cramér–von Mises type given the complete sufficient statistics in testing for exponential family models. We show that

Exact Conditional Tests and Approximate Bootstrap Tests for the von Mises Distribution

Exact and approximate tests of fit are compared for testing that a given sample comes from the von Mises distribution. For the exact test, Gibbs sampling is used to generate samples from the

The conditional permutation test for independence while controlling for confounders

This work introduces a general new method for testing the conditional independence of variables X and Y given a potentially high-dimensional random vector Z that may contain confounding factors, and establishes bounds on the type I error in terms of the error in the approximation of the conditional distribution of X|Z.

Likelihood-Free Inference in High-Dimensional Models

A novel, likelihood-free Markov chain Monte Carlo (MCMC) method is introduced that combines two key innovations: updating only one parameter per iteration, and accepting or rejecting this update based on subsets of statistics approximately sufficient for this parameter. This renders the approach suitable even for models of very high dimensionality.

Goodness-of-Fit and Sufficiency: Exact and Approximate Tests

A procedure to test fit to a distribution where a minimal sufficient statistic is available is discussed for testing the Poisson distribution. The test is exact, and is compared with a simpler
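
To make the idea of an exact test based on a sufficient statistic concrete, here is a minimal sketch of one standard construction. It is not necessarily the procedure of the paper above. For an i.i.d. Poisson sample of size n, the sum is sufficient for the rate, and conditional on the sum the sample is Multinomial(sum, (1/n, ..., 1/n)). Copies drawn from that multinomial are therefore exchangeable with the data under the null, and any test statistic can be calibrated against them. The function name `poisson_conditional_gof` and the choice of the index of dispersion as the test statistic are assumptions made for illustration.

```python
import numpy as np

def poisson_conditional_gof(x, n_draws=2000, seed=0):
    """Monte Carlo conditional GoF test for an i.i.d. Poisson sample.

    Returns the observed index of dispersion and a one-sided p-value
    (large dispersion counts as evidence against the Poisson model).
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=np.int64)
    n, total = x.size, int(x.sum())
    if total == 0:
        raise ValueError("all counts are zero; the statistic is undefined")

    def dispersion(sample):
        # Sample variance divided by sample mean, along the last axis.
        return sample.var(axis=-1, ddof=1) / sample.mean(axis=-1)

    observed = dispersion(x)
    # Under the null, x | sum(x) = total is multinomial with equal cell
    # probabilities, so these copies do not depend on the unknown rate.
    copies = rng.multinomial(total, np.full(n, 1.0 / n), size=n_draws)
    simulated = dispersion(copies)
    # The +1 terms count the observed sample among the draws, which keeps
    # the Monte Carlo p-value valid for any finite n_draws.
    p_value = (1 + np.sum(simulated >= observed)) / (n_draws + 1)
    return observed, p_value

obs, p = poisson_conditional_gof([0] * 15 + [10] * 15, seed=1)
```

The usage line feeds in a strongly overdispersed sample (half zeros, half tens), for which the p-value should be very small. Because every simulated copy has the same total as the data, the mean in the denominator is constant and the test is effectively driven by the sample variance.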

Tests based on Monte Carlo simulations conditioned on maximum likelihood estimates of nuisance parameters

A method to eliminate nuisance parameters in statistical inference is to condition on sufficient statistics for those parameters. In many situations it is not possible to find appropriate sufficient