Corpus ID: 231786296

Splitting strategies for post-selection inference

Daniel Garcia Rasines and G. Alastair Young
We consider the problem of providing valid inference for a selected parameter in a sparse regression setting. It is well known that classical regression tools can be unreliable in this context due to the bias generated in the selection step. Many approaches have been proposed in recent years to ensure inferential validity. Here, we consider a simple alternative to data splitting based on randomising the response vector, which allows for higher selection and inferential power than the former and… 
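The randomisation idea described in the abstract can be illustrated numerically. The sketch below is ours, not the authors' code: it assumes the standard Gaussian-randomisation construction in which the response is perturbed with independent Gaussian noise to produce two independent copies, one for selection and one for inference.

```python
import numpy as np

# Illustrative sketch (not the authors' code). If y ~ N(mu, sigma^2 I) and,
# independently, w ~ N(0, gamma * sigma^2 I), then
#     u = y + w          (used for model selection)
#     v = y - w / gamma  (used for inference)
# are jointly Gaussian with Cov(u, v) = sigma^2 I - (1/gamma) Cov(w) = 0,
# hence independent: selecting on u does not bias inference based on v.

rng = np.random.default_rng(0)
n, sigma, gamma = 100_000, 1.0, 0.5
mu = np.zeros(n)

y = mu + sigma * rng.standard_normal(n)
w = np.sqrt(gamma) * sigma * rng.standard_normal(n)

u = y + w          # selection copy
v = y - w / gamma  # inference copy

# Empirically, u and v are uncorrelated.
print(round(float(np.corrcoef(u, v)[0, 1]), 3))
```

The parameter gamma controls the trade-off: a smaller gamma makes the selection copy u less noisy at the cost of a noisier inference copy v, which is the sense in which randomisation can dominate a hard 50/50 data split.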


Approximate Post-Selective Inference for Regression with the Group LASSO

A consistent, post-selective Bayesian method is developed to address the existing gaps, deriving a likelihood adjustment factor, and an approximation thereof, that eliminates bias from the selection of groups.

Selective Inference in Propensity Score Analysis

This paper develops selective inference in propensity score analysis with a semiparametric approach, which has become a standard tool in causal inference.

Post-Selection Inference via Algorithmic Stability

This work revisits the PoSI problem through the lens of algorithmic stability, and shows that the stability parameters of a selection method alone suffice to provide non-trivial corrections to classical z-test and t-test intervals.

Conditional Versus Unconditional Approaches to Selective Inference

It is shown that selective inference methods based on selection and conditioning are always dominated by multiple testing methods defined directly on the full universe of hypotheses, even when this universe is potentially infinite and only defined implicitly, such as in data splitting.

Inference in High-dimensional Linear Regression

This paper develops an approach to inference in a linear regression model when the number of potential explanatory variables is larger than the sample size. The approach treats each regression

Selective inference for k-means clustering

A finite-sample p-value is proposed that controls the selective Type I error for a test of the difference in means between a pair of clusters obtained using k-means clustering, and it is shown that it can be efficiently computed.

Some Perspectives on Inference in High Dimensions

The main emphasis of the present paper lies on contexts where formulation in terms of a probabilistic model is feasible and fruitful but to be at all realistic large numbers of unknown parameters need consideration.

Empirical Bayes and Selective Inference

We review the empirical Bayes approach to large-scale inference. In the context of the problem of inference for a high-dimensional normal mean, empirical Bayes methods are advocated as they exhibit

More powerful selective inference for the graph fused lasso

This work proposes a new test for this task that controls the selective Type I error, and conditions on less information than existing approaches, leading to substantially higher power.

Data blurring: sample splitting a single sample

A more general methodology is proposed for achieving a split in samples of a random vector X, borrowing ideas from Bayesian inference to yield a (frequentist) solution that can be viewed as a continuous analog of data splitting.

Exact Post Model Selection Inference for Marginal Screening

A framework for post model selection inference, via marginal screening, in linear regression is developed that characterizes the exact distribution of linear functions of the response y, conditional on the model being selected (the "condition on selection" framework).

Bootstrapping and sample splitting for high-dimensional, assumption-lean inference

This paper revisits sample splitting combined with the bootstrap, shows that this leads to a simple, assumption-free approach to inference, and finds new bounds on the accuracy of the bootstrap and the normal approximation for general nonlinear parameters with increasing dimension.
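The basic sample-splitting recipe that this line of work builds on can be sketched as follows. This is an illustrative sketch under our own assumptions (marginal screening stands in for the selection step; the data-generating model and all variable names are ours): variables are screened on one half of the sample, and classical least-squares inference is carried out on the other half, so the selection step does not bias the resulting intervals.

```python
import numpy as np

# Illustrative data-splitting sketch (not from any of the cited papers).
rng = np.random.default_rng(1)
n, p = 400, 50
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:3] = 2.0                                # sparse truth: 3 signal variables
y = X @ beta + rng.standard_normal(n)

half = n // 2
X1, y1 = X[:half], y[:half]                   # selection half
X2, y2 = X[half:], y[half:]                   # inference half

# Selection step: marginal screening on the first half,
# keeping the k columns most correlated with the response.
k = 5
scores = np.abs(X1.T @ (y1 - y1.mean()))
selected = np.argsort(scores)[-k:]

# Inference step: OLS on the second half, restricted to the selected columns.
Xs = X2[:, selected]
coef, *_ = np.linalg.lstsq(Xs, y2, rcond=None)
resid = y2 - Xs @ coef
df = Xs.shape[0] - Xs.shape[1]
s2 = resid @ resid / df
se = np.sqrt(s2 * np.diag(np.linalg.inv(Xs.T @ Xs)))

# Approximate 95% confidence intervals (normal quantile for brevity).
lower, upper = coef - 1.96 * se, coef + 1.96 * se
```

Because the inference half is independent of the half used for screening, the intervals have classical validity conditional on the selected model; the cost, which the randomisation approach above aims to reduce, is that each step sees only half the data.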

Integrative methods for post-selection inference under convex constraints

Methods for carrying out inference conditional on selection are developed, which are more flexible in the sense that they naturally accommodate different models for the data, instead of requiring a case-by-case treatment.

Inference after black box selection.

The problem of inference for parameters that are reported only after some algorithm has been run is considered, the canonical example being inference for model parameters after a model selection procedure; the problem is recast as a statistical learning problem that can be fit with off-the-shelf models for binary regression.

Uniformly valid confidence intervals post-model-selection

This work suggests general methods to construct asymptotically uniformly valid confidence intervals post-model-selection based on principles recently proposed by Berk et al. (2013), which perform remarkably well, even when compared to existing methods that are tailored only for specific model selection procedures.

Exact post-selection inference, with application to the lasso

A general approach to valid inference after model selection by the lasso is developed to form valid confidence intervals for the selected coefficients and test whether all relevant variables have been included in the model.

Optimal Inference After Model Selection

To perform inference after model selection, we propose controlling the selective type I error; i.e., the error rate of a test given that it was performed. By doing so, we recover long-run frequency

Valid post-selection inference

It is common practice in statistical data analysis to perform data-driven variable selection and derive statistical inference from the resulting model. Such inference enjoys none of the guarantees


This paper looks at the error rates and power of some multi-stage regression methods and considers three screening methods: the lasso, marginal regression, and forward stepwise regression.

Valid confidence intervals for post-model-selection predictors

The PoSI intervals are generalized to post-model-selection predictors in linear regression, and their applications in inference and model selection are considered.