Corpus ID: 18590004

Selective Sequential Model Selection

@article{Fithian2015SelectiveSM,
  title={Selective Sequential Model Selection},
  author={William Fithian and Jonathan E. Taylor and Robert Tibshirani and Ryan J. Tibshirani},
  journal={arXiv: Methodology},
  year={2015}
}
Many model selection algorithms produce a path of fits specifying a sequence of increasingly complex models. Given such a sequence and the data used to produce them, we consider the problem of choosing the least complex model that is not falsified by the data. Extending the selected-model tests of Fithian et al. (2014), we construct p-values for each step in the path which account for the adaptive selection of the model path using the data. In the case of linear regression, we propose two… 
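
As a rough sketch of the "least complex model that is not falsified" idea in the abstract, the following hypothetical Python helper (not from the paper; the function name, signature, and default level are illustrative) scans a path of models from least to most complex and returns the first one whose adequacy test is not rejected at level alpha. The selective p-values themselves, whose construction is the paper's contribution, are taken as given.

```python
from typing import Sequence

def least_complex_adequate(pvalues: Sequence[float], alpha: float = 0.05) -> int:
    """Given p-values p_1, ..., p_K testing the adequacy of models 1..K
    (ordered from least to most complex), return the index of the first
    model that is NOT rejected at level alpha, i.e. the least complex
    model not falsified by the data.  Illustrative only: the selective
    p-values are assumed to be supplied by some sequential procedure."""
    for k, p in enumerate(pvalues, start=1):
        if p >= alpha:        # model k is not falsified at level alpha
            return k
    return len(pvalues)       # every model rejected; fall back to the largest
```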

Citations

More Powerful Selective Kernel Tests for Feature Selection
TLDR
This work extends two recent proposals for selecting features using the Maximum Mean Discrepancy and Hilbert Schmidt Independence Criterion to condition on the minimal conditioning event, and shows how recent advances in multiscale bootstrap make conditioning on the minimal selection event possible.
Exact post-selection inference for the generalized lasso path
TLDR
Practical aspects of the methods are described, such as valid (i.e., fully accounted for) post-processing of generalized lasso estimates before performing inference in order to improve power, and problem-specific visualization aids that may be given to the data analyst to help choose linear contrasts to be tested.
Testing-Based Forward Model Selection
This work introduces a theoretical foundation for a procedure called `testing-based forward model selection' in regression problems. Forward selection is a general term referring to a model selection
Exact Post-Selection Inference for Sequential Regression Procedures
We propose new inference tools for forward stepwise regression, least angle regression, and the lasso. Assuming a Gaussian model for the observation vector y, we first describe a general
A One-Covariate at a Time, Multiple Testing Approach to Variable Selection in High-Dimensional Linear Regression Models
TLDR
The OCMT provides an alternative to penalised regression methods that is based on statistical inference and is therefore easier to interpret and relate to classical statistical analysis; it allows working under more general assumptions, is faster, and performs well in small samples for almost all of the different sets of experiments considered in this paper.
Exact Post-Selection Inference for Changepoint Detection and Other Generalized Lasso Problems
TLDR
Practical aspects of the methods are described, such as valid post-processing of generalized lasso estimates before performing inference in order to improve power, and problem-specific visualization aids that may be given to the data analyst to help choose linear contrasts to be tested.
Efficient test-based variable selection for high-dimensional linear models
More Powerful Conditional Selective Inference for Generalized Lasso by Parametric Programming
TLDR
This study proposes a more powerful and general conditional SI method for a class of problems that can be converted into quadratic parametric programming, which includes the generalized lasso, and improves the performance and practicality of SI in various respects.
Analysis of Testing‐Based Forward Model Selection
  • D. Kozbur
  • Mathematics, Computer Science
    Econometrica
  • 2020
TLDR
This paper proves probabilistic bounds, which depend on the quality of the tests, for prediction error and the number of selected covariates in linear regression problems; the results are then specialized to a setting with heteroscedastic data.
More powerful post-selection inference, with application to the Lasso
TLDR
This work shows how to generate hypotheses in a strategic manner that sharply reduces the cost of data exploration and results in useful confidence intervals.
...
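
Several of the citing works above, e.g. "Exact Post-Selection Inference for Sequential Regression Procedures" and the generalized lasso entries, build on a truncated-Gaussian pivot for a polyhedral selection event {Ay ≤ b}. Below is a minimal sketch of that pivot, assuming y ~ N(mu, sigma^2 I) with sigma known and isotropic noise; the function and variable names are illustrative, not any paper's reference implementation.

```python
import numpy as np
from scipy.stats import norm

def polyhedral_pvalue(y, A, b, eta, sigma):
    """One-sided selective p-value for H0: eta'mu = 0 under y ~ N(mu, sigma^2 I),
    conditioning on the polyhedral selection event {A y <= b} (truncated-Gaussian
    pivot).  Sketch only; assumes isotropic noise with known sigma."""
    eta = np.asarray(eta, dtype=float)
    sigma_eta = sigma * np.linalg.norm(eta)           # sd of eta'y
    c = eta / (eta @ eta)                             # decomposition direction (Sigma = sigma^2 I)
    z = y - c * (eta @ y)                             # component of y independent of eta'y
    alpha = A @ c
    resid = b - A @ z
    # Truncation interval for eta'y implied by A y <= b with z held fixed
    lower = np.max(resid[alpha < 0] / alpha[alpha < 0]) if np.any(alpha < 0) else -np.inf
    upper = np.min(resid[alpha > 0] / alpha[alpha > 0]) if np.any(alpha > 0) else np.inf
    t = eta @ y
    num = norm.cdf(upper / sigma_eta) - norm.cdf(t / sigma_eta)
    den = norm.cdf(upper / sigma_eta) - norm.cdf(lower / sigma_eta)
    return num / den                                  # survival function of the truncated normal at t
```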

References

SHOWING 1-10 OF 34 REFERENCES
A significance test for forward stepwise model selection
We apply the methods developed by Lockhart et al. (2013) and Taylor et al. (2013) on significance tests for penalized regression to forward stepwise model selection. A general framework for selection
Testing-Based Forward Model Selection
This work introduces a theoretical foundation for a procedure called `testing-based forward model selection' in regression problems. Forward selection is a general term referring to a model selection
Exact Post-Selection Inference for Sequential Regression Procedures
We propose new inference tools for forward stepwise regression, least angle regression, and the lasso. Assuming a Gaussian model for the observation vector y, we first describe a general
Sequential selection procedures and false discovery rate control
TLDR
This work proposes two new testing procedures, proves that they control the false discovery rate in the ordered testing setting, and shows how the methods can be applied to model selection by using recent results on p-values in sequential model selection settings.
A simple forward selection procedure based on false discovery rate control
TLDR
It is shown that FDR-based procedures have good performance, and in particular the newly proposed method emerges as having empirical minimax performance. Interestingly, using an FDR level of 0.05 is a global best.
Uniform asymptotic inference and the bootstrap after model selection
TLDR
The large-sample properties of this method are studied without assuming normality, and it is proved that the test statistic of Tibshirani et al. (2016) is asymptotically valid as the number of samples n grows and the dimension d of the regression problem stays fixed.
Accumulation Tests for FDR Control in Ordered Hypothesis Testing
TLDR
This article develops a family of “accumulation tests” to choose a cutoff k that adapts to the amount of signal at the top of the ranked list, and introduces a new method in this family, the HingeExp method, which offers higher power to detect true signals compared to existing techniques.
Stability selection
TLDR
It is proved for the randomized lasso that stability selection will be variable selection consistent even if the necessary conditions for consistency of the original lasso method are violated.
Optimal Inference After Model Selection
To perform inference after model selection, we propose controlling the selective type I error; i.e., the error rate of a test given that it was performed. By doing so, we recover long-run frequency
A new look at the statistical model identification
The history of the development of statistical hypothesis testing in time series analysis is reviewed briefly and it is pointed out that the hypothesis testing procedure is not adequately defined as
...
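
The ordered-testing references above ("Sequential selection procedures and false discovery rate control" and "Accumulation Tests for FDR Control in Ordered Hypothesis Testing") turn a sequence of path p-values into a stopping point via rules such as ForwardStop. A minimal sketch of a ForwardStop-style rule follows, assuming independent p-values that are ordered in advance; the function name and default level are illustrative.

```python
import numpy as np

def forward_stop(pvalues, q=0.10):
    """ForwardStop-style rule for ordered hypotheses:
    khat = max{ k : (1/k) * sum_{i<=k} -log(1 - p_i) <= q },
    returning 0 if no k qualifies.  Per the ordered-testing literature,
    this controls FDR at level q for independent, pre-ordered p-values."""
    p = np.asarray(pvalues, dtype=float)
    stats = np.cumsum(-np.log1p(-p)) / np.arange(1, len(p) + 1)  # running averages of -log(1 - p_i)
    passing = np.nonzero(stats <= q)[0]
    return int(passing[-1] + 1) if passing.size else 0
```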