Model Selection Confidence Sets by Likelihood Ratio Testing

Chao Zheng, Davide Ferrari, Yuhong Yang
Statistica Sinica
The traditional activity of model selection aims at discovering a single model superior to other candidate models. In the presence of pronounced noise, however, multiple models are often found to explain the same data equally well. To resolve this model selection ambiguity, we introduce the general approach of model selection confidence sets (MSCSs) based on likelihood ratio testing. An MSCS is defined as a list of models statistically indistinguishable from the true model at a user-specified…
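As a sketch of the MSCS idea described in the abstract (not the paper's exact construction: the exhaustive subset search, the Gaussian linear model, and the chi-square calibration of the likelihood ratio test are simplifying assumptions here), one can retain every candidate submodel that a likelihood ratio test fails to reject against the full model:

```python
import itertools
import numpy as np
from scipy.stats import chi2

def gaussian_loglik(y, X):
    """Profile log-likelihood of a Gaussian linear model fit by OLS."""
    n = len(y)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = np.sum((y - X @ beta) ** 2)
    # MLE of the error variance is rss/n; plug it back into the likelihood.
    return -0.5 * n * (np.log(2 * np.pi * rss / n) + 1)

def mscs(y, X, alpha=0.05):
    """Collect all predictor subsets not rejected against the full model
    by a likelihood ratio test at level alpha (a toy MSCS)."""
    n, p = X.shape
    ll_full = gaussian_loglik(y, X)
    keep = []
    for k in range(1, p + 1):
        for S in itertools.combinations(range(p), k):
            ll_sub = gaussian_loglik(y, X[:, list(S)])
            lr = 2.0 * (ll_full - ll_sub)
            df = p - k
            pval = 1.0 if df == 0 else chi2.sf(lr, df)
            if pval > alpha:
                keep.append(S)
    return keep

# Simulated example: only the intercept (col 0) and x1 (col 1) matter.
rng = np.random.default_rng(0)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])
y = X @ np.array([1.0, 2.0, 0.0, 0.0]) + rng.normal(size=n)
print(mscs(y, X))
```

On such data, the confidence set typically contains every submodel that includes the two active columns, while submodels dropping the strong predictor are rejected; in the presence of heavy noise the set grows, which is exactly the ambiguity the MSCS makes explicit.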


Enhancing Multi-model Inference with Natural Selection

The convergence properties of the genetic algorithm (GA) are studied using Markov chain theory and used to design an adaptive termination criterion that vastly reduces the computational cost.

Order selection with confidence for finite mixture models

The determination of the number of mixture components (the order) of a finite mixture model has been an enduring problem in statistical inference. We prove that the closed testing principle leads to…

Confidence graphs for graphical model selection

This article first identifies two nested graphical models, called small and large confidence graphs (SCG and LCG), that trap the true graphical model in between at a given confidence level, just as the endpoints of a traditional confidence interval capture the population parameter.

Simple measures of uncertainty for model selection

Two simple measures of uncertainty for a model selection procedure are developed, similar in spirit to a confidence set in parameter estimation; the second measure focuses on the error in model selection.

Assessing the Global and Local Uncertainty of Scientific Evidence in the Presence of Model Misspecification

Non-parametric bootstrap methodologies are developed for estimating the sampling distribution of the evidence estimator under model misspecification, making it possible to determine how secure one is in an evidential statement.

Subdata selection algorithm for linear model discrimination

This work proposes a subdata selection method based on leverage scores, which enables the selection task to be conducted on a small subdata set; it not only improves the probability of selecting the best model but also enhances estimation efficiency.

Ranking the importance of genetic factors by variable‐selection confidence sets

This work addresses the ambiguity related to SNP selection by constructing a list of models—called a variable‐selection confidence set (VSCS)—which contains the collection of all well‐supported SNP combinations at a user‐specified confidence level.

Discussion on Prior-based Bayesian Information Criterion (PBIC) by M. J. Bayarri, James O. Berger, Woncheol Jang, Surajit Ray, Luis R. Pericchi, and Ingmar Visser

This elucidating paper unpacked a dangerous complication that arises when one takes the classic BIC verbatim as an approximation to the marginal likelihood, and proposed the Prior-based Bayesian Information Criterion (PBIC) as a principled correction.

Variable Importance Based Interaction Modeling with an Application on Initial Spread of COVID-19 in China (preprint)

This paper introduces a variable importance based interaction modeling (VIBIM) procedure for learning interactions in a linear regression model with both continuous and categorical predictors and shows that the VIBIM approach leads to better models in terms of interpretability, stability, reliability and prediction.

Visualization and assessment of model selection uncertainty



The Model Confidence Set

The paper revisits the inflation forecasting problem posed by Stock and Watson (1999), computes the model confidence set (MCS) for their set of inflation forecasts, and compares a number of Taylor rule regressions to determine the MCS of the best in terms of in-sample likelihood criteria.

Confidence sets for model selection by F-testing

We introduce the notion of a variable selection confidence set (VSCS) for linear regression based on F-testing. Our method identifies the most important variables in a principled way that goes beyond…

An Application of Multiple Comparison Techniques to Model Selection

Considering the sampling error of AIC, a set of good models, called a confidence set of models, is constructed rather than a single model being chosen; the set includes the minimum-AIC model at an error rate smaller than the specified significance level.

Model Selection and Model Averaging

Guarding from Spurious Discoveries in High Dimension

A measure of the goodness of spurious fit is defined, which shows how well a response variable can be fitted by an optimally selected subset of covariates under the null model, and a simple and effective LAMM algorithm is proposed to compute it.

Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties

In this article, penalized likelihood approaches are proposed to handle variable selection problems, and it is shown that the newly proposed estimators perform as well as the oracle procedure in variable selection; namely, they work as well as if the correct submodel were known.

Parametric estimation. Finite sample theory

The paper aims at reconsidering the famous Le Cam LAN theory. The main features of the approach that make it different from the classical one are: (1) the study is non-asymptotic, that is, the…

Nonconcave penalized likelihood with a diverging number of parameters

A class of variable selection procedures for parametric models via nonconcave penalized likelihood was proposed by Fan and Li to simultaneously estimate parameters and select important variables.

Robust Bounded-Influence Tests in General Parametric Models

We introduce robust tests for testing hypotheses in a general parametric model. These are robust versions of the Wald, scores, and likelihood ratio tests and are based on general M…

Regression Shrinkage and Selection via the Lasso

A new method for estimation in linear models, called the lasso, is proposed; it minimizes the residual sum of squares subject to the sum of the absolute values of the coefficients being less than a constant.