Model Selection Techniques: An Overview

Jie Ding, Vahid Tarokh, and Yuhong Yang · IEEE Signal Processing Magazine
In the era of big data, analysts usually explore various statistical models or machine-learning methods for observed data to facilitate scientific discoveries or gain predictive power. Whatever data and fitting procedures are employed, a crucial step is to select the most appropriate model or method from a set of candidates. Model selection is a key ingredient in data analysis for reliable and reproducible statistical inference or prediction, and thus it is central to scientific studies in such… 
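As a concrete illustration of the criterion-based selection the overview surveys, the sketch below compares polynomial regression models by AIC and BIC. The data, candidate orders, and penalty bookkeeping (counting the noise variance as a free parameter) are illustrative assumptions, not taken from the paper.

```python
# Minimal sketch: selecting a polynomial order by AIC / BIC.
# Synthetic data; the true model is linear (degree 1).
import numpy as np

rng = np.random.default_rng(0)
n = 200
x = rng.uniform(-1.0, 1.0, n)
y = 1.0 + 2.0 * x + rng.normal(0.0, 0.5, n)

scores = {}
for degree in range(5):
    X = np.vander(x, degree + 1, increasing=True)   # columns 1, x, ..., x^degree
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    k = degree + 2                                  # coefficients + noise variance
    sigma2 = np.mean(resid ** 2)                    # Gaussian MLE of the variance
    loglik = -0.5 * n * (np.log(2.0 * np.pi * sigma2) + 1.0)
    scores[degree] = {"AIC": -2.0 * loglik + 2.0 * k,
                      "BIC": -2.0 * loglik + np.log(n) * k}

best_aic = min(scores, key=lambda d: scores[d]["AIC"])
best_bic = min(scores, key=lambda d: scores[d]["BIC"])
print("AIC picks degree", best_aic, "| BIC picks degree", best_bic)
```

Both criteria trade goodness of fit against model complexity; BIC's log(n) penalty grows with the sample size, which is what drives its selection consistency.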


On Statistical Efficiency in Learning

A generalized notion of Takeuchi's information criterion is proposed, and it is proved that the proposed method can asymptotically achieve the optimal out-of-sample prediction loss under reasonable assumptions.

Hierarchical Bayesian data selection

The concept of Bayesian data selection is introduced: the simultaneous inference of both the model parameters and parameters representing the belief that each observation within the data should be included in the inference.

Evaluation of Regression Models: Model Assessment, Model Selection and Generalization Error

This paper discusses criterion-based and step-wise selection procedures and resampling methods for model selection, with cross-validation providing the simplest and most generic means of computationally estimating all the required quantities.
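The cross-validation route mentioned above can be sketched in a few lines; the OLS candidates, data, and fold count here are hypothetical.

```python
# Minimal sketch: K-fold cross-validation as a generic model-selection tool.
import numpy as np

def kfold_cv_mse(X, y, k_folds=5, seed=0):
    """Estimate the out-of-sample MSE of OLS on (X, y) by K-fold CV."""
    n = len(y)
    idx = np.random.default_rng(seed).permutation(n)
    folds = np.array_split(idx, k_folds)
    errs = []
    for i in range(k_folds):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k_folds) if j != i])
        beta, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)
        errs.append(np.mean((y[test] - X[test] @ beta) ** 2))
    return float(np.mean(errs))

rng = np.random.default_rng(1)
n = 300
x = rng.normal(size=n)
y = 0.5 + 1.5 * x + rng.normal(0.0, 1.0, n)

# Compare an intercept-only model with an intercept + slope model.
X0 = np.ones((n, 1))
X1 = np.column_stack([np.ones(n), x])
mse0, mse1 = kfold_cv_mse(X0, y), kfold_cv_mse(X1, y)
print(mse0, mse1)  # the richer (true) model should have lower CV error
```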

Robust Information Criterion for Model Selection in Sparse High-Dimensional Linear Regression Models

A new form of the EBIC criterion, called EBIC-Robust, is proposed; it is invariant to data scaling and consistent in both the large-sample-size and high-SNR scenarios.

Efficient and Consistent Data-Driven Model Selection for Time Series

This paper proves that consistent model selection criteria outperform the classical AIC criterion in terms of efficiency, and derives, from the Bayesian approach that yields the usual BIC criterion but keeping all the second-order terms of the Laplace approximation, a data-driven criterion denoted KC'.

Selection of Heteroscedastic Models: A Time Series Forecasting Approach

To overcome the weaknesses of in-sample model selection, this study adopted an out-of-sample model selection approach for selecting models with improved forecasting accuracy and performance.
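A minimal sketch of the out-of-sample idea: score candidate forecasters by rolling one-step-ahead errors rather than in-sample fit. The AR(1) data and the two candidates (one of which is assumed to know the true coefficient, purely for clarity) are illustrative, not from the study.

```python
# Minimal sketch: out-of-sample (rolling-origin) forecaster comparison.
import numpy as np

rng = np.random.default_rng(2)
n = 400
y = np.zeros(n)
for t in range(1, n):                  # simulate an AR(1) with coefficient 0.8
    y[t] = 0.8 * y[t - 1] + rng.normal()

def one_step_errors(y, forecast, start=200):
    """Squared one-step-ahead errors of forecast(history) from `start` on."""
    return [(y[t] - forecast(y[:t])) ** 2 for t in range(start, len(y))]

naive_mean = lambda hist: hist.mean()   # candidate 1: forecast the running mean
ar1 = lambda hist: 0.8 * hist[-1]       # candidate 2: AR(1) with known coefficient

mse_mean = float(np.mean(one_step_errors(y, naive_mean)))
mse_ar1 = float(np.mean(one_step_errors(y, ar1)))
print(mse_mean, mse_ar1)  # the AR(1) forecaster should win out of sample
```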

Model Linkage Selection for Cooperative Learning

This paper proposes a novel framework for integrating information across a set of learners that is robust against model misspecification and misspecified parameter-sharing patterns, and shows that the proposed method can data-adaptively select the correct parameter-sharing pattern from user-specified candidate patterns, thus enhancing the prediction accuracy of a learner.

A time-reversed model selection approach to time series forecasting

A novel model selection approach to time series forecasting is introduced that combines theoretical principles of time-reversibility in time series with conventional tools such as information criteria, yielding a criterion that employs the backward prediction (backcast) as a proxy for the forecast.
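The backcast-as-proxy idea can be illustrated on a stationary Gaussian AR(1), which is time-reversible: backcast error on the reversed series ranks candidate models the same way forecast error does. The series and the two candidate coefficients below are illustrative assumptions, not the paper's construction.

```python
# Minimal sketch: backcast error as a proxy for forecast error on a
# time-reversible (stationary Gaussian AR(1)) series.
import numpy as np

rng = np.random.default_rng(4)
n = 300
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.7 * y[t - 1] + rng.normal()

def one_step_mse(series, coef, start=100):
    """Mean squared one-step error of the predictor yhat_t = coef * y_{t-1}."""
    errs = [(series[t] - coef * series[t - 1]) ** 2
            for t in range(start, len(series))]
    return float(np.mean(errs))

# Score each candidate coefficient forward (forecast) and backward (backcast).
for c in (0.0, 0.7):
    print(c, one_step_mse(y, c), one_step_mse(y[::-1], c))
```

On this reversible process the backcast ranking matches the forecast ranking, which is the property the criterion exploits.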

Non-Asymptotic Guarantees for Robust Identification of Granger Causality via the LASSO

It is established that the sufficient conditions of LASSO also suffice for robust identification of Granger causal influences, and the false positive error probability of a simple thresholding rule for identifying Granger causal effects is characterized.

Model family selection for classification using Neural Decision Trees

This paper proposes a method to reduce the scope of exploration needed for model selection by progressively relaxing the decision boundaries of the initial decision trees (the RMs) as long as this is beneficial in terms of performance measured on an analyzed dataset.



Model selection and multimodel inference : a practical information-theoretic approach

The second edition of this book is unique in that it focuses on methods for making formal statistical inference from all the models in an a priori set (multi-model inference).

Variable Selection Diagnostics Measures for High-Dimensional Regression

This work proposes variable selection deviation measures that give a proper sense of how many predictors in the selected set are likely trustworthy in certain aspects, and demonstrates the utility of these measures in applications.

On Model Selection Consistency of Lasso

It is proved that a single condition, which is called the Irrepresentable Condition, is almost necessary and sufficient for Lasso to select the true model both in the classical fixed p setting and in the large p setting as the sample size n gets large.
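The Irrepresentable Condition can be checked numerically for a given design. The sketch below implements its standard form, ||C_{S^c,S} C_{S,S}^{-1} sign(beta_S)||_inf < 1 with C = X^T X / n, on two illustrative designs (a near-orthogonal one where it holds and a collinear one where it fails); the data are hypothetical.

```python
# Minimal sketch: numerically checking the Irrepresentable Condition for Lasso
# sign consistency, given a design matrix X, a support S, and sign(beta_S).
import numpy as np

def irrepresentable_ok(X, support, sign_beta):
    n = X.shape[0]
    C = X.T @ X / n                                   # empirical Gram matrix
    S = np.asarray(support)
    Sc = np.setdiff1d(np.arange(X.shape[1]), S)
    lhs = C[np.ix_(Sc, S)] @ np.linalg.solve(C[np.ix_(S, S)], sign_beta)
    return bool(np.max(np.abs(lhs)) < 1.0)

rng = np.random.default_rng(3)
X_indep = rng.normal(size=(500, 5))                   # near-orthogonal columns
print(irrepresentable_ok(X_indep, [0, 1], np.array([1.0, 1.0])))

X_corr = X_indep.copy()                               # make column 4 collinear
X_corr[:, 4] = X_corr[:, 0] + X_corr[:, 1] + 0.1 * rng.normal(size=500)
print(irrepresentable_ok(X_corr, [0, 1], np.array([1.0, 1.0])))
```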

Model selection and estimation in regression with grouped variables

Summary. We consider the problem of selecting grouped variables (factors) for accurate prediction in regression. Such a problem arises naturally in many practical situations, with multifactor analysis of variance being a well-known example.

Toward an objective and reproducible model choice via variable selection deviation

For a sound scientific understanding of the regression relationship, methods need to be developed to find the most important covariates, those with a higher chance of being confirmed in future studies, based on variable selection deviation.

Model Selection and the Principle of Minimum Description Length

This article reviews the principle of minimum description length (MDL) for problems of model selection, and illustrates the MDL principle by considering problems in regression, nonparametric curve estimation, cluster analysis, and time series analysis.

Parametric or nonparametric? A parametricness index for model selection

A measure, the parametricness index (PI), is developed to assess whether a model selected by a potentially consistent procedure can be practically treated as the true model; it also hints at whether AIC or BIC is better suited to the data for the goal of estimating the regression function.

Model selection by MCMC computation

Extended Bayesian information criteria for model selection with large model spaces

This paper re-examines the Bayesian paradigm for model selection and proposes an extended family of Bayesian information criteria that take into account both the number of unknown parameters and the complexity of the model space.
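The extended criterion has the closed form EBIC_gamma = -2 loglik + k log n + 2 gamma log C(p, k) for a model using k of p candidate parameters, with gamma in [0, 1] (gamma = 0 recovers plain BIC). A minimal sketch with illustrative numbers:

```python
# Minimal sketch: extended BIC (EBIC), which penalizes both the parameter
# count and the size of the model space via the binomial coefficient C(p, k).
import math

def ebic(loglik, k, n, p, gamma=0.5):
    """EBIC_gamma = -2*loglik + k*log(n) + 2*gamma*log(C(p, k))."""
    return (-2.0 * loglik + k * math.log(n)
            + 2.0 * gamma * math.log(math.comb(p, k)))

# With p = 1000 candidate predictors the model-space term matters:
print(ebic(loglik=-100.0, k=5, n=200, p=1000))              # EBIC, gamma = 0.5
print(ebic(loglik=-100.0, k=5, n=200, p=1000, gamma=0.0))   # plain BIC
```

The extra 2*gamma*log C(p, k) term is what keeps the criterion from over-selecting when the number of candidate models is huge, the regime the paper targets.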

Modeling and variable selection in epidemiologic analysis.

S. Greenland, American Journal of Public Health, 1989
An overview of problems in multivariate modeling of epidemiologic data is provided and some proposed solutions are examined; model and variable forms should be selected based on regression diagnostic procedures, in addition to goodness-of-fit tests.