# Model Selection Techniques: An Overview

@article{Ding2018ModelST, title={Model Selection Techniques: An Overview}, author={Jie Ding and Vahid Tarokh and Yuhong Yang}, journal={IEEE Signal Processing Magazine}, year={2018}, volume={35}, pages={16-34} }

In the era of big data, analysts usually explore various statistical models or machine-learning methods for observed data to facilitate scientific discoveries or gain predictive power. Whatever data and fitting procedures are employed, a crucial step is to select the most appropriate model or method from a set of candidates. Model selection is a key ingredient in data analysis for reliable and reproducible statistical inference or prediction, and thus it is central to scientific studies in such…

## 126 Citations

### On Statistical Efficiency in Learning

- Computer Science
- IEEE Transactions on Information Theory
- 2021

A generalized notion of Takeuchi’s information criterion is proposed, and it is proved that the proposed method can asymptotically achieve the optimal out-of-sample prediction loss under reasonable assumptions.

### Hierarchical Bayesian data selection

- Computer Science
- 2022

The concept of Bayesian data selection is introduced: the simultaneous inference of both the model parameters and parameters that represent the belief that each observation within the data should be included in the inference.

### Evaluation of Regression Models: Model Assessment, Model Selection and Generalization Error

- Computer Science
- Mach. Learn. Knowl. Extr.
- 2019

This paper discusses criterion-based, step-wise selection procedures and resampling methods for model selection, noting that cross-validation provides the simplest and most generic means of computationally estimating all required entities.

### Robust Information Criterion for Model Selection in Sparse High-Dimensional Linear Regression Models

- Computer Science
- 2022

A new form of the EBIC criterion called EBIC-Robust is proposed, which is invariant to data scaling and consistent in both large sample size and high-SNR scenarios.

### Efficient and Consistent Data-Driven Model Selection for Time Series

- Mathematics, Computer Science
- 2021

This paper proves that consistent model selection criteria outperform the classical AIC criterion in terms of efficiency. It also derives the usual BIC criterion from a Bayesian approach and, by keeping all the second-order terms of the Laplace approximation, a data-driven criterion denoted KC’.

### Selection of Heteroscedastic Models: A Time Series Forecasting Approach

- Business
- Applied Mathematics
- 2019

To overcome the weaknesses of in-sample model selection, this study adopts an out-of-sample model selection approach for selecting models with improved forecasting accuracy and performance. Daily…

### Model Linkage Selection for Cooperative Learning

- Computer Science
- J. Mach. Learn. Res.
- 2021

This paper proposes a novel framework for integrating information across a set of learners that is robust against model misspecification and misspecified parameter sharing patterns. It shows that the proposed method can data-adaptively select the correct parameter sharing patterns based on a user-specified parameter sharing pattern, thus enhancing the prediction accuracy of a learner.

### A time-reversed model selection approach to time series forecasting

- Computer Science
- Scientific Reports
- 2022

A novel model selection approach to time series forecasting is introduced by combining theoretical principles of time-reversibility in time series with conventional modeling approaches such as information criteria to construct a criterion that employs the backwards prediction (backcast) as a proxy for the forecast.

### Non-Asymptotic Guarantees for Robust Identification of Granger Causality via the LASSO

- Computer Science
- ArXiv
- 2021

It is established that the sufficient conditions of LASSO also suffice for robust identification of Granger causal influences, and the false positive error probability of a simple thresholding rule for identifying Granger causal effects is characterized.

### Model family selection for classification using Neural Decision Trees

- Computer Science
- ArXiv
- 2020

This paper proposes a method to reduce the scope of exploration needed for model selection by progressively relaxing the decision boundaries of the initial decision trees (the RMs) as long as this is beneficial in terms of performance measured on an analyzed dataset.

## References


### Model selection and multimodel inference : a practical information-theoretic approach

- Computer Science
- 2003

The second edition of this book is unique in that it focuses on methods for making formal statistical inference from all the models in an a priori set (Multi-Model Inference). A philosophy is…

### Variable Selection Diagnostics Measures for High-Dimensional Regression

- Computer Science
- 2014

This work proposes variable selection deviation measures that give a proper sense of how many predictors in the selected set are likely to be trustworthy in certain aspects, and demonstrates the utility of these measures in applications.

### On Model Selection Consistency of Lasso

- Computer Science
- J. Mach. Learn. Res.
- 2006

It is proved that a single condition, which is called the Irrepresentable Condition, is almost necessary and sufficient for Lasso to select the true model both in the classical fixed p setting and in the large p setting as the sample size n gets large.

### Model selection and estimation in regression with grouped variables

- Mathematics
- 2006

Summary. We consider the problem of selecting grouped variables (factors) for accurate prediction in regression. Such a problem arises naturally in many practical situations with the multifactor…

### Toward an objective and reproducible model choice via variable selection deviation

- Biology
- Biometrics
- 2017

For a sound scientific understanding of the regression relationship, methods need to be developed, based on variable selection deviation, to find the most important covariates that have a higher chance of being confirmed in future studies.

### Model Selection and the Principle of Minimum Description Length

- Computer Science
- 2001

This article reviews the principle of minimum description length (MDL) for problems of model selection, and illustrates the MDL principle by considering problems in regression, nonparametric curve estimation, cluster analysis, and time series analysis.

### Parametric or nonparametric? A parametricness index for model selection

- Mathematics, Computer Science
- 2011

A measure, the parametricness index (PI), is developed to assess whether a model selected by a potentially consistent procedure can be practically treated as the true model. It also hints at whether AIC or BIC is better suited to the data when the goal is to estimate the regression function.

### Extended Bayesian information criteria for model selection with large model spaces

- Computer Science
- 2008

This paper re-examines the Bayesian paradigm for model selection and proposes an extended family of Bayesian information criteria, which take into account both the number of unknown parameters and the complexity of the model space.

### Modeling and variable selection in epidemiologic analysis.

- Computer Science
- American Journal of Public Health
- 1989

An overview of problems in multivariate modeling of epidemiologic data is provided, and some proposed solutions are examined, including the recommendation that model and variable forms be selected based on regression diagnostic procedures in addition to goodness-of-fit tests.