Model Selection Techniques: An Overview

  • Jie Ding, Vahid Tarokh, Yuhong Yang
  • Published 2018
  • Political Science, Computer Science, Mathematics, Economics, Physics
  • IEEE Signal Processing Magazine
In the era of big data, analysts usually explore various statistical models or machine-learning methods for observed data to facilitate scientific discoveries or gain predictive power. Whatever data and fitting procedures are employed, a crucial step is to select the most appropriate model or method from a set of candidates. Model selection is a key ingredient in data analysis for reliable and reproducible statistical inference or prediction, and thus it is central to scientific studies in such…
On Statistical Efficiency in Learning
A generalized notion of Takeuchi's information criterion is proposed, and it is proved that the proposed method asymptotically achieves the optimal out-of-sample prediction loss under reasonable assumptions.
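For context, here is a minimal sketch of the classical Takeuchi information criterion (not the generalized version proposed in the paper above), assuming a univariate Gaussian model N(mu, sigma^2) fitted by maximum likelihood. TIC replaces AIC's parameter count k with tr(J^-1 K), where J is the average negative Hessian of the log-likelihood and K the average outer product of the per-observation scores, both evaluated at the MLE. The function name `tic_gaussian` is hypothetical.

```python
import numpy as np

def tic_gaussian(x):
    """Classical TIC for an N(mu, sigma^2) fit: -2 * loglik + 2 * tr(J^{-1} K).
    Returns (tic, penalty), where penalty = tr(J^{-1} K) plays the role of AIC's k."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    mu = x.mean()
    r = x - mu
    s2 = np.mean(r ** 2)                      # MLE of sigma^2
    loglik = -0.5 * n * (np.log(2 * np.pi * s2) + 1)
    # Per-observation scores with respect to (mu, sigma^2)
    scores = np.column_stack([r / s2, -0.5 / s2 + r ** 2 / (2 * s2 ** 2)])
    K = scores.T @ scores / n
    # Average negative Hessian with respect to (mu, sigma^2)
    J = np.array([
        [1 / s2, np.mean(r) / s2 ** 2],
        [np.mean(r) / s2 ** 2, np.mean(-0.5 / s2 ** 2 + r ** 2 / s2 ** 3)],
    ])
    penalty = np.trace(np.linalg.solve(J, K))
    return -2 * loglik + 2 * penalty, penalty

rng = np.random.default_rng(0)
x_normal = rng.normal(size=100_000)
x_laplace = rng.laplace(size=100_000)
print(tic_gaussian(x_normal)[1])   # near 2, the parameter count AIC would use
print(tic_gaussian(x_laplace)[1])  # noticeably above 2; population value is (6 + 1) / 2 = 3.5
```

For this model the penalty works out to 1 + (kappa - 1) / 2, with kappa the sample kurtosis, so it matches AIC's k = 2 when the data look Gaussian and grows when the tails are heavier than the model assumes.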
Evaluation of Regression Models: Model Assessment, Model Selection and Generalization Error
This paper discusses criterion-based and stepwise selection procedures and resampling methods for model selection, and notes that cross-validation provides the simplest and most generic means of computationally estimating all required entities.
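The cross-validation idea in this entry can be illustrated with a minimal sketch of plain K-fold cross-validation, assuming polynomial regression candidates of different degrees and NumPy. The names `kfold_cv_mse` and `select_degree` are hypothetical and not code from the paper.

```python
import numpy as np

def kfold_cv_mse(x, y, degree, k=5, seed=0):
    """Mean held-out squared error of a degree-`degree` polynomial over k folds."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(x))
    folds = np.array_split(idx, k)
    errors = []
    for fold in folds:
        train = np.setdiff1d(idx, fold)          # every index not in this fold
        coefs = np.polyfit(x[train], y[train], degree)
        pred = np.polyval(coefs, x[fold])
        errors.append(np.mean((y[fold] - pred) ** 2))
    return float(np.mean(errors))

def select_degree(x, y, degrees, k=5, seed=0):
    """Score every candidate degree by K-fold CV; return the best degree and all scores."""
    scores = {d: kfold_cv_mse(x, y, d, k, seed) for d in degrees}
    return min(scores, key=scores.get), scores

# Usage: data generated from a quadratic
rng = np.random.default_rng(1)
x = rng.uniform(-2, 2, 200)
y = 1.0 - 0.5 * x + 2.0 * x ** 2 + rng.normal(0, 0.3, 200)
best, scores = select_degree(x, y, degrees=[0, 1, 2, 3, 4, 5])
print(best, {d: round(s, 3) for d, s in scores.items()})
```

Reusing the same seed gives every candidate identical folds, and each candidate is scored only on data it was not fitted to, which is what lets cross-validation compare models of different complexity without an explicit penalty term.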
Selection of Heteroscedastic Models: A Time Series Forecasting Approach
To overcome the weaknesses of in-sample model selection, this study adopted an out-of-sample model selection approach to select models with improved forecasting accuracy and performance. Daily…
Targeted Cross-Validation
This work proposes targeted cross-validation (TCV) to select models or procedures based on a general weighted L2 loss, and shows that TCV is consistent in selecting the best-performing candidate under that weighted L2 loss.
Variable Grouping Based Bayesian Additive Regression Tree
A two-stage method, the variable grouping based Bayesian additive regression tree (GBART), is proposed to enhance the predictive performance of ensemble methods for regression; a well-developed Python package, gbart, is available.
Model family selection for classification using Neural Decision Trees
This paper proposes a method to reduce the scope of exploration needed for model selection by progressively relaxing the decision boundaries of the initial decision trees (the RMs) as long as this is beneficial in terms of performance measured on an analyzed dataset.
On improvability of model selection by model averaging
In regression, model averaging (MA) provides an alternative to model selection (MS), and asymptotic efficiency theories have been derived for both MS and MA. Basically, under sensible…
Controlling the error probabilities of model selection information criteria using bootstrapping
This paper presents the Error Control for Information Criteria (ECIC) method, a bootstrap approach to controlling Type I error using Difference of Goodness of Fit (DGOF) distributions.
Consistent model selection criteria and goodness-of-fit test for affine causal processes
This paper studies the model selection problem in a large class of causal time series models, which includes the ARMA or AR(∞) processes as well as the GARCH or ARCH(∞), APARCH, ARMA-GARCH and…
Statistical Methods for the Automatization of Basic Loss Model Calibration
This work presents a method to detect trends and change points in the loss triangles of basic loss portfolios, in order to ensure an appropriate assessment of the claims reserve and of the premium risk and reserve risk based on these data.


Variable Selection Diagnostics Measures for High-Dimensional Regression
Many exciting results have been obtained on model selection for high-dimensional data in both efficient algorithms and theoretical developments. The powerful penalized regression methods can give…
Model selection and multimodel inference: a practical information-theoretic approach
The second edition of this book is unique in that it focuses on methods for making formal statistical inference from all the models in an a priori set (Multi-Model Inference). A philosophy is…
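Multimodel inference of this kind is commonly illustrated with Akaike weights, w_i = exp(-Delta_i / 2) / sum_r exp(-Delta_r / 2), where Delta_i is model i's AIC minus the smallest AIC in the candidate set. Below is a minimal sketch assuming Gaussian polynomial regression candidates; the helper names `gaussian_aic` and `akaike_weights` are hypothetical.

```python
import numpy as np

def gaussian_aic(y, y_hat, n_params):
    """AIC = -2 log L + 2k for Gaussian errors, using the MLE sigma^2 = RSS / n.
    n_params must count the noise variance as well as the regression coefficients."""
    n = len(y)
    sigma2 = np.mean((y - y_hat) ** 2)
    loglik = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)
    return -2 * loglik + 2 * n_params

def akaike_weights(aics):
    """w_i = exp(-delta_i / 2) / sum_r exp(-delta_r / 2), delta_i = AIC_i - min AIC."""
    aics = np.asarray(aics, dtype=float)
    delta = aics - aics.min()
    w = np.exp(-0.5 * delta)
    return w / w.sum()

# Usage: polynomial candidates of degree 0..3 for data from a straight line
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 100)
y = 0.5 + 1.5 * x + rng.normal(0, 0.5, 100)
degrees = [0, 1, 2, 3]
fits = [np.polyfit(x, y, d) for d in degrees]
aics = [gaussian_aic(y, np.polyval(c, x), d + 2) for c, d in zip(fits, degrees)]
weights = akaike_weights(aics)
x_new = 0.3
y_avg = sum(w * np.polyval(c, x_new) for w, c in zip(weights, fits))  # model-averaged prediction
print(np.round(weights, 3), round(float(y_avg), 3))
```

Averaging predictions with these weights, rather than keeping only the top-ranked model, is one way to carry model-selection uncertainty into the final estimate.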
Model selection and estimation in regression with grouped variables
We consider the problem of selecting grouped variables (factors) for accurate prediction in regression. Such a problem arises naturally in many practical situations with the multifactor…
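The group lasso penalizes each factor's coefficient block by its Euclidean norm, so a whole group is kept or dropped together. Below is a minimal sketch of the objective 0.5 * ||y - X beta||^2 + lam * sum_g sqrt(p_g) * ||beta_g||_2, solved by proximal gradient descent (ISTA). The solver is a swapped-in choice rather than the algorithm used in the paper, and the function names, as well as the assumption that `groups` partitions the columns, are illustrative.

```python
import numpy as np

def group_soft_threshold(v, t):
    """Proximal operator of t * ||v||_2: shrink the whole block, or zero it out."""
    norm = np.linalg.norm(v)
    if norm <= t:
        return np.zeros_like(v)
    return (1 - t / norm) * v

def group_lasso(X, y, groups, lam, n_iter=2000):
    """Minimise 0.5*||y - X b||^2 + lam * sum_g sqrt(p_g) * ||b_g||_2 by proximal gradient.
    `groups` is a list of index arrays that together cover every column exactly once."""
    beta = np.zeros(X.shape[1])
    step = 1.0 / np.linalg.norm(X, 2) ** 2       # 1 / Lipschitz constant of the gradient
    for _ in range(n_iter):
        grad = X.T @ (X @ beta - y)
        z = beta - step * grad                   # plain gradient step
        for g in groups:                         # separable prox, one block at a time
            beta[g] = group_soft_threshold(z[g], step * lam * np.sqrt(len(g)))
    return beta

# Usage: three factors of three columns each; only the first factor matters
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 9))
true_beta = np.array([1.0, -2.0, 1.5, 0, 0, 0, 0, 0, 0])
y = X @ true_beta + rng.normal(0, 0.5, 200)
groups = [np.arange(0, 3), np.arange(3, 6), np.arange(6, 9)]
beta_hat = group_lasso(X, y, groups, lam=20.0)
print(np.round(beta_hat, 2))  # the last two groups should come out exactly zero
```

Because the threshold acts on a block's norm rather than on single coefficients, the zeros appear at the level of whole factors, which is the behavior that distinguishes the group lasso from the ordinary lasso.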
Toward an objective and reproducible model choice via variable selection deviation
For a sound scientific understanding of the regression relationship, methods need to be developed to find the most important covariates, those with a higher chance of being confirmed in future studies, based on variable selection deviation.
On Model Selection Consistency of Lasso
  • P. Zhao, Bin Yu
  • Mathematics, Computer Science
  • J. Mach. Learn. Res.
  • 2006
It is proved that a single condition, called the Irrepresentable Condition, is almost necessary and sufficient for Lasso to select the true model, both in the classical fixed-p setting and in the large-p setting, as the sample size n gets large.
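The condition can be checked numerically for a given design. Writing C = X^T X / n and splitting it into the block C11 for the true support S and the block C21 for irrelevant versus relevant covariates, the strong Irrepresentable Condition asks that every entry of |C21 C11^-1 sign(beta_S)| be at most 1 - eta for some eta > 0. Below is a minimal sketch assuming NumPy; `irrepresentable_margin` is a hypothetical helper, and the two designs are illustrative constructions, not examples taken from the paper.

```python
import numpy as np

def irrepresentable_margin(X, support, signs):
    """Return 1 - max_j |C21 C11^{-1} sign(beta_S)|_j with C = X^T X / n.
    A positive value means the strong irrepresentable condition holds for this sign pattern."""
    n, p = X.shape
    C = X.T @ X / n
    S = np.asarray(support)
    Sc = np.setdiff1d(np.arange(p), S)           # irrelevant covariates
    C11 = C[np.ix_(S, S)]
    C21 = C[np.ix_(Sc, S)]
    v = C21 @ np.linalg.solve(C11, np.asarray(signs, dtype=float))
    return 1.0 - np.max(np.abs(v))

rng = np.random.default_rng(0)
n = 100
Q, _ = np.linalg.qr(rng.normal(size=(n, 3)))
Q *= np.sqrt(n)                                  # now q_i^T q_j / n = 1 if i == j else 0
X_good = Q                                       # orthogonal design
X_bad = np.column_stack([Q[:, 0], Q[:, 1],
                         (2 / 3) * Q[:, 0] + (2 / 3) * Q[:, 1] + (1 / 3) * Q[:, 2]])
print(irrepresentable_margin(X_good, [0, 1], [1, 1]))  # 1.0 up to rounding
print(irrepresentable_margin(X_bad, [0, 1], [1, 1]))   # -1/3 up to rounding
```

In the second design the irrelevant third column is too correlated with the two relevant ones (C21 C11^-1 sign(beta_S) has entry 4/3), so the condition fails and, by the "almost necessary" part of the result, Lasso should not be expected to select the true model consistently for that design.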
Model Selection and the Principle of Minimum Description Length
This article reviews the principle of minimum description length (MDL) for problems of model selection. By viewing statistical modeling as a means of generating descriptions of observed data, the MDL…
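As background only, and not taken from the article itself: in its classical asymptotic two-part form, MDL picks the model with the smallest total description length, namely the code length of the data given the fitted parameters plus roughly (1/2) log n nats per parameter. For a k-parameter model with maximum-likelihood estimate theta_hat_k fitted to n observations x^n:

```latex
\hat{k}_{\mathrm{MDL}} = \arg\min_{k}\left\{ -\log p\!\left(x^{n} \mid \hat{\theta}_{k}\right) + \frac{k}{2}\log n \right\}
```

Multiplying the objective by 2 gives BIC, which is why this form of MDL and BIC select the same model; more refined MDL forms differ in lower-order terms.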
Parametric or nonparametric? A parametricness index for model selection
In the model selection literature, two classes of criteria perform well asymptotically in different situations: the Bayesian information criterion (BIC) (as a representative) is consistent in selection when…
Model selection by MCMC computation
This paper addresses MCMC methods from the second group, which allow the generation of samples from probability distributions defined on unions of disjoint spaces of different dimensions, and shows why sampling from such distributions is a nontrivial task.
Extended Bayesian information criteria for model selection with large model spaces
The ordinary Bayesian information criterion is too liberal for model selection when the model space is large. In this paper, we re-examine the Bayesian paradigm for model selection and propose an…
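For a candidate subset s of |s| covariates out of p, the extended BIC adds 2 * gamma * log C(p, |s|) to the ordinary BIC, with 0 <= gamma <= 1, so that large model spaces are penalized more heavily. Below is a minimal sketch for Gaussian linear regression with exhaustive subset search, assuming centred data (no intercept) and dropping additive constants from -2 log L; the helper names `ebic` and `best_subset_ebic` are hypothetical.

```python
import itertools
import math

import numpy as np

def ebic(rss, n, size, p, gamma):
    """EBIC for a Gaussian linear model, up to an additive constant:
    n*log(RSS/n) + |s|*log(n) + 2*gamma*log(C(p, |s|)). gamma = 0 gives ordinary BIC."""
    return n * math.log(rss / n) + size * math.log(n) + 2 * gamma * math.log(math.comb(p, size))

def best_subset_ebic(X, y, max_size, gamma):
    """Exhaustively search subsets of 1..max_size columns and return the EBIC minimiser."""
    n, p = X.shape
    best_score, best_subset = math.inf, ()
    for size in range(1, max_size + 1):
        for subset in itertools.combinations(range(p), size):
            Xs = X[:, list(subset)]
            coef, *_ = np.linalg.lstsq(Xs, y, rcond=None)
            rss = float(np.sum((y - Xs @ coef) ** 2))
            score = ebic(rss, n, size, p, gamma)
            if score < best_score:
                best_score, best_subset = score, subset
    return best_subset

rng = np.random.default_rng(0)
n, p = 100, 20
X = rng.normal(size=(n, p))
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + 1.0 * X[:, 2] + rng.normal(size=n)
print(best_subset_ebic(X, y, max_size=3, gamma=1.0))  # expected (0, 1, 2) with signals this strong
```

Raising gamma toward 1 makes large subsets harder to admit when p is big, which addresses the liberal behavior of ordinary BIC described above; exhaustive search itself is only feasible for small p and max_size.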
Consider the simple normal linear regression model for estimation/prediction at a new design point. When the slope parameter is not obviously nonzero, hypothesis testing and information criteria can…