Estimation of the Squared Cross-Validity Coefficient in the Context of Best Subset Regression

  • E. Kennedy
  • Published 1 September 1988
  • Applied Psychological Measurement, pp. 231–237
  • Mathematics, Psychology
A Monte Carlo study was conducted to examine the performance of several strategies for estimating the squared cross-validity coefficient of a sample regression equation in the context of best subset regression. Data were simulated for populations and experimental designs likely to be encountered in practice. The results indicated that a formula presented by Stein (1960) could be expected to yield estimates as good as or better than cross-validation, or several other formula estimators, for… 
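The Stein (1960) estimator evaluated in the abstract can be sketched in a few lines. This is a minimal sketch assuming the commonly cited three-factor form of Stein's formula (as reproduced in later methodology reviews); the function name and example values are illustrative, not from the paper:

```python
def stein_cross_validity(r2, n, k):
    """Stein-type estimate of the population squared cross-validity.

    r2 : sample squared multiple correlation (R^2)
    n  : number of observations
    k  : number of predictors
    """
    shrinkage = ((n - 1) / (n - k - 1)) * ((n - 2) / (n - k - 2)) * ((n + 1) / n)
    return 1.0 - shrinkage * (1.0 - r2)

# The estimate falls below the sample R^2, and shrinks more
# as the predictor count grows relative to the sample size.
print(stein_cross_validity(0.50, n=100, k=5))
print(stein_cross_validity(0.50, n=100, k=10))
```

Note the double shrinkage relative to the familiar Wherry adjustment: cross-validity estimates penalize both the fit to the calibration sample and the error of applying those weights to new cases.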

Tables from this paper

Methodology Review: Estimation of Population Validity and Cross-Validity, and the Use of Equal Weights in Prediction
In multiple regression, optimal linear weights are obtained using an ordinary least squares (OLS) procedure. However, these linear weighted combinations of predictors may not optimally predict the
Estimating R² Shrinkage in Multiple Regression: A Comparison of Different Analytical Methods
The effectiveness of various analytical formulas for estimating R² shrinkage in multiple regression analysis was investigated. Two categories of formulas were identified: estimators of the
Precision Power Method for Selecting Regression Sample Sizes.
When multiple regression is used to develop a prediction model, sample size must be large enough to ensure stable coefficients. If sample size is inadequate, the model may not predict well in future
Accuracy of Population Validity and Cross-Validity Estimation: An Empirical Comparison of Formula-Based, Traditional Empirical, and Equal Weights Procedures
An empirical Monte Carlo study was performed using predictor and criterion data from 84,808 U.S. Air Force enlistees. 501 samples were drawn for each of seven sample size conditions: 25, 40, 60, 80,
The Precision Efficacy Analysis for Regression Sample Size Method.
The general purpose of this study was to examine the efficiency of the Precision Efficacy Analysis for Regression (PEAR) method for choosing appropriate sample sizes in regression studies used for
The Iterative Homogeneity of Variance Index: Improving Negative Variance Estimates in Meta-Analysis
Determining the variability of observed relationships is a critical step in quantitative research synthesis, requiring the estimation of σ̂²ρ, the variance due to moderator effects. When observed
Pilot-Candidate Selection Method: Sources of Validity
Six hundred seventy-eight Air Force pilot training candidates were tested with a paper-and-pencil aptitude battery and computer-administered tests of psychomotor skills, information processing, and
Use of Multivariate Techniques to Validate and Improve the Current USAF Pilot Candidate Selection Model
Validation of the current PCSM model demonstrated in the first phase of this research is enhanced by the fact that PCSM outperforms all other models developed in the research.
Monte Carlo Simulation for Perusal and Practice.
This paper discusses how one chooses a pseudo-random number generator, and then discusses how to use these generators to simulate data from normal and multivariate normal distributions.
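The simulation recipe that paper describes — drawing correlated data from a multivariate normal population — can be sketched with NumPy. The correlation matrix, seed, and sample size below are illustrative assumptions, not values from any of the cited studies:

```python
import numpy as np

# Illustrative target correlation matrix for two predictors and a criterion
rho = np.array([[1.0, 0.5, 0.3],
                [0.5, 1.0, 0.4],
                [0.3, 0.4, 1.0]])

rng = np.random.default_rng(1988)
L = np.linalg.cholesky(rho)          # rho = L @ L.T
z = rng.standard_normal((500, 3))    # independent N(0, 1) draws
x = z @ L.T                          # each row is a multivariate normal draw

# Sample correlations approach rho as the sample size grows
print(np.corrcoef(x, rowvar=False).round(2))
```

Multiplying independent standard normals by the Cholesky factor of the target matrix is the standard way to induce a chosen population correlation structure, which is exactly what a Monte Carlo study of validity estimators requires.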


Estimators of the Squared Cross-Validity Coefficient: A Monte Carlo Investigation
A Monte Carlo experiment was used to evaluate four procedures for estimating the population squared cross-validity of a sample least squares regression equation. Four levels of population squared
Multiple Regression and Validity Estimation in One Sample
This study empirically investigated equations for estimating the value of the multiple correlation coefficient in the population underlying a sample and the value of the population validity
Maximum R² Improvement and Stepwise Multiple Regression as Related to Over-Fitting
A Monte Carlo study was conducted comparing the selection of subsets of multiple regression predictors by the methods of forward selection, backward selection, and a newer method that looks for the
Approximating the Distribution of the Sample R² in Best Subset Regressions
This note presents research on the problem of determining the distribution of the usual sample R² statistic in multiple regression studies where the variables to be included in the regression
The squared correlation coefficient, ω², between an empirically chosen linear function of predictors, B0 + B′x, and a criterion, y, is employed as a measure of predictive precision. This coefficient
Inflation of R² in Best Subset Regression
When subset selection is used in regression the expected value of R² is substantially inflated above its value without selection, especially when the number of observations is less than the number
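That inflation is easy to reproduce. The minimal Monte Carlo sketch below (sample sizes, pool size, and seed are arbitrary choices, not from the cited paper) fits every size-k subset of predictors that are entirely unrelated to the criterion and keeps the best sample R² — which lands well above the population value of zero:

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(0)
n, p_pool, k = 20, 10, 3   # small n relative to the predictor pool

def best_subset_r2(y, X, k):
    """Largest sample R^2 over all size-k column subsets of X."""
    best = 0.0
    ss_tot = np.sum((y - y.mean()) ** 2)
    for cols in combinations(range(X.shape[1]), k):
        Xs = np.column_stack([np.ones(len(y)), X[:, cols]])
        beta, *_ = np.linalg.lstsq(Xs, y, rcond=None)
        resid = y - Xs @ beta
        best = max(best, 1.0 - resid @ resid / ss_tot)
    return best

# Null case: predictors unrelated to y, so the population R^2 is 0,
# yet the best-subset sample R^2 is substantially inflated.
r2s = [best_subset_r2(rng.standard_normal(n),
                      rng.standard_normal((n, p_pool)), k)
       for _ in range(200)]
print(round(float(np.mean(r2s)), 2))  # typically well above zero
```

This is the selection effect that makes naive shrinkage formulas, derived for a fixed predictor set, optimistic in the best subset context — the motivation for the comparison in the main paper.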
If a multiple regression equation is computed from one sample and applied to subsequent samples, the errors of prediction in the later samples will be larger than those in the first sample or those
The Parameters of Cross-Validation
The validation of predictor weights, derived in one sample, by computing the correlation of the weighted sum of the predictors with the criterion in new samples is called cross-validation.
The Use of an F-Statistic in Stepwise Regression Procedures
This is an expository paper, pointing out explicitly the pseudoness of the “F-statistic” used in stepwise procedures for determining the independent variables to be used in a linear prediction
Applied multiple regression/correlation analysis for the behavioral sciences
Contents: Preface. Introduction. Bivariate Correlation and Regression. Multiple Regression/Correlation With Two or More Independent Variables. Data Visualization, Exploration, and Assumption