Multivariable prognostic models: issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors.

Authors: Frank E. Harrell, K L Lee, Daniel B. Mark
Journal: Statistics in Medicine, volume 15, issue 4
Multivariable regression models are powerful tools that are used frequently in studies of clinical outcomes. These models can use a mixture of categorical and continuous variables and can handle partially observed (censored) responses. However, uncritical application of modelling techniques can result in models that poorly fit the dataset at hand, or, even more likely, inaccurately predict outcomes on new subjects. One must know how to measure qualities of a model's fit in order to avoid poorly… 

Validation measures for prognostic models for independent and correlated binary and survival outcomes

Existing validation measures for independent data, such as the C-index, D statistic, calibration slope, Brier score, and the K statistic for use with random effects/frailty models are extended.
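Two of the measures named above can be sketched for the simpler independent-data case. This is a minimal illustration only, not the extension to random effects/frailty models that the paper describes; the function names `concordance_index` and `brier_score` are hypothetical, and the pairwise definition of usable pairs is the common textbook form of the C-index for right-censored data.

```python
def concordance_index(times, events, risk_scores):
    """C-index for right-censored data (independent observations assumed).

    A pair (i, j) is usable when subject i has an observed event strictly
    before subject j's follow-up time. The pair is concordant when the
    subject who fails first also has the higher risk score; ties in the
    risk score count as half.
    """
    concordant = 0.0
    usable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot be the earlier failure
        for j in range(n):
            if times[i] < times[j]:
                usable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if usable == 0:
        raise ValueError("no usable pairs")
    return concordant / usable


def brier_score(probabilities, outcomes):
    """Mean squared difference between predicted probability and 0/1 outcome."""
    if len(probabilities) != len(outcomes) or not outcomes:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum((p - y) ** 2 for p, y in zip(probabilities, outcomes))
    return total / len(outcomes)
```

A C-index of 0.5 corresponds to random ordering and 1.0 to perfect ordering; a lower Brier score indicates better accuracy.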

An evaluation of penalised survival methods for developing prognostic models with rare events

Three existing penalised methods that have been proposed to improve predictive accuracy (ridge, lasso and the garrote) are evaluated using simulated data derived from two clinical datasets; the results suggest that significant improvements are possible by taking a penalised modelling approach.

Prognostic Modeling with Logistic Regression Analysis

A sensible strategy in small data sets is to apply shrinkage methods in full models that include well-coded predictors that are selected based on external information, such as full models including all available covariables.

Measures of discrimination and predictive accuracy for interval censored survival data

Medical researchers frequently make statements that one model predicts survival better than another, and are frequently challenged to provide rigorous statistical justification for these statements.

Several methods to assess improvement in risk prediction models: Extension to survival analysis

The primary parameters considered are net reclassification improvement (NRI) and integrated discrimination improvement (IDI) and a primary measure of concordance, area under the ROC curve (AUC), also called the c-statistic.
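For orientation, NRI and IDI can be sketched for a plain binary outcome. This is an illustrative sketch under stated assumptions, not the survival extension the paper develops: the NRI shown is the category-free form (any increase in predicted risk counts as a move "up"), and the function names `idi` and `continuous_nri` are hypothetical.

```python
def _split(y):
    """Return index lists for events (y == 1) and non-events (y == 0)."""
    events = [i for i, v in enumerate(y) if v == 1]
    nonevents = [i for i, v in enumerate(y) if v == 0]
    if not events or not nonevents:
        raise ValueError("need at least one event and one non-event")
    return events, nonevents


def idi(p_old, p_new, y):
    """Integrated discrimination improvement.

    Mean change in predicted risk among events minus the mean change
    among non-events.
    """
    events, nonevents = _split(y)
    gain_events = sum(p_new[i] - p_old[i] for i in events) / len(events)
    gain_nonevents = sum(p_new[i] - p_old[i] for i in nonevents) / len(nonevents)
    return gain_events - gain_nonevents


def continuous_nri(p_old, p_new, y):
    """Category-free net reclassification improvement.

    (P(up | event) - P(down | event)) + (P(down | non-event) - P(up | non-event)),
    where "up" means the new model assigns a higher risk than the old one.
    """
    events, nonevents = _split(y)

    def net_up(indices):
        up = sum(1 for i in indices if p_new[i] > p_old[i])
        down = sum(1 for i in indices if p_new[i] < p_old[i])
        return (up - down) / len(indices)

    return net_up(events) - net_up(nonevents)
```

Both quantities are positive when the new model moves events toward higher risk and non-events toward lower risk.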

Risk assessment with newer statistical metrics

The first Pencina article in this issue assesses the impact of calibration on the newer metrics of model performance, including area under the curve (AUC), discrimination slope, R-model, and R-residuals.

Assessing calibration of prognostic risk scores

A model-based framework for the assessment of calibration in the binary setting is presented that extends naturally to the survival data setting, and it is shown that Poisson regression models can be used to easily assess calibration in prognostic models.

Prognostic modelling with logistic regression analysis: a comparison of selection and estimation methods in small data sets.

It is found that stepwise selection with a low alpha led to a relatively poor model performance, when evaluated on independent data, and shrinkage methods in full models including prespecified predictors and incorporation of external information are recommended, when prognostic models are constructed in small data sets.
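One simple form of shrinkage for a full, prespecified model is a uniform (linear) shrinkage factor applied to all slopes. The sketch below assumes the widely used heuristic estimate (model chi-square minus number of parameters, divided by model chi-square); whether this matches the exact estimators compared in the paper is an assumption, and the function names are hypothetical.

```python
def heuristic_shrinkage_factor(model_chi2, n_params):
    """Heuristic uniform shrinkage factor: (chi2 - p) / chi2.

    model_chi2 is the likelihood-ratio chi-square of the fitted model and
    n_params the number of estimated regression coefficients (excluding
    the intercept). The result is floored at 0.
    """
    if model_chi2 <= 0:
        raise ValueError("model chi-square must be positive")
    return max(0.0, (model_chi2 - n_params) / model_chi2)


def shrink_coefficients(slopes, factor):
    """Multiply every slope by the shrinkage factor.

    The intercept is not handled here; it would need to be re-estimated
    so that the mean predicted risk still matches the observed event rate.
    """
    return [factor * b for b in slopes]
```

For example, a model chi-square of 40 with 10 parameters gives a factor of 0.75, so each slope is pulled 25% toward zero. Small data sets with many candidate predictors yield small factors, which is the situation in which shrinkage matters most.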

Is Bootstrapping Sufficient for Validating a Risk Model for Selection of Participants for a Lung Cancer Screening Program?

  • M. Marcus, J. Field
  • Medicine
    Journal of clinical oncology : official journal of the American Society of Clinical Oncology
  • 2017
Risk prediction models are powerful tools that use multivariable regression to combine predictors or predisposing factors to estimate the probability or risk of the presence or future occurrence of

Building multivariable prognostic and diagnostic models: transformation of the predictors by using fractional polynomials

To be useful to clinicians, prognostic and diagnostic indices must be derived from accurate models developed by using appropriate data sets. We show that fractional polynomials, which extend ordinary



Regression modelling strategies for improved prognostic prediction.

A general index of predictive discrimination is used to measure the ability of a model developed on training samples of varying sizes to predict survival in an independent test sample of patients suspected of having coronary artery disease.

A bootstrap resampling procedure for model building: application to the Cox regression model.

A bootstrap-model selection procedure is developed, combining the bootstrap method with existing selection techniques such as stepwise methods, for the selection of variables in the framework of a regression model which might influence the outcome variable.
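The general idea of combining the bootstrap with a selection technique can be sketched as follows: repeat the selection on many bootstrap resamples and record how often each variable is chosen. This is a minimal sketch, not the paper's procedure; `select` is a hypothetical user-supplied callable (for example, a wrapper around a stepwise routine) that takes a list of rows and returns the names of the selected variables.

```python
import random


def bootstrap_inclusion_frequencies(rows, select, n_boot=200, seed=0):
    """Fraction of bootstrap resamples in which each variable is selected.

    rows   : list of observations (any row type that `select` understands)
    select : hypothetical callable, rows -> iterable of selected variable names
    n_boot : number of bootstrap resamples
    seed   : seed for the resampling, so results are reproducible
    """
    if not rows:
        raise ValueError("rows must be non-empty")
    rng = random.Random(seed)
    counts = {}
    for _ in range(n_boot):
        # Resample with replacement, keeping the original sample size.
        sample = [rng.choice(rows) for _ in rows]
        for name in set(select(sample)):
            counts[name] = counts.get(name, 0) + 1
    return {name: count / n_boot for name, count in counts.items()}
```

Variables with inclusion frequencies near 1 are selected consistently across resamples, while low frequencies suggest that a variable's presence in the final model depends heavily on the particular sample.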

Applied Logistic Regression

Applied Logistic Regression, Third Edition provides an easily accessible introduction to the logistic regression model and highlights the power of this model by examining the relationship between a dichotomous outcome and a set of covariables.

Regression models in clinical studies: determining relationships between predictors and response.

This paper addresses the latter assumption of the distribution of the response variable by applying a direct and flexible approach, cubic spline functions, to two widely used models: the logistic regression model for binary responses and the Cox proportional hazards regression models for survival time data.

Measures of explained variation for survival data.

The importance of quantifying the predictive power of a prognostic model is discussed, and measures of explained variation as a possible quantification are suggested.

Flexible Methods for Analyzing Survival Data Using Splines, with Applications to Breast Cancer Prognosis

In an analysis of a large data set taken from clinical trials conducted by the Eastern Cooperative Oncology Group, these methods are seen to give useful insight into how prognosis varies as a function of continuous covariates, and also into how covariate effects change with follow-up time.

Bootstrap investigation of the stability of a Cox regression model.

A bootstrap investigation of the stability of a Cox proportional hazards regression model resulting from the analysis of a clinical trial of azathioprine versus placebo in patients with primary biliary cirrhosis shows graphically that these intervals are markedly wider than those obtained from the original model.

Prediction error and its estimation for subset-selected models

Strategies are compared for development of a linear regression model and the subsequent assessment of its predictive ability. Simulations were performed as a designed experiment over a range of data

Proportional hazards tests and diagnostics based on weighted residuals

Nonproportional hazards can often be expressed by extending the Cox model to include time varying coefficients; e.g., for a single covariate, the hazard function for subject i is modelled as