# Multivariable prognostic models: issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors.

@article{Harrell1996MultivariablePM, title={Multivariable prognostic models: issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors.}, author={Frank E. Harrell and K L Lee and Daniel B. Mark}, journal={Statistics in medicine}, year={1996}, volume={15}, number={4}, pages={361--387} }

Multivariable regression models are powerful tools that are used frequently in studies of clinical outcomes. These models can use a mixture of categorical and continuous variables and can handle partially observed (censored) responses. However, uncritical application of modelling techniques can result in models that poorly fit the dataset at hand, or, even more likely, inaccurately predict outcomes on new subjects. One must know how to measure qualities of a model's fit in order to avoid poorly…
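For survival models, discrimination is commonly summarized with Harrell's concordance index (C index): the proportion of usable patient pairs in which the patient who fails first also has the higher predicted risk. The function below is a minimal O(n²) sketch. It assumes right-censored data, a risk score where higher means worse prognosis, and one particular tie convention: tied risks count 1/2 and pairs with tied follow-up times are skipped. Other implementations handle ties differently.

```python
from typing import Sequence


def harrell_c_index(
    times: Sequence[float], events: Sequence[int], risk: Sequence[float]
) -> float:
    """Harrell's C index for right-censored survival data (O(n^2) sketch).

    A pair (i, j) is usable when subject i had the event (events[i] == 1)
    strictly before subject j's observed time. It is concordant when i also
    has the higher predicted risk; tied risks count as 1/2. Pairs with tied
    observed times are skipped, which is one of several possible conventions.
    """
    if not (len(times) == len(events) == len(risk)):
        raise ValueError("inputs must have equal length")
    concordant = 0.0
    usable = 0
    for i in range(len(times)):
        if events[i] != 1:
            continue  # a censored subject cannot be the earlier failure in a pair
        for j in range(len(times)):
            if times[i] < times[j]:
                usable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    if usable == 0:
        raise ValueError("no usable pairs")
    return concordant / usable


# Hypothetical data: follow-up times, event indicators (0 = censored), risk scores.
print(harrell_c_index([2, 4, 6, 8], [1, 1, 0, 1], [0.9, 0.7, 0.5, 0.1]))  # 1.0
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ranking of failure times.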

## 5,608 Citations

### Validation measures for prognostic models for independent and correlated binary and survival outcomes

- Psychology
- 2012

Existing validation measures for independent data, such as the C-index, D statistic, calibration slope, Brier score and K statistic, are extended for use with random-effects/frailty models.
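Of the measures listed, the Brier score is the simplest to compute for an uncensored binary outcome: the mean squared difference between predicted probability and observed 0/1 outcome. The sketch below also shows a scaled version that compares against a reference model predicting the overall event rate for everyone. It assumes complete binary outcomes; the survival-data Brier score needs censoring weights, and the correlated-data extensions described in the paper are not shown.

```python
from typing import Sequence


def brier_score(predicted: Sequence[float], observed: Sequence[int]) -> float:
    """Mean squared difference between predicted probabilities and 0/1 outcomes."""
    if len(predicted) != len(observed) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - y) ** 2 for p, y in zip(predicted, observed)) / len(predicted)


def scaled_brier(predicted: Sequence[float], observed: Sequence[int]) -> float:
    """1 - BS / BS_ref, where BS_ref predicts the overall event rate for everyone."""
    rate = sum(observed) / len(observed)
    reference = rate * (1.0 - rate)  # Brier score of the constant prediction `rate`
    if reference == 0:
        raise ValueError("outcomes must include both events and non-events")
    return 1.0 - brier_score(predicted, observed) / reference


# Hypothetical predicted probabilities and observed outcomes.
probs = [0.8, 0.3, 0.6, 0.1]
outcomes = [1, 0, 1, 0]
print(brier_score(probs, outcomes), scaled_brier(probs, outcomes))
```

Lower Brier scores are better; the scaled version is 0 for the uninformative reference model and 1 for perfect predictions.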

### An evaluation of penalised survival methods for developing prognostic models with rare events

- Environmental Science
- Statistics in medicine
- 2012

Three existing penalised methods proposed to improve predictive accuracy (ridge, lasso and the garrote) are evaluated on simulated data derived from two clinical datasets; the results suggest that a penalised modelling approach can yield significant improvements.

### Prognostic Modeling with Logistic Regression Analysis

- Biology
- Medical decision making: an international journal of the Society for Medical Decision Making
- 2001

In small data sets, a sensible strategy is to apply shrinkage methods to full models containing well-coded predictors that are selected on the basis of external information, such as a full model that includes all available covariables.
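One simple shrinkage method for such full models is the heuristic uniform shrinkage factor, often attributed to van Houwelingen and le Cessie: γ = (model likelihood-ratio χ² minus its degrees of freedom) / model likelihood-ratio χ². Every slope is multiplied by γ. The sketch below assumes you already have the fitted coefficients and the model χ² from your own fitting software; the coefficient names are hypothetical, and re-estimating the intercept after shrinking is left out.

```python
from typing import Dict


def heuristic_shrinkage(lr_chi2: float, df: int) -> float:
    """Uniform shrinkage factor gamma = (LR chi-square - df) / LR chi-square.

    lr_chi2 is the model likelihood-ratio chi-square and df the degrees of
    freedom of the full, prespecified model (all candidate parameters).
    """
    if lr_chi2 <= 0:
        raise ValueError("likelihood-ratio chi-square must be positive")
    return (lr_chi2 - df) / lr_chi2


def shrink(coefficients: Dict[str, float], gamma: float) -> Dict[str, float]:
    """Multiply each slope by gamma; the intercept should then be re-estimated."""
    return {name: gamma * beta for name, beta in coefficients.items()}


# Hypothetical model: LR chi-square of 50 on 10 degrees of freedom.
gamma = heuristic_shrinkage(lr_chi2=50.0, df=10)  # 0.8
print(shrink({"age": 0.05, "sex": 0.5}, gamma))
```

A factor well below 1 (or negative, when χ² < df) signals that the model is likely to be overfitted and its predictions too extreme on new subjects.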

### Measures of discrimination and predictive accuracy for interval censored survival data

- Psychology
- 2015

Medical researchers frequently make statements that one model predicts survival better than another, and are frequently challenged to provide rigorous statistical justification for these statements.…

### Several methods to assess improvement in risk prediction models: Extension to survival analysis

- Environmental Science
- Statistics in medicine
- 2011

The primary parameters considered are the net reclassification improvement (NRI), the integrated discrimination improvement (IDI) and a primary measure of concordance, the area under the ROC curve (AUC), also called the c-statistic.
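For an uncensored binary outcome, the IDI is the change in discrimination slope (mean predicted risk in events minus mean predicted risk in non-events) when moving from an old to a new model. The category-free NRI counts any increase or decrease in predicted risk as a reclassification, rewarding upward moves in events and downward moves in non-events. The sketch below covers only this binary case; the survival extension discussed in the paper must also account for censoring and is not shown, and a categorical NRI would instead count moves between prespecified risk categories.

```python
from typing import List, Sequence, Tuple


def _by_outcome(values: Sequence[float], y: Sequence[int]) -> Tuple[List[float], List[float]]:
    events = [v for v, outcome in zip(values, y) if outcome == 1]
    nonevents = [v for v, outcome in zip(values, y) if outcome == 0]
    if not events or not nonevents:
        raise ValueError("need at least one event and one non-event")
    return events, nonevents


def discrimination_slope(p: Sequence[float], y: Sequence[int]) -> float:
    """Mean predicted risk in events minus mean predicted risk in non-events."""
    events, nonevents = _by_outcome(p, y)
    return sum(events) / len(events) - sum(nonevents) / len(nonevents)


def idi(p_old: Sequence[float], p_new: Sequence[float], y: Sequence[int]) -> float:
    """Integrated discrimination improvement: change in discrimination slope."""
    return discrimination_slope(p_new, y) - discrimination_slope(p_old, y)


def continuous_nri(p_old: Sequence[float], p_new: Sequence[float], y: Sequence[int]) -> float:
    """Category-free NRI: any increase/decrease in predicted risk counts as a move."""
    up = {0: 0, 1: 0}
    down = {0: 0, 1: 0}
    n = {0: 0, 1: 0}
    for old, new, outcome in zip(p_old, p_new, y):
        n[outcome] += 1
        up[outcome] += new > old    # True counts as 1
        down[outcome] += new < old
    if n[0] == 0 or n[1] == 0:
        raise ValueError("need at least one event and one non-event")
    return (up[1] - down[1]) / n[1] + (down[0] - up[0]) / n[0]


# Hypothetical predictions from an old and a new model for four subjects.
y = [1, 1, 0, 0]
p_old = [0.6, 0.4, 0.5, 0.3]
p_new = [0.7, 0.5, 0.4, 0.2]
print(idi(p_old, p_new, y), continuous_nri(p_old, p_new, y))
```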

### Risk assessment with newer statistical metrics

- Medicine
- Statistics in medicine
- 2017

The first Pencina article in this issue assesses the impact of calibration on newer metrics of model performance, including the area under the curve (AUC), the discrimination slope, R-model and R-residuals.

### Assessing calibration of prognostic risk scores

- Computer Science
- Statistical methods in medical research
- 2016

A model-based framework for assessing calibration in the binary setting is presented that extends naturally to survival data, and it is shown that Poisson regression models can be used to assess the calibration of prognostic models easily.
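The simplest Poisson calibration model, observed events O_i with log(expected events E_i) as an offset plus an intercept α, has a closed-form fit: setting the score Σ(O_i − E_i·e^α) to zero gives e^α = ΣO / ΣE, the observed/expected ratio. The sketch below computes only this calibration-in-the-large check. It assumes E_i has already been obtained from the model (for survival data, typically the predicted cumulative hazard at each subject's follow-up time); adding log(E_i) as a covariate to estimate a calibration slope needs an iterative fit and is not shown.

```python
import math
from typing import Sequence, Tuple


def poisson_calibration_in_the_large(
    observed: Sequence[float], expected: Sequence[float]
) -> Tuple[float, float]:
    """Fit log E[O_i] = log(E_i) + alpha by maximum likelihood.

    With log(E_i) as an offset and no other covariates, the score equation
    sum(O_i - E_i * exp(alpha)) = 0 has the closed-form solution
    exp(alpha) = sum(O) / sum(E). Returns (O/E ratio, alpha).
    """
    total_observed = sum(observed)
    total_expected = sum(expected)
    if total_observed <= 0 or total_expected <= 0:
        raise ValueError("need positive observed and expected totals")
    ratio = total_observed / total_expected
    return ratio, math.log(ratio)


# Hypothetical example: observed and model-expected events in three risk groups.
print(poisson_calibration_in_the_large([6, 10, 24], [4.0, 5.0, 11.0]))
```

An α near 0 (O/E near 1) indicates good calibration in the large; here the hypothetical model under-predicts events by half.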

### Prognostic modelling with logistic regression analysis: a comparison of selection and estimation methods in small data sets.

- Mathematics
- Statistics in medicine
- 2000

Stepwise selection with a low alpha was found to lead to relatively poor model performance when evaluated on independent data; shrinkage methods in full models that include prespecified predictors and incorporate external information are recommended when prognostic models are constructed in small data sets.

### Is Bootstrapping Sufficient for Validating a Risk Model for Selection of Participants for a Lung Cancer Screening Program?

- Medicine
- Journal of clinical oncology: official journal of the American Society of Clinical Oncology
- 2017

Risk prediction models are powerful tools that use multivariable regression to combine predictors or predisposing factors to estimate the probability or risk of the presence or future occurrence of…

### Building multivariable prognostic and diagnostic models: transformation of the predictors by using fractional polynomials

- Mathematics
- 1999

To be useful to clinicians, prognostic and diagnostic indices must be derived from accurate models developed by using appropriate data sets. We show that fractional polynomials, which extend ordinary…
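Fractional polynomials, as commonly defined, replace a linear term x with one or more power terms chosen from a small set, conventionally {-2, -1, -0.5, 0, 0.5, 1, 2, 3}, where power 0 denotes log x and a repeated power p contributes x^p and x^p·log x. The sketch below only builds the transformed columns for a positive x and a given choice of powers; selecting the best powers by comparing model deviances, which is the substance of the fractional-polynomial procedure, is not shown.

```python
import math
from typing import List, Tuple

# Conventional candidate powers; 0 stands for log(x).
FP_POWERS = (-2, -1, -0.5, 0, 0.5, 1, 2, 3)


def fp_term(x: float, p: float) -> float:
    """Single fractional-polynomial term x^(p), with x^(0) defined as log(x)."""
    if x <= 0:
        raise ValueError("fractional polynomials require x > 0")
    return math.log(x) if p == 0 else x ** p


def fp_basis(x: float, powers: Tuple[float, ...]) -> List[float]:
    """Columns of an FP model; a repeated power multiplies the previous term by log(x)."""
    if any(p not in FP_POWERS for p in powers):
        raise ValueError("power outside the conventional set")
    terms: List[float] = []
    previous = None
    for p in sorted(powers):
        if p == previous:
            terms.append(terms[-1] * math.log(x))  # e.g. x^p * log(x)
        else:
            terms.append(fp_term(x, p))
        previous = p
    return terms


print(fp_basis(2.0, (-0.5, 2)))  # FP2 with powers (-0.5, 2)
print(fp_basis(2.0, (2, 2)))     # repeated power: x^2 and x^2 * log(x)
```

Because the transformations need x > 0, covariates that can be zero or negative are usually shifted before fitting.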

## References

Showing 1-10 of 82 references

### Regression modelling strategies for improved prognostic prediction.

- Psychology
- Statistics in medicine
- 1984

A general index of predictive discrimination is used to measure the ability of a model developed on training samples of varying sizes to predict survival in an independent test sample of patients suspected of having coronary artery disease.

### A bootstrap resampling procedure for model building: application to the Cox regression model.

- Mathematics
- Statistics in medicine
- 1992

A bootstrap model-selection procedure is developed that combines the bootstrap with existing selection techniques, such as stepwise methods, to select the variables in a regression model that might influence the outcome.
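The core of such a procedure is to rerun a selection method on many bootstrap samples and record how often each variable is retained. The sketch below is a simplified version under stated assumptions: `select` is a hypothetical, user-supplied function (for example, backward elimination on a Cox model) that returns the names of the variables it keeps, and the rule for turning frequencies into a final model is left to the analyst.

```python
import random
from collections import Counter
from typing import Callable, Dict, List, Sequence, Set, TypeVar

Row = TypeVar("Row")


def bootstrap_inclusion_frequencies(
    rows: Sequence[Row],
    select: Callable[[List[Row]], Set[str]],
    n_boot: int = 200,
    seed: int = 0,
) -> Dict[str, float]:
    """Fraction of bootstrap samples in which each variable is selected.

    `select` is any selection procedure that returns the names of retained
    variables. Variables never selected do not appear in the result.
    """
    if not rows or n_boot <= 0:
        raise ValueError("need data and a positive number of replicates")
    rng = random.Random(seed)
    counts: Counter = Counter()
    n = len(rows)
    for _ in range(n_boot):
        # Resample n rows with replacement, then rerun the whole selection.
        sample = [rows[rng.randrange(n)] for _ in range(n)]
        counts.update(select(sample))
    return {name: c / n_boot for name, c in counts.items()}


# Hypothetical selector that keeps "age" always and "marker" only when the
# resampled mean of a toy column exceeds 2.
def toy_select(sample: List[dict]) -> Set[str]:
    keep = {"age"}
    if sum(r["x"] for r in sample) / len(sample) > 2:
        keep.add("marker")
    return keep


data = [{"x": v} for v in (1, 2, 3, 4, 0)]
print(bootstrap_inclusion_frequencies(data, toy_select, n_boot=100, seed=1))
```

Variables retained in most bootstrap replicates are more likely to reflect real effects than those selected only occasionally.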

### Applied Logistic Regression

- Psychology
- 1989

Applied Logistic Regression, Third Edition provides an easily accessible introduction to the logistic regression model and highlights the power of this model by examining the relationship between a dichotomous outcome and a set of covariables.

### Regression models in clinical studies: determining relationships between predictors and response.

- Mathematics
- Journal of the National Cancer Institute
- 1988

This paper addresses the assumption about the form of the relationship between predictors and the response by applying a direct and flexible approach, cubic spline functions, to two widely used models: the logistic regression model for binary responses and the Cox proportional hazards regression model for survival time data.
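One widely used form of cubic spline for this purpose, and the one this sketch assumes, is the restricted (natural) cubic spline. It is cubic between knots but constrained to be linear below the first and beyond the last knot. With sorted knots t_1 < … < t_k, the basis is x plus, for j = 1, …, k-2, (x-t_j)³₊ − (x-t_{k-1})³₊·(t_k-t_j)/(t_k-t_{k-1}) + (x-t_k)³₊·(t_{k-1}-t_j)/(t_k-t_{k-1}). Some implementations additionally rescale the nonlinear terms; this sketch applies no scaling, and the knot values in the example are hypothetical.

```python
from typing import List, Sequence


def rcs_basis(x: float, knots: Sequence[float]) -> List[float]:
    """Restricted cubic spline basis [x, s_1(x), ..., s_{k-2}(x)] (unscaled)."""
    t = sorted(knots)
    k = len(t)
    if k < 3:
        raise ValueError("need at least three knots")

    def pos3(u: float) -> float:
        return max(u, 0.0) ** 3  # truncated cubic (u)_+^3

    t_last, t_prev = t[-1], t[-2]
    spacing = t_last - t_prev
    basis = [x]
    for j in range(k - 2):
        tj = t[j]
        # The last two terms cancel the cubic and quadratic parts beyond t_k.
        basis.append(
            pos3(x - tj)
            - pos3(x - t_prev) * (t_last - tj) / spacing
            + pos3(x - t_last) * (t_prev - tj) / spacing
        )
    return basis


knots = [1.0, 2.0, 3.0, 4.0]
for value in (0.5, 2.5, 5.0, 6.0):
    print(value, rcs_basis(value, knots))
```

Each column is then entered into the logistic or Cox model as an ordinary covariate, so k knots cost k-1 regression coefficients.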

### Measures of explained variation for survival data.

- Business
- Statistics in medicine
- 1990

The importance of quantifying the predictive power of a prognostic model is discussed, and measures of explained variation as a possible quantification are suggested.

### Predicting outcome in coronary disease. Statistical models versus expert clinicians.

- Medicine
- The American journal of medicine
- 1986

### Flexible Methods for Analyzing Survival Data Using Splines, with Applications to Breast Cancer Prognosis

- Mathematics
- 1992

In an analysis of a large data set taken from clinical trials conducted by the Eastern Cooperative Oncology Group, these methods are seen to give useful insight into how prognosis varies as a function of continuous covariates, and also into how covariate effects change with follow-up time.

### Bootstrap investigation of the stability of a Cox regression model.

- Mathematics
- Statistics in medicine
- 1989

A bootstrap investigation of the stability of a Cox proportional hazards regression model, derived from a clinical trial of azathioprine versus placebo in patients with primary biliary cirrhosis, shows graphically that the resulting bootstrap intervals are markedly wider than those obtained from the original model.

### Prediction error and its estimation for subset-selected models

- Psychology
- 1991

Strategies are compared for development of a linear regression model and the subsequent assessment of its predictive ability. Simulations were performed as a designed experiment over a range of data…

### Proportional hazards tests and diagnostics based on weighted residuals

- Mathematics
- 1994

Nonproportional hazards can often be expressed by extending the Cox model to include time varying coefficients; e.g., for a single covariate, the hazard function for subject i is modelled as…