Opportunities and challenges in developing risk prediction models with electronic health records data: a systematic review

@article{Goldstein2017OpportunitiesAC,
  title={Opportunities and challenges in developing risk prediction models with electronic health records data: a systematic review},
  author={Benjamin Alan Goldstein and Ann Marie Navar and Michael J. Pencina and John P. A. Ioannidis},
  journal={Journal of the American Medical Informatics Association},
  year={2017},
  volume={24},
  pages={198--208}
}
Objective: Electronic health records (EHRs) are an increasingly common data source for clinical risk prediction, presenting both unique analytic opportunities and challenges. […] Results: We identified 107 articles from 15 different countries. Studies were generally very large (median sample size = 26 100) and utilized a diverse array of predictors. Most used validation techniques (n = 94 of 107) and reported model coefficients for reproducibility (n = 83). However, studies did not fully leverage…

Trends in the conduct and reporting of clinical prediction model development and validation: a systematic review
TLDR
Improvement in the reporting of information necessary to enable external validation by other investigators is still urgently needed to increase clinical adoption of developed models.
Clinical Implementation of Predictive Models Embedded within Electronic Health Record Systems: A Systematic Review
TLDR
Overall, EHR-based predictive models offer promising results for improving clinical outcomes, although several gaps in the literature remain, and most study designs were observational.
Harnessing repeated measurements of predictor variables for clinical risk prediction: a review of existing methods
TLDR
The objective was to review the literature and summarise existing approaches for harnessing repeated measurements of predictor variables in CPMs, primarily to make this field more accessible for applied researchers.
A comparison of risk prediction methods using repeated observations: an application to electronic health records for hemodialysis
TLDR
Using data from an EHR for patients undergoing hemodialysis, five different clinical predictors are incorporated into a risk model for patient mortality, suggesting that simple approaches perform just as well, if not better, than more complex analytic approaches.
Developing Clinical Prediction Models Using Primary Care Electronic Health Record Data: The Impact of Data Preparation Choices on Model Performance
TLDR
This study further illustrates the urgency of transparent reporting of modeling choices in an EHR data setting, and quantifies prediction model performance in relation to data preparation choices when using electronic health records.
Reporting of demographic data and representativeness in machine learning models using electronic health records
TLDR
This study evaluates whether studies developing ML models from EHR data report sufficient demographic data on their study populations to demonstrate representativeness and reproducibility, so that ML models can be deployed in an equitable and reproducible manner.
Methodological Challenges for Risk Prediction in Perinatal Epidemiology
TLDR
The accuracy and utility of prediction models for clinical decision making are contingent on the use of robust methods to develop risk prediction models and appropriate metrics to assess their performance and clinical impact. The era of big data gives researchers the opportunity to leverage existing databases.
Evaluation of Electronic Health Record-Based Suicide Risk Prediction Models on Contemporary Data
TLDR
Performance of the risk prediction models in this contemporary sample was similar to historical estimates for suicide attempt but modestly lower for suicide death.
Landmark Models for Optimizing the Use of Repeated Measurements of Risk Factors in Electronic Health Records to Predict Future Disease Risk
TLDR
A dynamic 10-year cardiovascular disease risk prediction model using primary-care EHRs for age, diabetes status, hypertension treatment, smoking status, systolic blood pressure, total cholesterol, and high-density lipoprotein cholesterol is developed and validated.
Adoption of clinical risk prediction tools is limited by a lack of integration with electronic health records
TLDR
A recent systematic review of clinical decision support systems by Kwan et al. demonstrated only a poor to moderate improvement of care and highlighted the importance of designing models and tools that critically consider care processes and patient outcomes.
