Monitoring machine learning (ML)-based risk prediction algorithms in the presence of confounding medical interventions

Jean Feng, Alexej Gossmann, Gene A. Pennello, Nicholas A. Petrick, Berkman Sahiner, Romain Pirracchio
Monitoring the performance of machine learning (ML)-based risk prediction models in healthcare is complicated by the issue of confounding medical interventions (CMI): when an algorithm predicts a patient to be at high risk for an adverse event, clinicians are more likely to administer prophylactic treatment and alter the very target that the algorithm aims to predict. Ignoring CMI by monitoring only the untreated patients (whose outcomes remain unaltered) can inflate false alarm rates, because…


Learning (predictive) risk scores in the presence of censoring due to interventions

A novel ranking-based framework for disease severity score learning (DSSL) is proposed. DSSL exploits the following key observation: while it is challenging for experts to quantify disease severity at any given time, it is often easy to compare disease severity at two different times.
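The pairwise-comparison idea behind this observation can be illustrated with a generic margin-based ranking loss (this is an illustrative sketch, not the paper's actual DSSL objective; the function name and margin parameter are assumptions):

```python
def pairwise_ranking_loss(scores, pairs, margin=1.0):
    """Generic hinge-style ranking loss over time-ordered pairs.

    For each pair (i, j) where severity is known to be higher at time j
    than at time i, penalize the model unless scores[j] exceeds
    scores[i] by at least `margin`.
    """
    total = 0.0
    for i, j in pairs:
        # Zero loss once the later score beats the earlier one by the margin.
        total += max(0.0, margin - (scores[j] - scores[i]))
    return total / len(pairs)
```

For example, scores `[0.0, 2.0]` with the ordered pair `(0, 1)` satisfy the margin and incur zero loss, while scores `[0.0, 0.5]` incur a loss of 0.5.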

Model updating after interventions paradoxically introduces bias

It is shown that successive predictive scores may converge to a point where they predict their own effect, or may eventually oscillate between two values, and it is argued that neither outcome is desirable.
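The oscillation phenomenon can be illustrated with a toy feedback loop (an illustrative sketch only, not the paper's model; all parameter values are assumptions): if clinicians treat whenever the predicted risk exceeds a threshold, and the model is then refit to the post-treatment outcomes, successive scores can bounce between a high and a low value.

```python
def next_score(p, base_risk=0.8, treated_risk=0.2, threshold=0.5):
    """One round of 'refit after intervention' in a toy threshold model.

    If the current score p triggers treatment (p > threshold), the
    observed event rate drops to treated_risk, so the refit score is low;
    otherwise no one is treated and the refit score returns to base_risk.
    """
    return treated_risk if p > threshold else base_risk

# Iterate the update: the score never settles, it oscillates.
p, trajectory = 0.5, []
for _ in range(6):
    p = next_score(p)
    trajectory.append(p)
# trajectory alternates: [0.8, 0.2, 0.8, 0.2, 0.8, 0.2]
```

A high score suppresses the very outcomes it predicts, the refit score then drops, treatment stops, outcomes rebound, and the cycle repeats, which is the undesirable oscillation the paper describes.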

Learning to safely approve updates to machine learning algorithms

This work investigates the design of approval policies for modifications to ML algorithms in the presence of distributional shifts, and proposes a family of approval strategies that vary in their level of optimism so as to protect against settings where no version of the ML algorithm performs well.

Bayesian logistic regression for online recalibration and revision of risk prediction models with performance guarantees

BLR and MarBLR can improve the transportability of clinical prediction models and maintain their performance over time and are competitive with an oracle logistic reviser in terms of the average loss.

Clinical artificial intelligence quality improvement: towards continual monitoring and updating of AI algorithms in healthcare

This work advocates for the creation of hospital units responsible for quality assurance and improvement of these algorithms, which it refers to as "AI-QI" units, and discusses how tools that have long been used in hospital quality assurance and quality improvement can be adapted to monitor static ML algorithms.

Developing Predictive Models Using Electronic Medical Records: Challenges and Pitfalls

Key issues and subtle pitfalls specific to building predictive models from EMRs are discussed, highlighting the importance of carefully considering both the special characteristics of EMRs and the intended clinical use of the predictive model, and showing that failure to do so can lead to models that are less useful in practice.

The Clinician and Dataset Shift in Artificial Intelligence.

This letter outlines how to identify, and potentially mitigate, common sources of “dataset shift” in machine-learning systems.

Risk-adjusted monitoring of time to event

Recently there has been interest in risk-adjusted cumulative sum (CUSUM) charts to monitor the performance of, e.g., hospitals, taking into account the heterogeneity of patients. Even though many…
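The risk-adjusted CUSUM idea can be sketched as follows (a minimal illustration in the style of a Steiner-type risk-adjusted Bernoulli CUSUM; the function names, the odds-ratio alternative `R`, and the alarm threshold `h` are assumptions, not this paper's specification):

```python
import math

def llr_increment(y, p, R=2.0):
    """Log-likelihood-ratio weight for one patient.

    Null: the patient's event probability equals the predicted risk p.
    Alternative: the odds of the event are multiplied by R.
    """
    denom = 1.0 - p + R * p
    return math.log(R / denom) if y else math.log(1.0 / denom)

def risk_adjusted_cusum(outcomes, risks, R=2.0, h=4.0):
    """Run the chart over patient outcomes; return its path and the
    index of the first alarm (None if the statistic never exceeds h)."""
    s, path, alarm = 0.0, [], None
    for t, (y, p) in enumerate(zip(outcomes, risks)):
        # Accumulate evidence of excess risk, resetting at zero.
        s = max(0.0, s + llr_increment(y, p, R))
        path.append(s)
        if alarm is None and s > h:
            alarm = t
    return path, alarm
```

Because each increment conditions on the patient-specific predicted risk `p`, sicker case mixes do not by themselves push the chart toward an alarm, which is the sense in which the chart is risk-adjusted.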