Density-Aware Personalized Training for Risk Prediction in Imbalanced Medical Data

Authors: Zepeng Huo, Xiaoning Qian, Shuai Huang, Zhangyang Wang, Bobak J. Mortazavi
Published in: Machine Learning in Health Care
Medical events of interest, such as mortality, often occur at a low rate in electronic medical records, as most admitted patients survive. Training models under this imbalance (class density discrepancy) may lead to suboptimal prediction. Traditionally, this problem is addressed through ad hoc methods such as resampling or reweighting, but performance in many cases is still limited. We propose a framework for training models for this imbalance issue: 1) we first decouple the feature… 
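As a rough illustration of the reweighting baseline the abstract mentions (not the framework the authors propose), the sketch below computes inverse-frequency class weights and a weighted binary cross-entropy. The function names, the toy 9:1 cohort, and the specific weighting rule (n_samples / (n_classes * class_count)) are assumptions chosen for illustration.

```python
import math
from collections import Counter

def inverse_frequency_weights(labels):
    """Return {class: weight}, weight = n_samples / (n_classes * class_count).

    Hypothetical helper: one common reweighting rule, assumed here.
    """
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}

def weighted_log_loss(labels, probs, weights, eps=1e-12):
    """Weighted binary cross-entropy; probs are predicted P(y = 1)."""
    total, norm = 0.0, 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        w = weights[y]
        total += -w * (y * math.log(p) + (1 - y) * math.log(1 - p))
        norm += w
    return total / norm

# Toy cohort (illustrative numbers only): 9 survivors (0), 1 death (1).
labels = [0] * 9 + [1]
weights = inverse_frequency_weights(labels)

# A model that always predicts low risk looks fine unweighted,
# but is penalized once the rare class is upweighted.
always_low = [0.05] * 10
uniform = {0: 1.0, 1: 1.0}
print(weighted_log_loss(labels, always_low, uniform))
print(weighted_log_loss(labels, always_low, weights))
```

With these weights each class contributes equally to the loss in aggregate, so missing the single positive case costs as much as misclassifying all nine negatives.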



Scalable and accurate deep learning with electronic health records

A representation of patients’ entire raw EHR records based on the Fast Healthcare Interoperability Resources (FHIR) format is proposed, and it is demonstrated that deep learning methods using this representation are capable of accurately predicting multiple medical events from multiple centers without site-specific data harmonization.

Impact of novel aggregation methods for flexible, time-sensitive EHR prediction without variable selection or cleaning

The authors' models outperform recent deep learning models for patient mortality classification using ICU timeseries, by embedding and aggregating all events with no pre-processing or variable selection, and can be easily combined with existing electronic health record systems for automated, dynamic patient risk analysis.

Patient2Vec: A Personalized Interpretable Deep Representation of the Longitudinal Electronic Health Record

A computational framework, Patient2Vec, is proposed to learn an interpretable deep representation of longitudinal EHR data that is personalized for each patient; it achieves an area under the curve (AUC) of around 0.799, outperforming baseline methods.

ICU Mortality Prediction: A Classification Algorithm for Imbalanced Datasets

A new algorithm for ICU mortality prediction is presented that is designed to address the problem of imbalance, which occurs, in the context of binary classification, when one of the two classes is significantly under-represented in the data.

Addressing the Class Imbalance Problem in Medical Datasets

This paper examines the performance of over-sampling and under-sampling techniques for balancing cardiovascular data and proposes an improved under-sampling technique that performs significantly better than the existing methods.

Sparse Gated Mixture-of-Experts to Separate and Interpret Patient Heterogeneity in EHR data

A Mixture-of-Experts (MoE) model is coupled with a sparse gating network to handle patient heterogeneity for prediction and to aid interpretation of patient subtype separation, and it is shown that this sparsity can improve risk prediction.

Deep Patient: An Unsupervised Representation to Predict the Future of Patients from the Electronic Health Records

The findings indicate that deep learning applied to EHRs can derive patient representations that offer improved clinical predictions, and could provide a machine learning framework for augmenting clinical decision systems.

CaliForest: calibrated random forest for health data

This work evaluates CaliForest, a new calibrated random forest that can achieve the same discriminative power as random forest while obtaining a better-calibrated model, as measured across six different metrics.

Learning Imbalanced Datasets with Label-Distribution-Aware Margin Loss

A theoretically-principled label-distribution-aware margin (LDAM) loss motivated by minimizing a margin-based generalization bound is proposed that replaces the standard cross-entropy objective during training and can be applied with prior strategies for training with class-imbalance such as re-weighting or re-sampling.
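The idea of a label-distribution-aware margin can be sketched as follows. This is a minimal illustration, not the reference implementation: it assumes per-class margins proportional to n_j^(-1/4) (the form given in the LDAM paper), rescaled so the rarest class receives `max_margin`, and subtracts the margin from the true-class logit before a standard softmax cross-entropy. The function names and the default values of `max_margin` and `s` are assumptions.

```python
import math

def ldam_margins(class_counts, max_margin=0.5):
    """Per-class margins proportional to n_j ** -0.25.

    Rescaled so the rarest class gets `max_margin` (assumed default).
    """
    raw = [1.0 / (n ** 0.25) for n in class_counts]
    scale = max_margin / max(raw)
    return [r * scale for r in raw]

def ldam_loss(logits, label, margins, s=1.0):
    """Cross-entropy after subtracting the true class's margin from its logit.

    `s` is a logit scale factor (assumed hyperparameter).
    """
    adjusted = list(logits)
    adjusted[label] -= margins[label]
    adjusted = [s * z for z in adjusted]
    m = max(adjusted)  # subtract the max for a numerically stable log-sum-exp
    log_sum = m + math.log(sum(math.exp(z - m) for z in adjusted))
    return log_sum - adjusted[label]

# Toy class counts (illustrative only): 900 negatives, 100 positives.
margins = ldam_margins([900, 100])
print(margins)                           # the rare class gets the larger margin
print(ldam_loss([0.0, 0.0], 1, margins))
```

Because the margin is larger for rarer classes, the model must separate minority-class examples by a wider gap before their loss becomes small; the same loss can still be combined with re-weighting or re-sampling, as the summary notes.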

Learning to Diagnose with LSTM Recurrent Neural Networks

This first study to empirically evaluate the ability of LSTMs to recognize patterns in multivariate time series of clinical measurements considers multilabel classification of diagnoses, and establishes the effectiveness of a simple LSTM network for modeling clinical data.