Predicting disease risks from highly imbalanced data using random forest

@inproceedings{Khalilia2011PredictingDR,
  title={Predicting disease risks from highly imbalanced data using random forest},
  author={Mohammad Khalilia and Sounak Chakraborty and Mihail Popescu},
  booktitle={BMC Med. Inf. \& Decision Making},
  year={2011}
}
BACKGROUND We present a method utilizing the Healthcare Cost and Utilization Project (HCUP) dataset for predicting the disease risk of individuals based on their medical diagnosis history. The presented methodology may be incorporated in a variety of applications such as risk management, tailored health communication and decision support systems in healthcare. METHODS We employed the National Inpatient Sample (NIS) data, which is publicly available through Healthcare Cost and Utilization Project…