RIDDLE: Race and ethnicity Imputation from Disease history with Deep LEarning

@article{Kim2018RIDDLERA,
  title={RIDDLE: Race and ethnicity Imputation from Disease history with Deep LEarning},
  author={Ji-Sung Kim and A. Rzhetsky},
  journal={PLoS Computational Biology},
  year={2018},
  volume={14}
}
Anonymized electronic medical records are an increasingly popular source of research data. However, these datasets often lack race and ethnicity information. This creates problems for researchers modeling human disease, as race and ethnicity are powerful confounders for many health exposures and treatment outcomes; race and ethnicity are closely linked to population-specific genetic variation. We showed that deep neural networks generate more accurate estimates for missing racial and ethnic… 
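The abstract's core idea is to predict a missing race/ethnicity label from a patient's disease history. The sketch below illustrates that framing with a much simpler stand-in for a deep network: a multinomial logistic (softmax) regression, which is effectively a single-layer network, over a bag-of-codes encoding, trained with per-record gradient descent in pure Python. The disease codes `D1` to `D5`, the labels `group_a` to `group_c`, the toy records, and the helpers `featurize`, `train`, and `impute` are all hypothetical. Nothing here reproduces RIDDLE's actual architecture, features, hyperparameters, or data.

```python
import math
import random

# Hypothetical toy records: (set of disease codes, race/ethnicity label).
# Codes "D1".."D5" and labels "group_a".."group_c" are invented placeholders,
# not real diagnosis codes or census categories.
RECORDS = [
    ({"D1", "D2"}, "group_a"),
    ({"D1"}, "group_a"),
    ({"D3", "D4"}, "group_b"),
    ({"D4"}, "group_b"),
    ({"D2", "D5"}, "group_c"),
    ({"D5"}, "group_c"),
]

CODES = sorted({c for codes, _ in RECORDS for c in codes})
LABELS = sorted({y for _, y in RECORDS})


def featurize(codes):
    """Bag-of-codes encoding: one 0/1 indicator per known disease code."""
    return [1.0 if c in codes else 0.0 for c in CODES]


def softmax(z):
    m = max(z)  # subtract the max logit for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def predict_proba(W, b, x):
    """Class probabilities for one encoded patient."""
    logits = [sum(w * xi for w, xi in zip(wk, x)) + bk for wk, bk in zip(W, b)]
    return softmax(logits)


def train(records, epochs=500, lr=0.5, seed=0):
    """Fit a softmax (multinomial logistic) model with per-record gradient descent."""
    rng = random.Random(seed)
    W = [[rng.uniform(-0.01, 0.01) for _ in CODES] for _ in LABELS]
    b = [0.0 for _ in LABELS]
    data = [(featurize(codes), LABELS.index(y)) for codes, y in records]
    for _ in range(epochs):
        for x, y in data:
            p = predict_proba(W, b, x)
            for k in range(len(LABELS)):
                # Cross-entropy gradient w.r.t. logit k is p_k - [k == y].
                g = p[k] - (1.0 if k == y else 0.0)
                W[k] = [w - lr * g * xi for w, xi in zip(W[k], x)]
                b[k] -= lr * g
    return W, b


def impute(W, b, codes):
    """Map each label to its predicted probability for a patient lacking a label."""
    return dict(zip(LABELS, predict_proba(W, b, featurize(codes))))


W, b = train(RECORDS)
probs = impute(W, b, {"D3", "D4"})
best = max(probs, key=probs.get)
```

Returning a probability for every label, rather than a single guess, leaves room for downstream analyses to carry the imputation uncertainty forward; codes never seen in training are simply ignored by this toy encoding.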

Citations

Comparison of machine learning methods for clinical data imputation among a real-world lung cancer cohort
TLDR
ML imputation achieved promising performance for NSCLC patients within a large national cancer registry and achieved improved performance despite higher algorithm runtimes.
Inferring Personalized and Race-Specific Causal Effects of Genomic Aberrations on Gleason Scores: A Deep Latent Variable Model
TLDR
A joint deep latent variable model (DLVM) is proposed to in silico quantify the personalized and race-specific effects that a genomic aberration may exert on the Gleason Score (GS) of each individual PCa patient, and achieves much higher precision in causal effect inference.
Multiple Imputation of Missing Race and Ethnicity in CDC COVID-19 Case-Level Surveillance Data.
The COVID-19 pandemic has resulted in a disproportionate burden on racial and ethnic minority groups, but incompleteness in surveillance data limits understanding of disparities. CDC's case-based
PGCN: Disease gene prioritization by disease and gene embedding through graph convolutional neural networks
TLDR
This work proposes a novel graph convolutional network-based disease gene prioritization method, PGCN, through the systematic embedding of the heterogeneous network made by genes and diseases, as well as their individual features, and demonstrates that it has biological meaning and can capture functional groups of genes.
The quality of social determinants data in the electronic health record: a systematic review
TLDR
Consideration of data quality and evidence-based quality improvement methods may help prevent bias and improve the validity of research conducted with SDoH data.
Deep learning in bioinformatics: introduction, application, and perspective in big data era
  • Yu Li
  • Computer Science
  • 2019
TLDR
This review provides both the exoteric introduction of deep learning, and concrete examples and implementations of its representative applications in bioinformatics, and introduces deep learning in an easy-to-understand fashion.
Towards Structured Prediction in Bioinformatics with Deep Learning
  • Yu Li
  • Computer Science
  • ArXiv
  • 2020
TLDR
This work argues that the following ideas can help resolve structured prediction problems in bioinformatics, and demonstrates how these ideas can combine with classic algorithms to design problem-specific deep learning architectures or methods.

References

Selected references (34 in total)
Imputing Missing Race/Ethnicity in Pediatric Electronic Health Records: Reducing Bias with Use of U.S. Census Location and Surname Data.
TLDR
The new method reduces bias when race/ethnicity is partially, nonrandomly missing, and multiple imputation incorporating surname and address information reduced bias for both continuous and dichotomous outcomes.
A new method for estimating race/ethnicity and associated disparities where administrative records lack self-reported race/ethnicity.
TLDR
The Bayesian Surname and Geocoding (BSG) method presented here efficiently integrates administrative data, substantially improving upon what is possible with a single source or from other hybrid methods; it offers a powerful tool that can help health care organizations address disparities until self-reported race/ethnicity data are available.
Mining electronic health records: towards better research applications and clinical care
TLDR
The potential for furthering medical research and clinical care using EHR data and the challenges that must be overcome before this is a reality are considered.
Racial Composition Over the Life Course: Examining Separate and Unequal Environments and the Risk for Heart Disease for African American Men.
TLDR
Findings suggest exposure to segregated environments during childhood and later adulthood may impact hypertension risk among African American men over the life course.
Neighborhood Disadvantage, Poor Social Conditions, and Cardiovascular Disease Incidence Among African American Adults in the Jackson Heart Study.
TLDR
Worse neighborhood economic and social conditions may contribute to increased risk of CVD among African American women and policies directly addressing these issues may alleviate the burden of CVD in this group.
The importance of race and ethnic background in biomedical research and clinical practice.
TLDR
With the completion of a rough draft of the human genome, some have suggested that racial classification may not be useful for biomedical studies, since it reflects “a fairly small number of genes that describe appearance” and “there is no basis in the genetic code for race.”
The effect of race and sex on physicians' recommendations for cardiac catheterization.
TLDR
It is suggested that the race and sex of a patient independently influence how physicians manage chest pain.
Adam: A Method for Stochastic Optimization
TLDR
This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.
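The Adam entry summarizes the optimizer in a single sentence. As an illustration only, here is a minimal pure-Python sketch of the Adam update rule as published by Kingma and Ba: exponential moving averages of the gradient (first moment) and squared gradient (second raw moment), each bias-corrected, with the paper's suggested defaults β1 = 0.9, β2 = 0.999, ε = 1e-8. The function name `adam_minimize`, the one-dimensional quadratic objective, the step size of 0.1 (larger than the paper's suggested 0.001, chosen so the toy problem converges quickly), and the step count are assumptions for this example. Nothing here reflects how, or whether, RIDDLE configured Adam.

```python
import math


def adam_minimize(grad, x0, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=2000):
    """Minimize a scalar function with Adam, given a function returning its gradient."""
    x = x0
    m = 0.0  # exponential moving average of gradients (first moment)
    v = 0.0  # exponential moving average of squared gradients (second raw moment)
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        # Bias correction: m and v start at zero, so early estimates are scaled up.
        m_hat = m / (1 - beta1 ** t)
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x


# Toy objective f(x) = (x - 3)^2, whose gradient is 2 * (x - 3); the minimum is x = 3.
x_min = adam_minimize(lambda x: 2 * (x - 3), x0=0.0)
```

Dividing by the square root of the second-moment estimate gives each parameter its own effective step size, which is the "adaptive" part of the method; in this scalar sketch there is only one parameter, so the effect shows up as steps that are roughly `lr` in size while gradients keep a consistent sign.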