Corpus ID: 220265695

A Deep Learning Pipeline for Patient Diagnosis Prediction Using Electronic Health Records

Leopold Franz, Yash Raj Shrestha, Bibek Paudel
Augmentation of disease diagnosis and decision-making in healthcare with machine learning algorithms has gained much impetus in recent years. In particular, in the current epidemiological situation caused by the COVID-19 pandemic, swift and accurate prediction of disease diagnosis with machine learning algorithms could facilitate identification and care of vulnerable population clusters, such as those with multi-morbidity conditions. In order to build a useful disease diagnosis prediction…


Implementation and Use of Disease Diagnosis Systems for Electronic Medical Records Based on Machine Learning: A Complete Review
This survey paper highlights both the strong and weak points of various proposed techniques for disease diagnosis, which are categorized into Rule-Based Methods, Machine Learning (ML) Methods, and Deep Learning (DL) Methods.
Augmenting Organizational Decision-Making with Deep Learning Algorithms: Principles, Promises, and Challenges
This work conceptualizes the decision-making process in organizations augmented with DL algorithm outcomes (such as predictions or robust patterns from unstructured data) as deep learning–augmented decision-making (DLADM).
Extractive Summarization for Explainable Sentiment Analysis using Transformers
Two different methodologies are proposed to exploit the performance of these models in a task of sentiment analysis and, in the meantime, to generate a summary that serves as an explanation of the decision taken by the system.
An Explainable Artificial Intelligence Approach for Predicting Cardiovascular Outcomes using Electronic Health Records
A recently developed massively scalable comorbidity discovery method called Poisson Binomial based Comorbidities discovery (PBC) is deployed to analyze Electronic Health Records from the University of Utah and Primary Children's Hospital for comorbid diagnoses, procedures, and medications.
Are we there yet? Exploring clinical domain knowledge of BERT models
The task of unsupervised text retrieval to bridge the gap in existing information to facilitate inference is more complex than what the state-of-the-art methods can solve, and warrants extensive research in the future.
Multi-Class Text Classification Using Machine Learning Models for Online Drug Reviews
The reviews that are present in different forms on the Internet can provide valuable insights into the opinions of the users that are spread across a wide range of geographical space in the most time…
Self-Supervised Detection of Contextual Synonyms in a Multi-Class Setting: Phenotype Annotation Use Case
This paper proposes a self-supervised pre-training approach that detects contextual synonyms of concepts by training on data created by shallow matching, and achieves a new SOTA for unsupervised phenotype concept annotation on clinical text, outperforming the previous SOTA on F1 and Recall.
Benchmarking Bias Mitigation Algorithms in Representation Learning through Fairness Metrics
With the recent expanding attention of machine learning researchers and practitioners to fairness, there is a void of a common framework to analyze and compare the capabilities of proposed models…


Scalable and accurate deep learning with electronic health records
A representation of patients’ entire raw EHR records based on the Fast Healthcare Interoperability Resources (FHIR) format is proposed, and it is demonstrated that deep learning methods using this representation are capable of accurately predicting multiple medical events from multiple centers without site-specific data harmonization.
Deep Patient: An Unsupervised Representation to Predict the Future of Patients from the Electronic Health Records
The findings indicate that deep learning applied to EHRs can derive patient representations that offer improved clinical predictions, and could provide a machine learning framework for augmenting clinical decision systems.
MiME: Multilevel Medical Embedding of Electronic Health Records for Predictive Healthcare
Multilevel Medical Embedding (MiME) is proposed, which learns the multilevel embedding of EHR data while jointly performing auxiliary prediction tasks that rely on this inherent EHR structure without the need for external labels.
Multimodal Machine Learning for Automated ICD Coding
Two separate machine learning models that can handle data from different modalities, including unstructured text, semi-structured text, and structured tabular data, are developed, and an ensemble method is employed to integrate all modality-specific models to generate ICD-10 codes.
Neural networks versus Logistic regression for 30 days all-cause readmission prediction
It is concluded that data from patient timelines improve 30-day readmission prediction, that a logistic regression with LASSO has equal performance to the best neural network model, and that the use of administrative data results in competitive performance compared to published approaches based on richer clinical datasets.
Cross-type Biomedical Named Entity Recognition with Deep Multi-Task Learning
A multi-task learning framework for BioNER is proposed to collectively use the training data of different types of entities and improve the performance on each of them, achieving substantially better performance compared with state-of-the-art BioNER systems and baseline neural sequence labeling models.
Improved Hierarchical Patient Classification with Language Model Pretraining over Clinical Notes
This work proposes a pretrained hierarchical recurrent neural network model that parses minimally processed clinical notes in an intuitive fashion, and shows that it improves performance for discharge diagnosis classification tasks on the Medical Information Mart for Intensive Care III (MIMIC-III) dataset.
The Effectiveness of Multitask Learning for Phenotyping with Electronic Health Records Data
It is found that multitask neural nets consistently outperform single-task neural nets for rare phenotypes but underperform for relatively more common phenotypes, and neural nets trained with or without multitask learning do not improve on simple baselines unless the phenotypes are sufficiently complex.
Clinical Concept Embeddings Learned from Massive Sources of Medical Data
This article benchmarks two of the most popular algorithms for word embeddings, GloVe and word2vec, to assess their suitability for capturing medical relationships in large sources of biomedical data and provides a unified view of these algorithms.
Clinical Concept Embeddings Learned from Massive Sources of Multimodal Medical Data
This article demonstrates how an insurance claims database of 60 million members, a collection of clinical notes, and 1.7 million full text biomedical journal articles can be combined to embed concepts into a common space, resulting in the largest ever set of embeddings for 108,477 medical concepts.