Ongoing challenges and innovative approaches for recognizing patterns across large-scale, integrative biomedical datasets.

Authors: S. N. Kobren, Brett K. Beaulieu-Jones, Christian Darabos, Dokyoon Kim, A. Verma
Venue: Pacific Symposium on Biocomputing
The following sections are included: Introduction; Identifying patterns in clinical data; Integration and analysis of molecular omics data; Interpretable and graph-based machine learning approaches; Computational challenges and reproducibility; Discussion; References.


Methods for the integration of multi-omics data: mathematical aspects
A review of the most advanced strategies for integrating multi-omics datasets, focusing on mathematical and methodological aspects, to draw a more complete and accurate picture of the dynamics of molecular systems.
The NIH Big Data to Knowledge (BD2K) initiative
The articles that follow from seven of the Centers for Data Excellence that have been funded by the National Institutes of Health (NIH) Big Data to Knowledge (BD2K) initiative provide an overall context for the program.
Enhancing Model Interpretability and Accuracy for Disease Progression Prediction via Phenotype-Based Patient Similarity Learning
This paper proposes to learn patient similarity features as phenotypes from the aggregated patient-medical service matrix using non-negative matrix factorization and shows that the phenotype-based similarity features can improve prediction over multiple baselines, including logistic regression, random forest, convolutional neural network, and more.
A Literature-Based Knowledge Graph Embedding Method for Identifying Drug Repurposing Opportunities in Rare Diseases
A knowledge graph embedding method is applied that explicitly models the uncertainty associated with literature-derived relationships and uses link prediction to generate drug repurposing hypotheses. The method is capable of generating novel repurposing hypotheses, which are independently validated using external literature sources and protein interaction networks.
Methods of integrating data to uncover genotype–phenotype interactions
The emerging approaches for data integration, including meta-dimensional and multi-staged analyses, which aim to deepen the understanding of the role of genetics and genomics in complex outcomes are explored.
PathFlowAI: A Convenient High-Throughput Workflow for Preprocessing, Deep Learning Analytics and Interpretation in Digital Pathology
The preliminary data indicate that PathFlowAI may become a cost-effective and time-efficient tool for clinical use of Artificial Intelligence (AI) algorithms.
PTR Explorer: An approach to identify and explore Post Transcriptional Regulatory mechanisms using proteogenomics
This work suggests that the proposed methodology has the potential for discovering and categorizing post-transcriptional regulatory mechanisms, manifesting in proteogenomic trends, which provide evidence for cancer specificity, miRNA targeting, and identification of regulation impacted by biological functionality and different types of degradation.
PAGE-Net: Interpretable and Integrative Deep Learning for Survival Analysis Using Histopathological Images and Genomic Data
A biologically interpretable deep learning model (PAGE-Net) is presented that integrates histopathological images and genomic data, not only to improve survival prediction, but also to identify genetic and histopathological patterns that cause different survival rates in patients.
Improving Survival Prediction Using a Novel Feature Selection and Feature Reduction Framework Based on the Integration of Clinical and Molecular Data
The development of a novel feature selection and feature reduction framework that can handle correlated data improves prognostic performance as compared to modeling approaches that do not consider combining non-redundant data.
Assessment of Imputation Methods for Missing Gene Expression Data in Meta-Analysis of Distinct Cohorts of Tuberculosis Patients
The results indicate that truncating to common genes observed across cohorts, which is the current method used by researchers, results in the exclusion of important biology, and suggest that LASSO and LLS imputation methodologies can reasonably impute genes across cohorts when total missingness rates are below 20%.