Ongoing challenges and innovative approaches for recognizing patterns across large-scale, integrative biomedical datasets.

Shilpa Nadimpalli Kobren, Brett K. Beaulieu-Jones, Christian Darabos, Dokyoon Kim, and Anurag Verma. Pacific Symposium on Biocomputing.
The following sections are included: Introduction; Identifying patterns in clinical data; Integration and analysis of molecular omics data; Interpretable and graph-based machine learning approaches; Computational challenges and reproducibility; Discussion; References.



Methods for the integration of multi-omics data: mathematical aspects

A review of the most advanced strategies for integrating multi-omics datasets, focusing on mathematical and methodological aspects, to draw a more complete and accurate picture of the dynamics of molecular systems.

The NIH Big Data to Knowledge (BD2K) initiative

The articles that follow from seven of the Centers for Data Excellence that have been funded by the National Institutes of Health (NIH) Big Data to Knowledge (BD2K) initiative provide an overall context for the program.

Enhancing Model Interpretability and Accuracy for Disease Progression Prediction via Phenotype-Based Patient Similarity Learning

This paper proposes to learn patient similarity features as phenotypes from the aggregated patient-medical service matrix using non-negative matrix factorization and shows that the phenotype-based similarity features can improve prediction over multiple baselines, including logistic regression, random forest, convolutional neural network, and more.
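
The phenotype-based similarity idea can be made concrete with a minimal plain-Python sketch. It assumes a small non-negative patient × medical-service count matrix. The matrix is factored with Lee-Seung multiplicative updates, which is an assumed choice because the abstract does not name an NMF solver. Each row of W is then treated as a patient's phenotype loadings, and patients are compared by cosine similarity. The helper names (`nmf`, `phenotype_similarity`), the rank `k`, the iteration count, and the toy matrix are all hypothetical and not taken from the paper.

```python
import math
import random


def matmul(a, b):
    # Plain-Python product of an (n x m) and an (m x p) matrix.
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def nmf(v, k, iters=500, seed=0, eps=1e-9):
    """Hypothetical helper: factor a non-negative patients x services matrix v
    into w (patients x k, phenotype loadings) and h (k x services) using
    Lee-Seung multiplicative updates for the squared-error objective."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)  # W^T V and W^T W H
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)] for i in range(k)]
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))  # V H^T and W H H^T
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)] for i in range(n)]
    return w, h


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na, nb = math.sqrt(sum(x * x for x in a)), math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def phenotype_similarity(w):
    # Pairwise cosine similarity of patients in the k-dimensional phenotype space.
    return [[cosine(wi, wj) for wj in w] for wi in w]


if __name__ == "__main__":
    # Toy data (hypothetical): patients 0-1 use services 0-1, patients 2-3 use services 2-3.
    v = [[5, 4, 0, 0], [4, 5, 0, 0], [0, 0, 3, 4], [0, 0, 4, 3]]
    w, _ = nmf(v, k=2)
    for row in phenotype_similarity(w):
        print([round(s, 2) for s in row])
```

On the toy matrix, patients who use the same services should load on the same component and end up more similar to each other than to patients in the other group. In a prediction setting, these similarity features would be passed to a downstream classifier alongside other inputs.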

A Literature-Based Knowledge Graph Embedding Method for Identifying Drug Repurposing Opportunities in Rare Diseases

This work applies a knowledge graph embedding method that explicitly models the uncertainty associated with literature-derived relationships and uses link prediction to generate drug repurposing hypotheses. The method can generate novel repurposing hypotheses, which are independently validated using external literature sources and protein interaction networks.
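
The link-prediction step can be sketched as follows, though this is not the paper's model. The sketch swaps in a DistMult trilinear score. As one assumed way to reflect uncertain, literature-derived relations, it fits the sigmoid of each triple's score to a per-triple confidence in [0, 1] using squared error, and it treats randomly corrupted tails as confidence 0. All of the following are hypothetical: the function names, entity and relation names, confidence values, and hyperparameters.

```python
import math
import random


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def distmult(e, r, h, rel, t):
    # DistMult score: trilinear product <e_h, w_rel, e_t>.
    return sum(a * b * c for a, b, c in zip(e[h], r[rel], e[t]))


def train(triples, entities, relations, dim=8, epochs=300, lr=0.1, neg_per_pos=2, seed=0):
    """Hypothetical trainer. triples: (head, relation, tail, confidence in [0, 1]).
    Fits sigmoid(score) to each triple's confidence with squared error and
    pushes randomly corrupted tails toward confidence 0."""
    rng = random.Random(seed)
    e = {x: [rng.gauss(0, 0.1) for _ in range(dim)] for x in entities}
    r = {x: [rng.gauss(0, 0.1) for _ in range(dim)] for x in relations}
    known = {(h, rel, t) for h, rel, t, _ in triples}
    for _ in range(epochs):
        batch = []
        for h, rel, t, c in triples:
            batch.append((h, rel, t, c))
            for _ in range(neg_per_pos):
                t_neg = rng.choice(entities)
                if (h, rel, t_neg) not in known:
                    batch.append((h, rel, t_neg, 0.0))  # assumed negative: confidence 0
        rng.shuffle(batch)
        for h, rel, t, c in batch:
            p = sigmoid(distmult(e, r, h, rel, t))
            g = 2.0 * (p - c) * p * (1.0 - p)  # d(squared error)/d(score)
            eh, wr, et = e[h][:], r[rel][:], e[t][:]  # copies so updates use old values
            for i in range(dim):
                e[h][i] -= lr * g * wr[i] * et[i]
                r[rel][i] -= lr * g * eh[i] * et[i]
                e[t][i] -= lr * g * eh[i] * wr[i]
    return e, r


def rank_candidates(e, r, disease, relation, drugs):
    # Predicted confidence that (drug, relation, disease) holds, highest first.
    scored = [(d, sigmoid(distmult(e, r, d, relation, disease))) for d in drugs]
    return sorted(scored, key=lambda x: x[1], reverse=True)
```

Ranking the drugs that are not already linked to a disease under a hypothetical `treats` relation yields candidate repurposing hypotheses. DistMult scores are symmetric in head and tail, so this sketch cannot tell (drug, treats, disease) from (disease, treats, drug). A directional model such as ComplEx avoids that limitation.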

Methods of integrating data to uncover genotype–phenotype interactions

This review explores emerging approaches to data integration, including meta-dimensional and multi-staged analyses, that aim to deepen our understanding of the role of genetics and genomics in complex outcomes.

PathFlowAI: A Convenient High-Throughput Workflow for Preprocessing, Deep Learning Analytics and Interpretation in Digital Pathology

The preliminary data indicate that PathFlowAI may become a cost-effective and time-efficient tool for clinical use of Artificial Intelligence (AI) algorithms.

PTR Explorer: An approach to identify and explore Post Transcriptional Regulatory mechanisms using proteogenomics

This work suggests that the proposed methodology can discover and categorize post-transcriptional regulatory mechanisms that manifest as proteogenomic trends. These trends provide evidence for cancer specificity and miRNA targeting, and they help identify regulation affected by biological function and by different types of degradation.

PAGE-Net: Interpretable and Integrative Deep Learning for Survival Analysis Using Histopathological Images and Genomic Data

This work presents PAGE-Net, a biologically interpretable deep learning model that integrates histopathological images and genomic data. The model aims not only to improve survival prediction but also to identify the genetic and histopathological patterns that cause different survival rates in patients.

Improving Survival Prediction Using a Novel Feature Selection and Feature Reduction Framework Based on the Integration of Clinical and Molecular Data

A novel feature selection and feature reduction framework that can handle correlated data improves prognostic performance compared with modeling approaches that do not combine non-redundant data.

CERENKOV3: Clustering and molecular network-derived features improve computational prediction of functional noncoding SNPs

This paper reports CERENKOV3, a machine learning pipeline that leverages clustering-derived and molecular network-derived features to improve the accuracy of predicting regulatory SNPs (rSNPs) in post-GWAS analysis.