Ongoing challenges and innovative approaches for recognizing patterns across large-scale, integrative biomedical datasets.

@article{Kobren2020OngoingCA,
  title={Ongoing challenges and innovative approaches for recognizing patterns across large-scale, integrative biomedical datasets},
  author={Shilpa Nadimpalli Kobren and Brett K. Beaulieu-Jones and Christian Darabos and Dokyoon Kim and Anurag Verma},
  journal={Pacific Symposium on Biocomputing},
  year={2020},
  volume={25},
  pages={286--294}
}
The following sections are included: Introduction; Identifying patterns in clinical data; Integration and analysis of molecular omics data; Interpretable and graph-based machine learning approaches; Computational challenges and reproducibility; Discussion; References.

References

Showing 1-10 of 28 references.
Methods for the integration of multi-omics data: mathematical aspects
TLDR
A review of the most advanced strategies for integrating multi-omics datasets, focusing on mathematical and methodological aspects, to draw a more complete and accurate picture of the dynamics of molecular systems.
Enhancing Model Interpretability and Accuracy for Disease Progression Prediction via Phenotype-Based Patient Similarity Learning
TLDR
This paper proposes learning patient similarity features as phenotypes from the aggregated patient-medical service matrix using non-negative matrix factorization, and shows that these phenotype-based similarity features can improve prediction over multiple baselines, including logistic regression, random forests, and convolutional neural networks.
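The NMF step described in this entry can be illustrated with a minimal, dependency-free sketch. It assumes the classic Lee-Seung multiplicative updates for the Frobenius loss and takes cosine similarity between patients' rows of W as the "phenotype-based similarity"; the toy matrix, function names, rank, and iteration count are hypothetical and not taken from the paper.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    """Return the transpose of a list-of-rows matrix."""
    return [list(col) for col in zip(*m)]


def frobenius_error(v, w, h):
    """Frobenius norm of the residual V - W H."""
    wh = matmul(w, h)
    return math.sqrt(sum((v[i][j] - wh[i][j]) ** 2
                         for i in range(len(v)) for j in range(len(v[0]))))


def nmf(v, rank, iters=200, seed=0, eps=1e-9):
    """Factor a nonnegative patient x service matrix V as W H (W: patient x phenotype,
    H: phenotype x service) with Lee-Seung multiplicative updates for the Frobenius loss."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + eps for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + eps for _ in range(m)] for _ in range(rank)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H); eps guards against division by zero.
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[r][j] * num[r][j] / (den[r][j] + eps) for j in range(m)] for r in range(rank)]
        # W <- W * (V H^T) / (W H H^T), using the freshly updated H.
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][r] * num[i][r] / (den[i][r] + eps) for r in range(rank)] for i in range(n)]
    return w, h


def cosine(a, b):
    """Cosine similarity of two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na, nb = math.sqrt(sum(x * x for x in a)), math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def phenotype_similarity(w):
    """Patient x patient similarity computed from the phenotype loadings (rows of W)."""
    return [[cosine(wi, wj) for wj in w] for wi in w]


# Hypothetical toy data: 4 patients x 4 medical services (visit counts).
V = [
    [5, 3, 0, 0],
    [4, 3, 0, 1],
    [0, 0, 4, 5],
    [0, 1, 5, 4],
]
W, H = nmf(V, rank=2)
S = phenotype_similarity(W)
```

Patients 0 and 1 use similar services, so their rows of W load on the same phenotype and `S[0][1]` ends up much larger than `S[0][2]`; such similarity features would then be fed to a downstream predictor.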
A Literature-Based Knowledge Graph Embedding Method for Identifying Drug Repurposing Opportunities in Rare Diseases
TLDR
A knowledge graph embedding method is applied that explicitly models the uncertainty associated with literature-derived relationships and uses link prediction to generate drug repurposing hypotheses. The method generates novel hypotheses, which are independently validated using external literature sources and protein interaction networks.
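Link prediction over a knowledge graph embedding can be sketched with a TransE-style score. This is a swapped-in, simpler technique: unlike the method described above, it does not model the uncertainty of literature-derived edges, and the entity names, the `treats` relation, and the 3-dimensional embeddings are hypothetical placeholders rather than learned values.

```python
import math


def transe_score(head, relation, tail):
    """TransE-style plausibility of (head, relation, tail): -||head + relation - tail||_2.
    Higher (closer to 0) means the link is more plausible."""
    return -math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail)))


def rank_candidates(candidates, relation, tail):
    """Rank candidate head entities for a missing (?, relation, tail) link, best first."""
    scored = [(name, transe_score(vec, relation, tail)) for name, vec in candidates.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical 3-d embeddings; a real system would learn them from literature-derived triples.
drugs = {
    "drug_a": [0.1, 0.9, 0.0],
    "drug_b": [0.8, 0.1, 0.3],
    "drug_c": [0.2, 0.7, 0.1],
}
treats = [0.5, -0.4, 0.2]
rare_disease = [0.6, 0.5, 0.2]
ranking = rank_candidates(drugs, treats, rare_disease)
```

Here `drug_a` ranks first because `drug_a + treats` lands on the disease embedding; in a repurposing setting, the top-ranked links not already in the graph would be the hypotheses passed on for external validation.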
Methods of integrating data to uncover genotype–phenotype interactions
TLDR
Emerging approaches for data integration, including meta-dimensional and multi-staged analyses, are explored; these approaches aim to deepen understanding of the role of genetics and genomics in complex outcomes.
PathFlowAI: A Convenient High-Throughput Workflow for Preprocessing, Deep Learning Analytics and Interpretation in Digital Pathology
TLDR
The preliminary data indicate that PathFlowAI may become a cost-effective and time-efficient tool for clinical use of Artificial Intelligence (AI) algorithms.
Assessment of Imputation Methods for Missing Gene Expression Data in Meta-Analysis of Distinct Cohorts of Tuberculosis Patients
TLDR
The results indicate that truncating to the genes observed in all cohorts, the approach researchers currently use, excludes important biology, and they suggest that LASSO and LLS imputation can reasonably impute genes across cohorts when the total missingness rate is below 20%.
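Imputing a gene that one cohort never measured can be sketched in the spirit of local least squares, assuming that is what LLS refers to here: choose the k genes most correlated with the target gene in a reference cohort where everything is observed, fit a least-squares regression on them, and apply the fit to the cohort where the target is missing. This is a simplified illustration with hypothetical function names, gene names, data, and k, not the study's pipeline, and it omits the LASSO variant.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def solve(a, b):
    """Solve the small linear system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def lls_impute(reference, cohort, target, k=2):
    """Predict `target` for each sample in `cohort` (dict gene -> values), using a
    least-squares fit on the k genes most correlated with `target` in `reference`."""
    shared = [g for g in cohort if g in reference and g != target]
    neighbors = sorted(shared, key=lambda g: abs(pearson(reference[g], reference[target])),
                       reverse=True)[:k]
    y = reference[target]
    # Design matrix: an intercept column followed by the selected neighbor genes.
    x = [[1.0] + [reference[g][s] for g in neighbors] for s in range(len(y))]
    p = len(neighbors) + 1
    # Normal equations: (X^T X) beta = X^T y.
    xtx = [[sum(row[i] * row[j] for row in x) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(x, y)) for i in range(p)]
    beta = solve(xtx, xty)
    n_samples = len(cohort[neighbors[0]])
    return [beta[0] + sum(b * cohort[g][s] for b, g in zip(beta[1:], neighbors))
            for s in range(n_samples)]


# Hypothetical toy data: GENE_T is measured in the reference cohort only.
reference = {
    "GENE_1": [1, 2, 3, 4, 5],
    "GENE_2": [2, 1, 4, 3, 5],
    "GENE_3": [3, 1, 2, 5, 1],
    "GENE_T": [1, 4, 3, 6, 6],  # constructed as 1 + 2*GENE_1 - GENE_2
}
cohort = {
    "GENE_1": [0, 10],
    "GENE_2": [1, 3],
    "GENE_3": [2, 2],
}
imputed = lls_impute(reference, cohort, "GENE_T", k=2)
```

Because `GENE_T` was built as an exact linear combination of `GENE_1` and `GENE_2`, the fit recovers it and the imputed values for the new cohort are 0 and 18 (up to rounding); real expression data would of course leave residual error.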
Bayesian semi-nonnegative matrix tri-factorization to identify pathways associated with cancer phenotypes
TLDR
A Bayesian semi-nonnegative matrix tri-factorization method is proposed to identify pathways associated with cancer phenotypes from a real-valued input matrix, and the identified pathways are shown to be usable as prognostic biomarkers that stratify patients with distinct survival outcomes in two independent validation datasets.
CERENKOV3: Clustering and molecular network-derived features improve computational prediction of functional noncoding SNPs
TLDR
CERENKOV3, a machine learning pipeline that leverages clustering-derived and molecular network-derived features to improve prediction accuracy of regulatory SNPs (rSNPs) in the context of post-GWAS analysis, is reported.
Using Transcriptional Signatures to Find Cancer Drivers with LURE
TLDR
A semi-supervised method called Learning UnRealized Events (LURE) is introduced that uses a progressive label learning framework and minimum spanning analysis to predict cancer drivers based on their altered samples sharing a gene expression signature with the samples of a known event.
Learning a Latent Space of Highly Multidimensional Cancer Data
TLDR
By applying the authors' network to two classification tasks, cancer status and cancer type, it is demonstrated that UFDN-TCGA learns a biologically relevant, low-dimensional latent space of high-dimensional gene expression data and performs comparably to random forest methods.
...
...