Review of Statistical Learning Methods in Integrated Omics Studies (An Integrated Information Science)

@article{Zeng2018ReviewOS,
  title={Review of Statistical Learning Methods in Integrated Omics Studies (An Integrated Information Science)},
  author={Irene Sui Lan Zeng and Thomas Lumley},
  journal={Bioinformatics and Biology Insights},
  year={2018},
  volume={12}
}
Integrated omics is becoming a new channel for investigating the complex molecular system in modern biological science and sets a foundation for systematic learning for precision medicine. The statistical/machine learning methods that have emerged in the past decade for integrated omics are not only innovative but also multidisciplinary with integrated knowledge in biology, medicine, statistics, machine learning, and artificial intelligence. Here, we review the nontrivial classes of learning… 

A Selective Review of Multi-Level Omics Data Integration Using Variable Selection
TLDR
This article focuses on reviewing existing multi-omics integration studies by paying special attention to variable selection methods, and reviews existing supervised, semi-supervised and unsupervised integrative analyses within parallel and hierarchical integration studies, respectively.
A U-statistics for integrative analysis of multilayer omics data
TLDR
A U-statistics-based non-parametric framework for the association analysis of multi-layer omics data, where consensus and permutation-based weighting schemes are developed to account for various types of disease models.
Editorial: Artificial Intelligence Bioinformatics: Development and Application of Tools for Omics and Inter-Omics Studies
TLDR
The Research Topic presents articles on novel developments in the field of artificial intelligence in biology and medicine, and their applications in the analysis of high-throughput data from omics and inter-omics approaches.
DIABLO: an integrative approach for identifying key molecular drivers from multi-omics assays
TLDR
DIABLO is a multi-omics integrative method that seeks common information across different data types by selecting a subset of molecular features, while discriminating between multiple phenotypic groups and achieving predictive performance comparable to state-of-the-art supervised approaches.
Machine and deep learning meet genome-scale metabolic modeling
TLDR
How machine learning and constraint-based modeling can be combined is described, reviewing recent works at the intersection of both domains and discussing the mathematical and practical aspects involved, as well as overlapping systematic classifications from both frameworks.
Graph- and rule-based learning algorithms: a comprehensive review of their applications for cancer type classification and prognosis using genomic data
TLDR
This article provides a comprehensive review of graph- and rule-based machine learning algorithms that have been applied to genomic data to determine cancer-specific gene modules, identify gene signature-based classifiers, and carry out other related objectives of potential therapeutic value.
Microbiome Multi-Omics Network Analysis: Statistical Considerations, Limitations, and Opportunities
TLDR
This review surveys some of the most important network methods for integrative analysis, with an emphasis on methods that have been applied, or have great potential to be applied, to multi-omics integration of microbiome data.
Integrative Network Fusion: a multi-omics approach in molecular profiling
TLDR
The Integrative Network Fusion (INF) framework effectively integrates multiple data levels in oncogenomics classification tasks, improving over the performance of single layers alone and naive juxtaposition, and provides compact signature sizes.
Exploiting Interdata Relationships in Next-generation Proteomics Analysis
TLDR
It is argued that productive data integration differs from parallel acquisition and interpretation and should move toward quantitative modeling of the relationships between the data.

References

Multivariate Methods for the Integration and Visualization of Omics Data
TLDR
This work discusses the application of multivariate statistical approaches to integrate bio-molecular information by using multiple factorial analysis, and shows how these techniques can be used to perform dimension reduction and then visualize data of one type in a way that helps explain data of other types.
Integrating Heterogeneous omics Data via Statistical Inference and Learning Techniques
TLDR
This review highlights recent statistical inference and learning techniques devised for multi-omics studies, and asks how integrated omics data could be used for better personalized patient treatment in both supervised and unsupervised learning settings.
Network-based analysis of omics with multi-objective optimization
TLDR
A new method to generate networks of biological components that incorporate multi-omics information is developed, which relies on using a multi-objective (MO) optimization procedure to drive the identification of networks that are enriched according to several statistical estimators.
Multi-omics enrichment analysis using the GeneTrail2 web service
TLDR
The presented use-case demonstrates that GeneTrail2 is well equipped for the integrative analysis of comprehensive -omics data and may help to shed light on complex pathogenic mechanisms in cancer and other diseases.
MVDA: a multi-view genomic data integration methodology
TLDR
A multi-view approach in which the information from different data layers (views) is integrated at the level of the results of each single-view clustering iteration; the findings suggest that integrating prior information with genomic features in the subtyping analysis is an effective strategy for identifying disease subgroups.
Exploratory Analysis of Multiple Omics Datasets Using the Adjusted RV Coefficient
TLDR
The adjusted RV is introduced, which is unbiased in the case of independent data sets and is a better estimator than previous RV versions in terms of the mean square error and the power of the independence test based on it.
MGV: a generic graph viewer for comparative omics data
TLDR
MGV is presented, a versatile generic graph viewer for multiomics data that extends Mayday's visual analytics capabilities by integrating a wide range of biological models, high-throughput data and meta information to display enriched graphs that combine data and models.
Network and Data Integration for Biomarker Signature Discovery via Network Smoothed T-Statistics
TLDR
A technique that integrates network information as well as different kinds of experimental data (here exemplified by mRNA and miRNA expression) into one classifier is proposed, and it is demonstrated that the data integration strategy can improve classification performance compared to using a single data source only.
A non-negative matrix factorization method for detecting modules in heterogeneous omics multi-modal data
TLDR
A novel method of multi-modal data analysis that is designed for heterogeneous data based on non-negative matrix factorization is introduced and an algorithm for jointly decomposing the data matrices involved that also includes a sparsity option for high-dimensional settings is provided.