• Corpus ID: 119182080

Multiple kernel learning for integrative consensus clustering of genomic datasets

@article{Cabassi2019MultipleKL,
  title={Multiple kernel learning for integrative consensus clustering of genomic datasets},
  author={Alessandra Cabassi and Paul D. W. Kirk},
  journal={ArXiv},
  year={2019},
  volume={abs/1904.07701}
}
Summary: Diverse applications – particularly in tumour subtyping – have demonstrated the importance of integrative clustering as a means to combine information from multiple high-dimensional omics datasets. Cluster-Of-Clusters Analysis (COCA) is a popular integrative clustering method that has been widely applied in the context of tumour subtyping. However, the properties of COCA have never been systematically explored, and the robustness of this approach to the inclusion of noisy datasets, or… 

GPseudoClust: deconvolution of shared pseudo-profiles at single-cell resolution
TLDR
GPseudoClust is a novel approach that jointly infers pseudotemporal ordering and gene clusters and quantifies the uncertainty in both. It combines a recent method for pseudotime inference with non-parametric Bayesian clustering methods, efficient Markov Chain Monte Carlo sampling, and novel subsampling strategies that aid computation.
GPseudoClust: deconvolution of shared pseudo-trajectories at single-cell resolution
TLDR
GPseudoClust combines a recent method for pseudo-time inference with nonparametric Bayesian clustering methods, efficient MCMC sampling, and novel subsampling strategies and categorises genes in a way consistent with known biological function.

References

Bayesian correlated clustering to integrate multiple datasets
TLDR
Comparisons to other unsupervised data integration techniques—as well as to non-integrative approaches—demonstrate that MDI is competitive, while also providing information that would be difficult or impossible to extract using other methods.
Clusternomics: Integrative context-dependent clustering for heterogeneous datasets
TLDR
A probabilistic clustering method to identify groups across datasets that do not share the same cluster structure, and the proposed algorithm, Clusternomics, identifies groups of samples that share their global behaviour across heterogeneous datasets.
Integrative clustering of multiple genomic data types using a joint latent variable model with application to breast and lung cancer subtype analysis
TLDR
The resulting methodology iCluster incorporates flexible modeling of the associations between different data types and the variance-covariance structure within data types in a single framework, while simultaneously reducing the dimensionality of the datasets.
A statistical framework for genomic data fusion
TLDR
This paper describes a computational framework for integrating and drawing inferences from a collection of genome-wide measurements represented via a kernel function, which defines generalized similarity relationships between pairs of entities, such as genes or proteins.
Localized Data Fusion for Kernel k-Means Clustering with Application to Cancer Biology
TLDR
A novel multiple kernel learning algorithm is proposed that extends kernel k-means clustering to the multiview setting, which combines kernels calculated on the views in a localized way to better capture sample-specific characteristics of the data.
Bayesian consensus clustering
TLDR
A computationally scalable Bayesian framework for simultaneous estimation of both the consensus clustering and the source-specific clusterings is described and demonstrated that this flexible approach is more robust than joint clustering of all data sources, and is more powerful than clustering each data source independently.
Sparse integrative clustering of multiple omics data sets
TLDR
This study uses penalized latent variable regression methods for joint modeling of multiple omics data types to identify common latent variables that can be used to cluster patient samples into biologically and clinically relevant disease subtypes in breast and lung cancer data sets.
Support vector machine learning from heterogeneous data: an empirical analysis using protein sequence and structure
TLDR
This work empirically investigates the performance of the SVM on the task of inferring gene functional annotations from a combination of protein sequence and structure data and suggests that for many applications, a naive unweighted sum of kernels may be sufficient.
Consensus Clustering: A Resampling-Based Method for Class Discovery and Visualization of Gene Expression Microarray Data
TLDR
A new methodology of class discovery and clustering validation tailored to the task of analyzing gene expression data is presented, and in conjunction with resampling techniques, it provides for a method to represent the consensus across multiple runs of a clustering algorithm and to assess the stability of the discovered clusters.
Integrative clustering reveals a novel split in the luminal A subtype of breast cancer with impact on outcome
TLDR
Increasing knowledge of the heterogeneity of the luminal A subtype may add pivotal information to guide therapeutic choices, evidently bringing us closer to improved treatment for this largest subgroup of breast cancer.