An assessment of computational methods for estimating purity and clonality using genomic data derived from heterogeneous tumor tissue samples

@article{Yadav2015AnAO,
  title={An assessment of computational methods for estimating purity and clonality using genomic data derived from heterogeneous tumor tissue samples},
  author={Vinod Kumar Yadav and Subhajyoti De},
  journal={Briefings in Bioinformatics},
  year={2015},
  volume={16},
  number={2},
  pages={232--241}
}
Solid tumor samples typically contain multiple distinct clonal populations of cancer cells, and also stromal and immune cell contamination. A majority of the cancer genomics and transcriptomics studies do not explicitly consider genetic heterogeneity and impurity, and draw inferences based on mixed populations of cells. Deconvolution of genomic data from heterogeneous samples provides a powerful tool to address this limitation. We discuss several computational tools, which enable deconvolution… 

Systematic Assessment of Tumor Purity and Its Clinical Implications

TLDR
The data show poor concordance between pathologic and molecular purity estimates, necessitating caution when interpreting molecular results, and highlight the need for improved assessment of tumor purity and quantitation of its influences on the molecular hallmarks of cancers.

Quantifying tumor heterogeneity in whole-genome and whole-exome sequencing data

TLDR
The improved THetA2 algorithm is substantially faster and identifies numerous tumor samples containing subclonal populations in the TCGA data, including in one highly rearranged sample for which other tumor purity estimation algorithms were unable to estimate tumor purity.

Tumor purity quantification by clonal DNA methylation signatures

TLDR
PAMES is a valuable tool for assessing the purity of tumor samples in clinical research and diagnostics, and its evaluation in a cancer cell line dataset highlights its reliability in accurately estimating tumor admixtures.

Assessing reliability of intra-tumor heterogeneity estimates from single sample whole exome sequencing data

TLDR
The prognostic value of tumor heterogeneity for survival prediction is limited in datasets, and there is no evidence that it improves over prognosis based on other clinical variables, so heterogeneity inference from WES data on a single sample should be considered with caution.

All-FIT: Allele-Frequency-based Imputation of Tumor Purity from High-Depth Sequencing Data

TLDR
All-FIT is an iterative weighted least squares method that estimates specimen tumor purity from the allele frequencies of variants detected in high-depth, targeted, clinical sequencing data; its accuracy and improved performance against leading computational approaches are demonstrated.

Exploring the spatiotemporal genetic heterogeneity in metastatic lung adenocarcinoma using a nuclei flow‐sorting approach

TLDR
The results of this study provide evidence that most macroevolutionary events occur in primary tumors before metastatic dissemination and advocate for a limited degree of CIN over time and space in this cohort of LUADs.

Computational deconvolution of transcriptomics data from mixed cell populations

TLDR
This review highlights the importance and value of computational deconvolution methods to infer the abundance of different cell types and/or cell type-specific expression profiles in heterogeneous samples without performing physical cell sorting.

Integrated transcriptomic–genomic tool Texomer profiles cancer tissues

TLDR
To address the issue of intra-tissue heterogeneity in cancer genomics, Texomer is developed, which enables joint analysis of bulk DNA and RNA sequencing data for allele-specific deconvolution and quantification of tumor heterogeneity.

BubbleTree: an intuitive visualization to elucidate tumoral aneuploidy and clonality using next generation sequencing data

TLDR
The performance of BubbleTree was demonstrated with comparisons to similar commonly used tools such as THetA2, ABSOLUTE, AbsCN-seq and ASCAT, and BubbleTree outperformed these tools, particularly in identifying tumor subclonal populations and polyploidy.

De novo compartment deconvolution and weight estimation of tumor samples (DECODER)

TLDR
DECODER, an integrated framework that performs de novo deconvolution and compartment weight estimation for a single sample, is developed, and it is demonstrated that it can be used to reproducibly estimate clinically meaningful cellular compartment weights in pancreatic cancer.
...

References

SHOWING 1-10 OF 57 REFERENCES

Inferring tumour purity and stromal and immune cell admixture from expression data

TLDR
A method that uses gene expression signatures to infer the fraction of stromal and immune cells in tumour samples is presented, and its prediction accuracy is corroborated using 3,809 transcriptional profiles available elsewhere in the public domain.

THetA: inferring intra-tumor heterogeneity from high-throughput DNA sequencing data

TLDR
THetA (Tumor Heterogeneity Analysis), an algorithm that infers the most likely collection of genomes and their proportions in a sample, for the case where copy number aberrations distinguish subpopulations, is introduced.

PurityEst: estimating purity of human tumor samples using next-generation sequencing data

TLDR
A novel algorithm, PurityEst, is developed to infer the tumor purity level from the allelic differential representation of heterozygous loci with somatic mutations in a human tumor sample with a matched normal tissue, using next-generation sequencing data.
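
To make the link between allele frequencies and purity concrete, here is a minimal Python sketch. It is not PurityEst's actual algorithm. It assumes the standard mixture model in which a clonal somatic mutation carried on `mutant_copies` of `tumor_copy_number` alleles is diluted by normal cells with two copies. The function names and default parameters are hypothetical and chosen only for illustration.

```python
def expected_vaf(purity, mutant_copies=1, tumor_copy_number=2, normal_copy_number=2):
    """Expected variant allele frequency of a clonal somatic mutation.

    Assumed model: mutant alleles come only from tumor cells, while the
    total allele count mixes tumor and normal cells in proportion to purity.
    """
    mutant = purity * mutant_copies
    total = purity * tumor_copy_number + (1.0 - purity) * normal_copy_number
    return mutant / total


def purity_from_vaf(vaf):
    """Invert the model for the simplest case only: a clonal, heterozygous
    mutation in a copy-neutral diploid region, where VAF = purity / 2."""
    return min(1.0, 2.0 * vaf)


vaf = expected_vaf(0.6)          # heterozygous mutation in a 60% pure sample
estimate = purity_from_vaf(vaf)  # recovers the purity under the same assumptions
```

Under these assumptions a 60% pure sample yields an expected allele frequency of 0.3. Real data would additionally need handling of copy-number changes, subclonal mutations, and sampling noise.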

TrAp: a tree approach for fingerprinting subclonal tumor composition

TLDR
An evolutionary framework for deconvolving data from a single genome-wide experiment to infer the composition, abundance and evolutionary paths of the underlying cell subpopulations of a tumor is proposed.

Gene expression deconvolution in clinical samples

TLDR
Recent in silico methods for deconvoluting a gene expression profile into cell-type-specific subprofiles and the experimental validations available for them are considered.

Future medical applications of single-cell sequencing in cancer

TLDR
The challenges and technical aspects of single-cell sequencing are discussed, with a strong focus on genomic copy number and on how this information can be used to diagnose and treat cancer patients.

Absolute quantification of somatic DNA alterations in human cancer

TLDR
A computational method that infers tumor purity and malignant cell ploidy directly from analysis of somatic DNA alterations is described, revealing that genome-doubling events are common in human cancer, likely occur in cells that are already aneuploid, and influence pathways of tumor progression.

In silico microdissection of microarray data from heterogeneous cell populations

TLDR
A computational framework for removing the effects of sample heterogeneity by "microdissecting" microarray data in silico and an optimization-based method for joint estimation of the mixing percentages and the expression values of the pure cell samples are proposed.

Tumour evolution inferred by single-cell sequencing

TLDR
It is shown that, with flow-sorted nuclei, whole genome amplification and next generation sequencing, genomic copy number can be accurately quantified within an individual nucleus, and the results indicate that tumours grow by punctuated clonal expansions with few persistent intermediates.

Optimal Deconvolution of Transcriptional Profiling Data Using Quadratic Programming with Application to Complex Clinical Blood Samples

TLDR
An approach is described that builds upon a linear latent variable model, in which expression levels from mixed cell populations are modeled as the weighted average of expression from different cell types, and that efficiently identifies the globally optimal solution while preserving non-negativity of the cell fractions.
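
The weighted-average model in this summary can be illustrated with a minimal Python sketch for the two-cell-type case. This is not the quadratic-programming method of the cited paper. It assumes known reference profiles `type_a` and `type_b`, and a mixture of the form `mixed[g] = f * type_a[g] + (1 - f) * type_b[g]`. All names are hypothetical.

```python
def estimate_fraction(mixed, type_a, type_b):
    """Least-squares estimate of the fraction f of type_a in a two-type mixture.

    Assumed model per gene g: mixed[g] = f * type_a[g] + (1 - f) * type_b[g].
    """
    diffs = [a - b for a, b in zip(type_a, type_b)]
    denom = sum(d * d for d in diffs)
    if denom == 0:
        raise ValueError("the two reference profiles are identical")
    numer = sum((m - b) * d for m, b, d in zip(mixed, type_b, diffs))
    # The squared error is a convex quadratic in f, so clipping the
    # unconstrained minimiser to [0, 1] gives the constrained optimum
    # with both fractions non-negative.
    return min(1.0, max(0.0, numer / denom))


# Toy reference profiles (illustrative values, not real expression data).
type_a = [10.0, 2.0, 5.0]
type_b = [2.0, 8.0, 5.0]
mixed = [0.3 * a + 0.7 * b for a, b in zip(type_a, type_b)]
fraction_a = estimate_fraction(mixed, type_a, type_b)
```

With more than two cell types the same objective becomes a constrained quadratic program, which needs a general-purpose solver rather than this closed form.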
...