Tumor samples are typically heterogeneous, containing admixture by normal, non-cancerous cells and one or more subpopulations of cancerous cells. Whole-genome sequencing of a tumor sample yields reads from this mixture, but does not directly reveal the cell of origin for each read. We introduce THetA (Tumor Heterogeneity Analysis), an algorithm that infers …
MOTIVATION: Most tumor samples are a heterogeneous mixture of cells, including admixture by normal (non-cancerous) cells and subpopulations of cancerous cells with different complements of somatic aberrations. This intra-tumor heterogeneity complicates the analysis of somatic aberrations in DNA sequencing data from tumor samples. RESULTS: We describe an …
The multinomial model that we use in our likelihood function does not assume that the observed read depths in different intervals are independent. Even though we assume that reads are distributed uniformly on the cancer genome, large copy number aberrations (e.g. gain and loss of whole chromosomes) will cause the observed number of aligned reads in an …
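To illustrate why a multinomial model couples the read depths of different intervals, here is a minimal sketch. It is not taken from the abstract above: the function names, the equal-length-interval simplification, and the assumption that the expected read fraction of an interval is proportional to its mixture-weighted copy number are all illustrative assumptions.

```python
import math

def expected_read_fractions(copy_numbers, mixture):
    """Expected fraction of reads per interval (hypothetical helper).

    copy_numbers[j][k]: copy number of interval j in cell population k.
    mixture[k]: fraction of cells belonging to population k.
    Assumes equal-length intervals and reads spread uniformly over the
    DNA actually present in the sample.
    """
    weighted = [sum(c * m for c, m in zip(row, mixture)) for row in copy_numbers]
    total = sum(weighted)
    return [w / total for w in weighted]

def multinomial_log_likelihood(read_counts, fractions):
    """Log-probability of the observed counts under a multinomial model.

    Assumes every interval with a nonzero read count has a nonzero
    expected fraction.
    """
    n = sum(read_counts)
    log_coeff = math.lgamma(n + 1) - sum(math.lgamma(r + 1) for r in read_counts)
    return log_coeff + sum(
        r * math.log(p) for r, p in zip(read_counts, fractions) if r > 0
    )

# Three intervals, two populations (normal, tumor), mixed 50/50.
mixture = [0.5, 0.5]
no_aberration = [[2, 2], [2, 2], [2, 2]]
gain_in_first = [[2, 4], [2, 2], [2, 2]]  # tumor gains two copies of interval 0

flat = expected_read_fractions(no_aberration, mixture)
shifted = expected_read_fractions(gain_in_first, mixture)
```

In this toy setup, amplifying only the first interval lowers the expected read fraction of the other two intervals even though their copy numbers are unchanged, because the fractions must sum to one. That is the kind of dependence between intervals that a multinomial likelihood captures and that independent per-interval models would miss.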
High-throughput DNA sequencing is revolutionizing the study of cancer and enabling the measurement of the somatic mutations that drive cancer development. However, the resulting sequencing datasets are large and complex, obscuring the clinically important mutations in a background of errors, noise, and random mutations. Here, we review computational …
BACKGROUND: When biological networks are studied, it is common to look for clusters, i.e. sets of nodes that are highly inter-connected. To understand the biological meaning of a cluster, the user usually has to sift through many textual annotations that are associated with biological entities. FINDINGS: The WordCloud Cytoscape plugin generates a visual …
MOTIVATION: DNA sequencing of multiple samples from the same tumor provides data to analyze the process of clonal evolution in the population of cells that give rise to a tumor. RESULTS: We formalize the problem of reconstructing the clonal evolution of a tumor using single-nucleotide mutations as the variant allele frequency (VAF) factorization problem. We …
A cancer genome is derived from the germline genome through a series of somatic mutations. Somatic structural variants - including duplications, deletions, inversions, translocations, and other rearrangements - result in a cancer genome that is a scrambling of intervals, or "blocks" of the germline genome sequence. We present an efficient algorithm for …
Phylogenetic techniques are increasingly applied to infer the somatic mutational history of a tumor from DNA sequencing data. However, standard phylogenetic tree reconstruction techniques do not account for the fact that bulk sequencing data measures mutations in a population of cells. We formulate and solve the multi-state perfect phylogeny mixture …
The evolution of a cancer genome has traditionally been described as a sequential accumulation of mutations - including chromosomal rearrangements - over a period of time. Recent research suggests, however, that numerous rearrangements may be acquired simultaneously during a single cataclysmic event, leading to the proposal of new mechanisms of …
The reconstruction of phylogenetic trees from mixed populations has become important in the study of cancer evolution, as sequencing is often performed on bulk tumor tissue containing mixed populations of cells. Recent work has shown how to reconstruct a perfect phylogeny tree from samples that contain mixtures of two-state characters, where each …