Comparative testing of DNA segmentation algorithms using benchmark simulations.

Eran Elhaik, Dan Graur, and Krešimir Josić. Molecular Biology and Evolution, 27(5).
Numerous segmentation methods for the detection of compositionally homogeneous domains within genomic sequences have been proposed. Unfortunately, these methods yield inconsistent results. Here, we present a benchmark consisting of two sets of simulated genomic sequences for testing the performance of segmentation algorithms. Sequences in the first set are composed of fixed-sized homogeneous domains, distinct in their between-domain guanine and cytosine (GC) content variability. The sequences…
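To make the benchmark design concrete, here is a minimal Python sketch of how a sequence made of fixed-size homogeneous domains might be simulated. It is not the authors' generator: it assumes that each domain's GC content is drawn from a normal distribution and that bases within a domain are drawn independently. `simulate_domains`, `mean_gc` and `sd_gc` are hypothetical names, and their default values are purely illustrative.

```python
import random


def simulate_domains(n_domains, domain_len, mean_gc=0.4, sd_gc=0.05, seed=None):
    """Simulate a sequence of fixed-size homogeneous domains (hypothetical helper).

    Assumption of this sketch: each domain's GC fraction is drawn from
    N(mean_gc, sd_gc) and clipped to [0, 1]; bases inside a domain are then
    drawn independently, so each domain is homogeneous by construction.
    Returns (sequence, domain start positions, per-domain GC fractions).
    """
    rng = random.Random(seed)
    parts, starts, gcs = [], [], []
    for i in range(n_domains):
        gc = min(1.0, max(0.0, rng.gauss(mean_gc, sd_gc)))
        gcs.append(gc)
        starts.append(i * domain_len)
        # G and C share the GC fraction equally; A and T share the remainder.
        weights = [gc / 2, gc / 2, (1 - gc) / 2, (1 - gc) / 2]
        parts.append("".join(rng.choices("GCAT", weights=weights, k=domain_len)))
    return "".join(parts), starts, gcs
```

Raising `sd_gc` increases the between-domain GC variability, and the returned `starts` give the true domain boundaries against which a segmentation could be scored.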


Identifying compositionally homogeneous and nonhomogeneous domains within the human genome using a novel segmentation algorithm

Segmenting the human genome with IsoPlotter shows that one-third of the genome is composed of compositionally nonhomogeneous domains and the remainder is a mixture of many short compositionally homogeneous domains and relatively few long ones.

Multiscale DNA partitioning: statistical evidence for segments

This work focuses on partitioning with respect to GC content and proposes a new approach that provides statistical error control. The approach is based on a statistical multiscale criterion, which makes it a segmentation method that searches for segments of any length (on all scales) simultaneously.

Investigating genomic structure using changept: A Bayesian segmentation model

Weighted Consensus Segmentations

It is shown that for a large class of distance functions only breakpoints present in at least one input segmentation appear in the consensus segmentation, and a bound on the size of consensus segments is derived.

Bayesian hidden Markov models in DNA sequence segmentation using R: the case of Simian Vacuolating virus (SV40)

This paper aims to fully exploit R to fit a Bayesian HMM for DNA segmentation and concludes that the algorithms and functions in R can correctly estimate sequence segmentation if the HMM structure is assumed.

IsoPlotter+: A Tool for Studying the Compositional Architecture of Genomes

A completely automated pipeline, called IsoPlotter+, is designed to carry out all segmentation analyses, including graphical display, and a repository for compositional domain maps of all fully sequenced vertebrate and invertebrate genomes is built.

Segmenting the Human Genome into Isochores

This work presents a critical discussion of the currently available methods and introduces a new approach, isoSegmenter, which segments the genome into isochores in a fast and fully automatic manner and improves on the existing methods.

A Comparative Study and a Phylogenetic Exploration of the Compositional Architectures of Mammalian Nuclear Genomes

It is demonstrated that the compositional organization of the murid genome differs from those of primates and laurasiatherians, a phenomenon previously termed the “murid shift,” and in many ways resembles the genome of opossum.

OcculterCut: A Comprehensive Survey of AT-Rich Regions in Fungal Genomes

A novel method to measure local GC-content bias in genomes is presented, and a survey of published fungal species identified species containing distinct AT-rich regions, supporting the hypothesis that these regions play an important role in fungal evolution.
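A common building block for studying local GC content is to compute the GC fraction in consecutive windows and flag the low-GC ones. The Python sketch below is a generic illustration under that assumption, not OcculterCut's algorithm; `windowed_gc`, `low_gc_windows`, the window size, and the cutoff are all hypothetical.

```python
def windowed_gc(seq, window):
    """GC fraction in consecutive, non-overlapping windows of `window` bases.

    A trailing partial window is dropped.
    """
    if window < 1:
        raise ValueError("window must be positive")
    seq = seq.upper()
    return [
        sum(base in "GC" for base in seq[i:i + window]) / window
        for i in range(0, len(seq) - window + 1, window)
    ]


def low_gc_windows(gc_values, cutoff):
    """Indices of windows whose GC fraction is below a (hypothetical) cutoff."""
    return [i for i, gc in enumerate(gc_values) if gc < cutoff]
```

For example, `windowed_gc("GGGGAAAACCAT", 4)` gives one GC-rich, one AT-only, and one intermediate window; runs of adjacent flagged windows are candidate AT-rich regions.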

GC3 Biology in Eukaryotes and Prokaryotes

The genetic code is considerably degenerate: the third base of a codon is less discriminatory for the amino acid than the other two bases, and this third position is therefore referred to as the wobble position.
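GC3, the GC content at the third (wobble) codon position, can be computed directly from a coding sequence. A minimal Python sketch, assuming the input starts at the first codon and that any incomplete trailing codon is ignored; `gc3` is a hypothetical helper name.

```python
def gc3(cds: str) -> float:
    """Fraction of complete codons whose third base is G or C."""
    cds = cds.upper()
    # The third base of each complete codon sits at offsets 2, 5, 8, ...
    thirds = [cds[i + 2] for i in range(0, len(cds) - 2, 3)]
    if not thirds:
        raise ValueError("sequence contains no complete codon")
    return sum(base in "GC" for base in thirds) / len(thirds)
```

For `"ATGGCCAAA"` the third positions are G, C and A, so GC3 is 2/3.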



Comparing segmentations by applying randomization techniques

A framework for evaluating segmentation quality is introduced, and its use on two examples of segmental genomic structures is demonstrated.

A Bayesian approach to discriminate between alternative DNA sequence segmentations

An approximate Bayesian hypothesis test to discriminate between alternative candidate mosaic structures is devised and applied to various synthetic and real-world DNA sequence alignments.

SEGMENT: identifying compositional domains in DNA sequences

A heuristic segmentation algorithm for DNA sequences is described and implemented in a Windows program (SEGMENT), which divides a DNA sequence into compositionally homogeneous domains by iterating a local optimization procedure at a given statistical significance level.

Delineating relative homogeneous G+C domains in DNA sequences.

New stopping criteria for segmenting DNA sequences.

W. Li. Physical Review Letters, 2001.
A solution to the stopping-criterion problem in segmenting inhomogeneous DNA sequences with complex statistical patterns is proposed, based on the Bayesian information criterion (BIC) within a model selection framework. A measure called segmentation strength is also introduced, which can be used to control the delineation of large domains.
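The idea of a BIC stopping rule can be sketched as a comparison between a one-segment and a best two-segment multinomial model of base composition. This Python sketch is not Li's exact criterion and does not compute segmentation strength; the parameter counts (K-1 free base frequencies per segment plus one for the cut position) are an assumption, and `bic_accepts_split` is a hypothetical name.

```python
import math
from collections import Counter


def _loglik(counts, n):
    """Maximized multinomial log-likelihood (natural log): sum of c * ln(c / n)."""
    return sum(c * math.log(c / n) for c in counts.values() if c)


def bic_accepts_split(seq, alphabet_size=4):
    """Return (accepted, position) for the best single cut of `seq`.

    The split is accepted when the two-segment model has a lower BIC
    (BIC = -2 lnL + k ln N) than the one-segment model. Assumed parameter
    counts: alphabet_size - 1 base frequencies per segment, plus 1 for the cut.
    """
    n = len(seq)
    if n < 2:
        raise ValueError("need at least two bases to split")
    total = Counter(seq)
    ll_one = _loglik(total, n)

    left = Counter()
    best_ll, best_pos = -math.inf, None
    for i in range(1, n):
        left[seq[i - 1]] += 1
        right = total - left  # counts of seq[i:]
        ll_two = _loglik(left, i) + _loglik(right, n - i)
        if ll_two > best_ll:
            best_ll, best_pos = ll_two, i

    k_one = alphabet_size - 1
    k_two = 2 * (alphabet_size - 1) + 1
    bic_one = -2 * ll_one + k_one * math.log(n)
    bic_two = -2 * best_ll + k_two * math.log(n)
    return bic_two < bic_one, best_pos
```

The log-likelihood gain of the best split equals N times the length-weighted Jensen-Shannon divergence measured in nats, so this test amounts to requiring that divergence to exceed a penalty that grows with ln N. A sharp A/G boundary is accepted, while a periodic `"ACGT" * 25` sequence is not.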

Discovering isochores by least-squares optimal segmentation.

Compositional segmentation and long-range fractal correlations in DNA sequences.

A segmentation algorithm based on the Jensen-Shannon entropic divergence is used to decompose long-range correlated DNA sequences into statistically significant, compositionally homogeneous patches. The results show that neither the internal structure of the patches nor the order in which they are arranged in the sequence is critical; long-range correlations in nucleotide sequences therefore seem to rely only on the power-law distribution of patch lengths.
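The core step of Jensen-Shannon segmentation is to find the cut that maximizes the length-weighted divergence between the two resulting subsequences, then recurse. The Python sketch below shows that step; the fixed `threshold` is a hypothetical stand-in for the statistical significance test used in the literature, and `best_js_split` and `segment` are illustrative names.

```python
import math
from collections import Counter


def entropy(counts, n):
    """Shannon entropy in bits of a composition with the given symbol counts."""
    return -sum((c / n) * math.log2(c / n) for c in counts.values() if c)


def best_js_split(seq):
    """Return (position, D) for the cut maximizing the length-weighted
    Jensen-Shannon divergence D = H(S) - (n1/N) H(S1) - (n2/N) H(S2)."""
    n = len(seq)
    if n < 2:
        raise ValueError("need at least two bases to split")
    total = Counter(seq)
    h_total = entropy(total, n)
    left = Counter()
    best_pos, best_d = None, -math.inf
    for i in range(1, n):
        left[seq[i - 1]] += 1
        right = total - left  # counts of seq[i:]
        d = (h_total
             - (i / n) * entropy(left, i)
             - ((n - i) / n) * entropy(right, n - i))
        if d > best_d:
            best_pos, best_d = i, d
    return best_pos, best_d


def segment(seq, threshold, offset=0):
    """Recursively bisect `seq` and return sorted cut positions.

    Recursion stops when the best D falls below `threshold`, a hypothetical
    fixed cutoff standing in for a statistical significance test.
    """
    if len(seq) < 2:
        return []
    pos, d = best_js_split(seq)
    if d < threshold:
        return []
    return (segment(seq[:pos], threshold, offset)
            + [offset + pos]
            + segment(seq[pos:], threshold, offset + pos))
```

On `"A" * 20 + "G" * 20 + "A" * 20` with a threshold of 0.1, this recovers cuts at 20 and 40; a homogeneous run of A's yields no cuts.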

A Bayesian Approach to DNA Sequence Segmentation

A Bayesian method that identifies segments of similar structure by using a Markov chain governed by a hidden Markov model is described and applied to the segmentation of the bacteriophage lambda genome.
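The Bayesian method above infers segment structure under a hidden Markov model. As a simpler, non-Bayesian stand-in, this Python sketch decodes a two-state HMM (AT-rich vs. GC-rich) with the Viterbi algorithm using fixed parameters. The states, probabilities, and function names are hypothetical and chosen only for illustration; no parameters are estimated from data.

```python
import math
from itertools import groupby


def viterbi(seq, states, start, trans, emit):
    """Most probable hidden-state path of a discrete HMM, computed in log space."""
    # v[s]: best log-probability of any state path that ends in state s here.
    v = {s: math.log(start[s]) + math.log(emit[s][seq[0]]) for s in states}
    back = []  # back[t][s]: best predecessor of state s at position t + 1
    for x in seq[1:]:
        ptr, nv = {}, {}
        for s in states:
            prev = max(states, key=lambda r, s=s: v[r] + math.log(trans[r][s]))
            ptr[s] = prev
            nv[s] = v[prev] + math.log(trans[prev][s]) + math.log(emit[s][x])
        back.append(ptr)
        v = nv
    # Trace back from the best final state.
    path = [max(states, key=lambda s: v[s])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]


def path_to_segments(path):
    """Collapse a state path into (start, end, state) runs, end exclusive."""
    segments, pos = [], 0
    for state, run in groupby(path):
        length = len(list(run))
        segments.append((pos, pos + length, state))
        pos += length
    return segments


# Hypothetical two-state parameters: an AT-rich and a GC-rich state.
STATES = ("AT", "GC")
START = {"AT": 0.5, "GC": 0.5}
TRANS = {"AT": {"AT": 0.99, "GC": 0.01}, "GC": {"AT": 0.01, "GC": 0.99}}
EMIT = {
    "AT": {"A": 0.4, "T": 0.4, "G": 0.1, "C": 0.1},
    "GC": {"A": 0.1, "T": 0.1, "G": 0.4, "C": 0.4},
}
```

A high self-transition probability makes state switches costly, so short compositional fluctuations are absorbed and only sustained shifts start a new segment. A fully Bayesian treatment would instead place priors on these parameters and sample over segmentations.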

Assessment of compositional heterogeneity within and between eukaryotic genomes.

The findings indicate that the genomes of multicellular organisms are much more heterogeneous in nucleotide composition than depicted by the isochore model and so lead to a looser definition of isochores.

Base compositional structure of genomes.