Toward a statistically explicit understanding of de novo sequence assembly
@article{Howison2013TowardAS,
title={Toward a statistically explicit understanding of de novo sequence assembly},
author={Mark Howison and Felipe Zapata and Casey W. Dunn},
journal={Bioinformatics},
year={2013},
volume={29},
number={23},
pages={2959--2963}
}
MOTIVATION
Draft de novo genome assemblies are now available for many organisms. These assemblies are point estimates of the true genome sequences. Each is a specific hypothesis, drawn from among many alternative hypotheses, of the sequence of a genome. Assembly uncertainty, the inability to distinguish between multiple alternative assembly hypotheses, can be due to real variation between copies of the genome in the sample, errors and ambiguities in the sequenced data and assumptions and…
29 Citations
Bayesian Genome Assembly and Assessment by Markov Chain Monte Carlo Sampling
- Business · PloS one
- 2014
The posterior distribution of assembly hypotheses generated by GABI is summarized as a majority-rule consensus assembly and compared to external assemblies of the same test data; those assemblies are annotated by assigning posterior probabilities to features that they share with GABI's assembly graph.
Extensive Error in the Number of Genes Inferred from Draft Genome Assemblies
- Biology · PLoS Comput. Biol.
- 2014
The magnitude of the problem, both in terms of total gene number and the number of copies of genes in specific families, and the usefulness of RNA-Seq in improving the gene annotation of draft assemblies, are investigated.
Automated ensemble assembly and validation of microbial genomes
- Biology · BMC Bioinformatics
- 2014
Although computationally limited to small or mid-sized genomes, this approach is the most effective and reproducible means for generating high-quality assemblies and enables users to select an assembly best tailored to their specific needs.
Assembly and Data Quality
- Biology
- 2017
Methods to assemble sequence reads into larger pieces are described. Different strategies are used for genome, transcriptome and metagenome assemblies, and all of them greatly benefit from the inclusion of long reads.
GMcloser: closing gaps in assemblies accurately with a likelihood-based selection of contig or long-read alignments
- Biology · Bioinform.
- 2015
GMcloser is described, a tool that accurately closes gaps with a preassembled contig set or a long read set (i.e., error-corrected PacBio reads) by using likelihood-based classifiers calculated from the alignment statistics between scaffolds, contigs and paired-end reads to correctly assign contigs or long reads to gap regions of scaffolding, thereby achieving accurate and efficient gap closure.
Evaluation of de novo transcriptome assemblies from RNA-Seq data
- Biology · Genome Biology
- 2014
A model-based score, RSEM-EVAL, for evaluating assemblies when the ground truth is unknown is developed and shown to correctly reflect assembly accuracy, as measured by REF-EVAL, a refined set of ground-truth-based scores that was also developed.
Phylogenomics from Whole Genome Sequences Using aTRAM
- Biology · Systematic biology
- 2017
The automated Target Restricted Assembly Method (aTRAM) is used to assemble 1107 single‐copy ortholog genes from whole genome sequencing of sucking lice and out‐groups, demonstrating that this approach is successful at developing phylogenomic data sets from raw genome sequencing reads.
ILP-based maximum likelihood genome scaffolding
- Biology · BMC Bioinformatics
- 2014
Equipped with NSDP, SILP2 is able to scaffold large mammalian genomes, resulting in the longest and most accurate scaffolds, and the ILP formulation for the maximum likelihood model is shown to be flexible enough to handle metagenomic samples.
Evaluation of de novo transcriptome assemblies from RNA-Seq data
- Biology
- 2014
This work developed a model-based score, RSEM-EVAL, for evaluating assemblies when the ground truth is unknown, and assembled the transcriptome of the regenerating axolotl limb; this assembly compares favorably to a previous assembly.
References
SHOWING 1-10 OF 39 REFERENCES
GAGE: A critical evaluation of genome assemblies and assembly algorithms.
- Biology · Genome research
- 2012
An evaluation of several of the leading de novo assembly algorithms on four different short-read data sets generated by Illumina sequencers concludes that data quality, rather than the assembler itself, has a dramatic effect on the quality of an assembled genome.
Assemblathon 2: evaluating de novo methods of genome assembly in three vertebrate species
- Biology · GigaScience
- 2013
The high degree of variability between the entries suggests that there is still much room for improvement in the field of genome assembly and that approaches which work well in assembling the genome of one species may not necessarily work well for another.
Efficient de novo assembly of large genomes using compressed data structures.
- Computer Science, Biology · Genome research
- 2012
SGA (String Graph Assembler), a new assembler based on the overlap-based string graph model of assembly, is presented; it is the first practical assembler for a mammalian-sized genome on a low-end computing cluster and is simply parallelizable.
ALE: a generic assembly likelihood evaluation framework for assessing the accuracy of genome and metagenome assemblies
- Biology · Bioinform.
- 2013
The ALE framework provides a comprehensive, reference-independent and statistically rigorous measure of single genome and metagenome assembly accuracy, which can be used to identify misassemblies or to optimize the assembly process.
Assembly reconciliation
- Biology · Bioinform.
- 2008
Reconciled assemblies of six Drosophila species, produced with the Assembly Reconciliation technique in collaboration with Agencourt Bioscience and The J. Craig Venter Institute, are now the official (CAF1) assemblies used for analysis.
Assemblathon 1: a competitive assessment of de novo short read assembly methods.
- Biology · Genome research
- 2011
The Assemblathon 1 competition is described, which aimed to comprehensively assess the state of the art in de novo assembly methods when applied to current sequencing technologies, and it is established that it is possible to assemble the genome to a high level of coverage and accuracy.
Comparative analysis of algorithms for whole-genome assembly of pyrosequencing data
- Biology · Briefings Bioinform.
- 2012
This work reviews the efficiency of a panel of assemblers specifically designed to handle data from the GS FLX 454 platform, on three bacterial data sets with different characteristics in terms of read coverage and repeat content, and investigates their strengths and weaknesses in the reconstruction of the reference genomes.
An improved maximum likelihood formulation for accurate genome assembly
- Engineering · 2011 IEEE 1st International Conference on Computational Advances in Bio and Medical Sciences (ICCABS)
- 2011
Improvements to the recently proposed maximum likelihood method for genome assembly are presented, and results indicate that the method can generate accurate estimates of repeat counts and produces fewer and much longer contigs.
Maximum Likelihood Genome Assembly
- Biology · J. Comput. Biol.
- 2009
It is demonstrated how the technique of bidirected network flow can be used to explicitly model the double-stranded nature of DNA for genome assembly and a maximum likelihood framework for assembling the genome that is the most likely source of the reads is proposed.
REAPR: a universal tool for genome assembly evaluation
- Biology · Genome Biology
- 2013
This work validated REAPR on complete genomes or de novo assemblies from bacteria, malaria and Caenorhabditis elegans, and demonstrated that 86% and 82% of the human and mouse reference genomes are error-free, respectively.