Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data

@article{Chin2013NonhybridFM,
  title={Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data},
  author={C. Chin and David H. Alexander and Patrick Marks and Aaron A. Klammer and James P Drake and Cheryl R. Heiner and Alicia Clum and Alex Copeland and John Huddleston and Evan E. Eichler and Stephen W. Turner and Jonas Korlach},
  journal={Nature Methods},
  year={2013},
  volume={10},
  pages={563--569}
}
We present a hierarchical genome-assembly process (HGAP) for high-quality de novo microbial genome assemblies using only a single, long-insert shotgun DNA library in conjunction with Single Molecule, Real-Time (SMRT) DNA sequencing. Our method uses the longest reads as seeds to recruit all other reads for construction of highly accurate preassembled reads through a directed acyclic graph–based consensus procedure, which we follow with assembly using off-the-shelf long-read assemblers. In…
Evaluation and Validation of Assembling Corrected PacBio Long Reads for Microbial Genome Completion via Hybrid Approaches
TLDR
Evaluation of the contemporary hybrid approaches shows that assembling the ECTools-corrected long reads via runCA generates near complete microbial genomes, suggesting that genome assembly could benefit from re-analyzing the available hybrid datasets that were not assembled in an optimal fashion.
Benchmarking of long-read assemblers for prokaryote whole genome sequencing.
TLDR
Of the assemblers tested, Flye, Miniasm/Minipolish and Raven performed best overall; however, no single tool performed well on all metrics, highlighting the need for continued development of long-read assembly algorithms.
SSPACE-LongRead: scaffolding bacterial draft genomes using long read sequence information
TLDR
The current work describes the SSPACE-LongRead software, which is designed to upgrade incomplete draft genomes using single-molecule sequences, and concludes that recent advances in PacBio sequencing technology and chemistry, in combination with the limited computational resources required to run the program, allow genomes to be scaffolded in a fast and reliable manner.
Multiplexed Non-barcoded Long-Read Sequencing and Assembling Genomes of Bacillus Strains in Error-Free Simulations
TLDR
A novel multiplex strategy is presented that makes full use of the capacity and characteristics of SMRT sequencing in microbial genome assembly, showing that long-read genomic sequencing inherently provides the ability to assemble genomic sequencing data from multiple microbes into finished genomes owing to its long read length.
Error correction and assembly complexity of single molecule sequencing reads
TLDR
A new data-driven model using support vector regression that can accurately predict assembly performance is developed and applied to several prokaryotic and eukaryotic genomes, and can achieve near-perfect assemblies of small genomes and substantially improved assemblies of larger ones.
Reducing assembly complexity of microbial genomes with single-molecule sequencing
TLDR
Automated assembly of long, single-molecule sequencing data reduces the cost of microbial finishing to $1,000 for most genomes, and future advances in this technology are expected to drive the cost lower.
Single-molecule sequencing of the Drosophila serrata genome
TLDR
This work sequences and de novo assembles the genome of Drosophila serrata, a non-model species from the montium subgroup that has been well studied for clines and sexual selection, and provides an initial annotation for this genome using in silico gene predictions that were supported by RNA-seq data.
WENGAN: Efficient and high quality hybrid de novo assembly of human genomes
TLDR
The development of a novel algorithm for hybrid assembly, WENGAN, and the de novo assembly of four human genomes using a combination of sequencing data generated on ONT PromethION, PacBio Sequel, Illumina and MGI technology are reported.
Toward Complete Bacterial Genome Sequencing Through the Combined Use of Multiple Next-Generation Sequencing Platforms.
TLDR
This approach revealed that the hierarchical genome assembly process (HGAP) non-hybrid assembler resulted in nearly complete assemblies at a moderate coverage of ~75x, but that different versions produced non-compatible results requiring post-processing.

References

Hybrid error correction and de novo assembly of single-molecule sequencing reads
TLDR
This work introduces a correction algorithm and assembly strategy that uses short, high-fidelity sequences to correct the error in single-molecule sequences, leading to substantially better assemblies than current sequencing strategies.
A hybrid approach for the automated finishing of bacterial genomes
TLDR
This work combines sequencing data from second- and third-generation DNA sequencing technologies to assemble the two-chromosome genome of a recent Haitian cholera outbreak strain into two nearly finished contigs at >99.9% accuracy and provides the next generation of rapid microbial identification and full-genome assembly.
GAGE: A critical evaluation of genome assemblies and assembly algorithms.
TLDR
An evaluation of several of the leading de novo assembly algorithms on four different short-read data sets generated by Illumina sequencers concludes that data quality, rather than the assembler itself, has a dramatic effect on the quality of an assembled genome.
Mind the Gap: Upgrading Genomes with Pacific Biosciences RS Long-Read Sequencing Technology
TLDR
The algorithm and associated software tool, PBJelly, automates the finishing process using long sequence reads in a reference-guided assembly process and was validated by comparison to Sanger sequencing on gaps from the original D. pseudoobscura draft assembly and shown to be dependent on initial reference quality.
A consistency-based consensus algorithm for de novo and reference-guided sequence assembly of short reads
TLDR
A multi-read alignment algorithm for de novo or reference-guided genome assembly is presented that identifies segments shared by multiple reads and then aligns these segments using a consistency-enhanced alignment graph.
Mapping single molecule sequencing reads using basic local alignment with successive refinement (BLASR): application and theory
TLDR
The results indicate that it is possible to map SMS reads with high accuracy and speed, and the inferences made on the mapability of SMS reads using the combinatorial model of sequencing error are in agreement with the mapping accuracy demonstrated on simulated reads.
Assembly complexity of prokaryotic genomes using short reads
TLDR
The analysis gives an upper-bound on the performance of genome assemblers for de novo reconstruction of genomes across a wide range of read lengths and demonstrates that the majority of genes in prokaryotic genomes can be reconstructed uniquely using very short reads even if the genomes themselves cannot.
Improving genome assemblies by sequencing PCR products with PacBio.
TLDR
After decreasing a loading bias against larger PCR products in the PacBio process, a genome improvement pipeline is developed that is not only cost-effective but can also close gaps greater than 2.5 kb in a single round of reactions and sequence through high-GC regions as well as difficult secondary structures such as small hairpin loops.
Minimus: a fast, lightweight genome assembler
TLDR
The Minimus assembler is developed to address the challenges of large whole-genome sequencing projects and finds that for small genomes and other small assembly tasks, Minimus is faster and far more flexible than existing tools.
Finished bacterial genomes from shotgun sequence data.
TLDR
By applying a new laboratory design and new assembly algorithm to 16 samples, it is demonstrated that assemblies exceeding finished quality can be obtained from whole-genome shotgun data and automated computation.