Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data

@article{Chin2013NonhybridFM,
  title={Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data},
  author={C. Chin and David H. Alexander and Patrick J. Marks and Aaron A. Klammer and James P Drake and Cheryl R. Heiner and Alicia Clum and Alex Copeland and John Huddleston and Evan E. Eichler and Stephen W. Turner and Jonas Korlach},
  journal={Nature Methods},
  year={2013},
  volume={10},
  pages={563--569}
}
We present a hierarchical genome-assembly process (HGAP) for high-quality de novo microbial genome assemblies using only a single, long-insert shotgun DNA library in conjunction with Single Molecule, Real-Time (SMRT) DNA sequencing. Our method uses the longest reads as seeds to recruit all other reads for construction of highly accurate preassembled reads through a directed acyclic graph–based consensus procedure, which we follow with assembly using off-the-shelf long-read assemblers. In… 
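The seed-selection idea in the abstract (use the longest reads as seeds, then recruit the remaining reads to them) can be illustrated with a minimal sketch. This is not the HGAP implementation: the function name, the `genome_size` and `seed_coverage` parameters, and the rule of stopping once the seeds reach a target coverage are all assumptions made for illustration, and the consensus and assembly steps are not shown.

```python
from typing import List, Tuple


def select_seed_reads(
    read_lengths: List[int],
    genome_size: int,
    seed_coverage: float,
) -> Tuple[int, List[int]]:
    """Pick the longest reads as seeds until they reach a target coverage.

    Hypothetical helper for illustration only; names and parameters are
    assumptions and are not taken from the HGAP software.
    Returns (length_cutoff, seed_indices).
    """
    if genome_size <= 0 or seed_coverage <= 0:
        raise ValueError("genome_size and seed_coverage must be positive")

    target_bases = genome_size * seed_coverage
    # Visit reads from longest to shortest, keeping their original indices.
    order = sorted(
        range(len(read_lengths)),
        key=lambda i: read_lengths[i],
        reverse=True,
    )

    seeds: List[int] = []
    total_bases = 0
    for i in order:
        if total_bases >= target_bases:
            break
        seeds.append(i)
        total_bases += read_lengths[i]

    # The shortest selected seed defines the length cutoff.
    cutoff = read_lengths[seeds[-1]] if seeds else 0
    return cutoff, seeds


# Toy read lengths (assumed values) for a 10 kb genome and 2x seed coverage.
lengths = [1200, 8000, 500, 6000, 3000, 9000]
cutoff, seeds = select_seed_reads(lengths, genome_size=10_000, seed_coverage=2.0)
```

In this toy example the three longest reads become seeds, and every read not chosen as a seed would be a candidate for recruitment to them.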

Evaluation and Validation of Assembling Corrected PacBio Long Reads for Microbial Genome Completion via Hybrid Approaches

TLDR
Evaluation of contemporary hybrid approaches shows that assembling ECTools-corrected long reads via runCA generates near-complete microbial genomes, suggesting that genome assembly could benefit from re-analyzing available hybrid datasets that were not assembled in an optimal fashion.

Benchmarking of long-read assemblers for prokaryote whole genome sequencing.

TLDR
Of the assemblers tested, Flye, Miniasm/Minipolish and Raven performed best overall, but no single tool performed well on all metrics, highlighting the need for continued development of long-read assembly algorithms.

SSPACE-LongRead: scaffolding bacterial draft genomes using long read sequence information

TLDR
The current work describes the SSPACE-LongRead software, which is designed to upgrade incomplete draft genomes using single-molecule sequences, and concludes that recent advances in PacBio sequencing technology and chemistry, combined with the limited computational resources required to run the program, allow genomes to be scaffolded in a fast and reliable manner.

Multiplexed Non-barcoded Long-Read Sequencing and Assembling Genomes of Bacillus Strains in Error-Free Simulations

TLDR
A novel multiplex strategy is presented that makes full use of the capacity and characteristics of SMRT sequencing in microbial genome assembly, and it is shown that long-read genomic sequencing inherently provides the ability to assemble sequencing data from multiple microbes into finished genomes because of the long read length.

Reducing assembly complexity of microbial genomes with single-molecule sequencing

TLDR
Automated assembly of long, single-molecule sequencing data reduces the cost of microbial finishing to $1,000 for most genomes, and future advances in this technology are expected to drive the cost lower.

Error correction and assembly complexity of single molecule sequencing reads

TLDR
A new data-driven model using support vector regression that can accurately predict assembly performance is developed and applied to several prokaryotic and eukaryotic genomes, and can achieve near-perfect assemblies of small genomes and substantially improved assemblies of larger ones.

WENGAN: Efficient and high quality hybrid de novo assembly of human genomes

TLDR
The development of a novel algorithm for hybrid assembly, WENGAN, and the de novo assembly of four human genomes using a combination of sequencing data generated on ONT PromethION, PacBio Sequel, Illumina and MGI technology are reported.

Toward Complete Bacterial Genome Sequencing Through the Combined Use of Multiple Next-Generation Sequencing Platforms.

TLDR
This approach revealed that the hierarchical genome assembly process (HGAP) non-hybrid assembler produced nearly complete assemblies at a moderate coverage of ~75x, but that different versions produced incompatible results requiring post-processing.


References

SHOWING 1-10 OF 44 REFERENCES

Hybrid error correction and de novo assembly of single-molecule sequencing reads

TLDR
This work introduces a correction algorithm and assembly strategy that uses short, high-fidelity sequences to correct the error in single-molecule sequences, leading to substantially better assemblies than current sequencing strategies.

A hybrid approach for the automated finishing of bacterial genomes

TLDR
This work combines sequencing data from second- and third-generation DNA sequencing technologies to assemble the two-chromosome genome of a recent Haitian cholera outbreak strain into two nearly finished contigs at >99.9% accuracy and provides the next generation of rapid microbial identification and full-genome assembly.

GAGE: A critical evaluation of genome assemblies and assembly algorithms.

TLDR
An evaluation of several of the leading de novo assembly algorithms on four different short-read data sets generated by Illumina sequencers concludes that data quality, rather than the assembler itself, has a dramatic effect on the quality of an assembled genome.

Mind the Gap: Upgrading Genomes with Pacific Biosciences RS Long-Read Sequencing Technology

TLDR
The algorithm and associated software tool, PBJelly, automates the finishing process using long sequence reads in a reference-guided assembly process and was validated by comparison to Sanger sequencing on gaps from the original D. pseudoobscura draft assembly and shown to be dependent on initial reference quality.

A consistency-based consensus algorithm for de novo and reference-guided sequence assembly of short reads

TLDR
A multi-read alignment algorithm for de novo or reference-guided genome assembly is presented that identifies segments shared by multiple reads and then aligns these segments using a consistency-enhanced alignment graph.

Mapping single molecule sequencing reads using basic local alignment with successive refinement (BLASR): application and theory

TLDR
The results indicate that it is possible to map SMS reads with high accuracy and speed, and the inferences made on the mapability of SMS reads using the combinatorial model of sequencing error are in agreement with the mapping accuracy demonstrated on simulated reads.

Assembly complexity of prokaryotic genomes using short reads

TLDR
The analysis gives an upper-bound on the performance of genome assemblers for de novo reconstruction of genomes across a wide range of read lengths and demonstrates that the majority of genes in prokaryotic genomes can be reconstructed uniquely using very short reads even if the genomes themselves cannot.

Improving genome assemblies by sequencing PCR products with PacBio.

TLDR
A genome improvement pipeline is developed after decreasing a loading bias against larger PCR products in the PacBio process that is not only cost-effective but also can close gaps greater than 2.5 kb in a single round of reactions, and sequence through high GC regions as well as difficult secondary structures such as small hairpin loops.

Minimus: a fast, lightweight genome assembler

TLDR
The Minimus assembler is developed to address the challenges of large whole-genome sequencing projects and finds that for small genomes and other small assembly tasks, Minimus is faster and far more flexible than existing tools.

Finished bacterial genomes from shotgun sequence data.

TLDR
By applying a new laboratory design and new assembly algorithm to 16 samples, it is demonstrated that assemblies exceeding finished quality can be obtained from whole-genome shotgun data and automated computation.