The Theory and Practice of Genome Sequence Assembly.

@article{Simpson2015TheTA,
  title={The Theory and Practice of Genome Sequence Assembly.},
  author={Jared T. Simpson and Mihai Pop},
  journal={Annual review of genomics and human genetics},
  year={2015},
  volume={16},
  pages={153-72}
}
  • J. Simpson, M. Pop
  • Published 31 August 2015
  • Biology
  • Annual review of genomics and human genetics
The current genomic revolution was made possible by joint advances in genome sequencing technologies and computational approaches for analyzing sequence data. The close interaction between biologists and computational scientists is perhaps most apparent in the development of approaches for sequencing entire genomes, a feat that would not be possible without sophisticated computational tools called genome assemblers (short for genome sequence assemblers). Here, we survey the key developments in… 

New Approaches for Genome Assembly and Scaffolding.
TLDR
An overview of the problem of chromosome-scale assembly and traditional methods for tackling this problem is given, and new technologies for chromosome-scale assembly and recent genome projects that used these technologies to create highly contiguous genome assemblies at low cost are reviewed.
Modern technologies and algorithms for scaffolding assembled genomes
TLDR
This work surveys technologies and the algorithms used to assemble and analyze large eukaryotic genomes, placed within the historical context of genome scaffolding technologies that have been in existence since the dawn of the genomic era.
Scalable Parallel Algorithms for Genome Analysis
TLDR
A novel algorithm that leverages one-sided communication capabilities of the UPC to facilitate the requisite fine-grained, irregular parallelism and the avoidance of data hazards is presented, enabling the first massively scalable, high quality, complete end-to-end de novo assembly pipeline.
Whole-Genome Sequencing Recommendations
TLDR
A first primer on whole-genome sequencing is provided, focusing on two of the most popular applications: de novo genome sequencing, in which the objective is obtaining a high-quality genome assembly that can serve as a reference for a species or variety.
In Silico Whole Genome Sequencer and Analyzer (iWGS): a Computational Pipeline to Guide the Design and Analysis of de novo Genome Sequencing Studies
TLDR
iWGS (in silico Whole Genome Sequencer and Analyzer) is an automated pipeline for guiding the choice of appropriate sequencing strategy and assembly protocols; it supports all major sequencing technologies and popular assembly tools.
in silico Whole Genome Sequencer & Analyzer (iWGS): a computational pipeline to guide the design and analysis of de novo genome sequencing studies
TLDR
iWGS (in silico Whole Genome Sequencer and Analyzer) is an automated pipeline for guiding the choice of appropriate sequencing strategy and assembly protocols; it supports all major sequencing technologies and popular assembly tools.
Chapter 2 Whole-Genome Sequencing Recommendations
TLDR
The recent revolution in sequencing technologies has democratized genome sequencing projects and brought about the necessity of keeping up with recent developments and strategies, as the sequencing technologies and bioinformatic tools for downstream analyses keep evolving at a fast pace.
A practical guide to de novo genome assembly using long reads
TLDR
This work analyzes recently published long molecule sequencing data to identify what governs completeness and contiguity of genome assemblies, and motivates a set of preliminary best practices for assembly, a 'missing manual' that guides key decisions in building high quality de novo genome assemblies.
The Most Frequently Used Sequencing Technologies and Assembly Methods in Different Time Segments of the Bacterial Surveillance and RefSeq Genome Databases
  • B. Segerman
  • Biology, Engineering
    Frontiers in Cellular and Infection Microbiology
  • 2020
TLDR
This mini review provides an overview of the currently most common workflows for producing bacterial whole genome sequence assemblies and the most frequently used assembly software solutions.

References

SHOWING 1-10 OF 124 REFERENCES
Genome Sequence Assembly: Algorithms and Issues
TLDR
This article considers how algorithms that can assemble millions of DNA fragments into gene sequences underlie the current revolution in biotechnology, helping researchers build the growing database of complete genomes.
GAGE: A critical evaluation of genome assemblies and assembly algorithms.
TLDR
An evaluation of several of the leading de novo assembly algorithms on four different short-read data sets generated by Illumina sequencers concludes that data quality, rather than the assembler itself, has a dramatic effect on the quality of an assembled genome.
Initial sequencing and analysis of the human genome.
TLDR
The results of an international collaboration to produce and make freely available a draft sequence of the human genome are reported and an initial analysis is presented, describing some of the insights that can be gleaned from the sequence.
A New Algorithm for DNA Sequence Assembly
TLDR
This paper proposes a new computer algorithm for DNA sequence assembly that combines in a novel way the techniques of both shotgun and SBH methods, and promises to be very fast and practical for DNA sequence assembly.
Whole-genome random sequencing and assembly of Haemophilus influenzae Rd.
An approach for genome analysis based on sequencing and assembly of unselected pieces of DNA from the whole chromosome has been applied to obtain the complete nucleotide sequence (1,830,137 base
Short read fragment assembly of bacterial genomes.
TLDR
A new Eulerian assembler is presented that generates nearly optimal short read assemblies of bacterial genomes and an approach to assemble reads in the case of the popular hybrid protocol when short and long Sanger-based reads are combined.
The genome sequence of Drosophila melanogaster.
TLDR
The nucleotide sequence of nearly all of the approximately 120-megabase euchromatic portion of the Drosophila genome is determined using a whole-genome shotgun sequencing strategy supported by extensive clone-based sequence and a high-quality bacterial artificial chromosome physical map.
Parametric Complexity of Sequence Assembly: Theory and Applications to Next Generation Sequencing
TLDR
This work suggests at least two ways in which existing assemblers can be extended in a rigorous fashion, in addition to delineating directions for future theoretical investigations.
The whole genome assembly of Drosophila
TLDR
The quality of a whole-genome assembly of Drosophila melanogaster and the nature of the computer algorithms that accomplished it are reported; the assembly should be of substantial value to the scientific community.
High-quality draft assemblies of mammalian genomes from massively parallel sequence data
TLDR
An algorithm for genome assembly, ALLPATHS-LG, is developed and applied to massively parallel DNA sequence data from the human and mouse genomes, generated on the Illumina platform; the resulting assemblies have good accuracy, short-range contiguity, long-range connectivity, and coverage of the genome.