Sequence assembly demystified

@article{Nagarajan2013SequenceAD,
  title={Sequence assembly demystified},
  author={Niranjan Nagarajan and Mihai Pop},
  journal={Nature Reviews Genetics},
  year={2013},
  volume={14},
  pages={157--167}
}
Advances in sequencing technologies and increased access to sequencing services have led to renewed interest in sequence and genome assembly. Concurrently, new applications for sequencing have emerged, including gene expression analysis, discovery of genomic variants and metagenomics, and each of these has different needs and challenges in terms of assembly. We survey the theoretical foundations that underlie modern assembly and highlight the options and practical trade-offs that need to be… 

Recent advances in sequence assembly: principles and applications

This review focuses on how individual processes in assembly, such as optimal k-mer determination and error correction, rely on intelligent strategies or high-performance computation.

Recent Advances in Gene and Genome Assembly: Challenges and Implications

An overview of the tools used by genome assemblers is provided, and the challenges and implications of next-generation sequencing chemistries, as shaped by the development of these tools, are discussed.

The impact of third generation genomic technologies on plant genome assembly.

Genomes correction and assembling: present methods and tools

This paper addresses the pipeline for de novo genome assembly as provided by programs currently available to scientists, both as commercial and as open-source software.

Whole-genome sequencing in bacteriology: state of the art

  • M. Dark
  • Biology
  • Infection and Drug Resistance
  • 2013
This review examines the strengths and weaknesses of techniques in bacterial genome sequencing, upcoming technologies, and assembly techniques, as well as recent studies that highlight new applications for bacterial genomics.

Sequencing depth and coverage: key considerations in genomic analyses

The issue of sequencing depth in the design of next-generation sequencing experiments is discussed and current guidelines and precedents on the issue of coverage are reviewed for four major study designs, including de novo genome sequencing, genome resequencing, transcriptome sequencing and genomic location analyses.

Building and Improving Reference Genome Assemblies

The relationship between sequencing technology improvements and assembly algorithm development and how these are applied to extend and improve human and nonhuman genome assemblies are discussed.

Empirical evaluation of methods for de novo genome assembly

A thorough comparison of de novo assembly algorithms is made to help new users clearly understand them: overlap-layout-consensus, de Bruijn graph, string-graph-based assembly, and hybrid approaches.

Analyzing the Effects of Sequencer Discrepancies on Next-Generation Genome Assembly Tools

An in-depth architectural analysis of several popular de novo genome assemblers is presented, including expected behavioral changes across sequencer variations, along with evaluations of these tools using data sets permuted over a range of coverage depths, read lengths, and read types.

Evaluation of genome assembly software based on long reads

This report compares and evaluates several genome assembly software packages based on TSG technology, running experiments on 4 reference genomes and evaluating the results with the QUAST software.

References

Limitations of next-generation genome sequence assembly

It is concluded that high-quality sequencing approaches must be considered in conjunction with high-throughput sequencing for comparative genomics analyses and studies of genome evolution.

Parametric Complexity of Sequence Assembly: Theory and Applications to Next Generation Sequencing

This work suggests at least two ways in which existing assemblers can be extended in a rigorous fashion, in addition to delineating directions for future theoretical investigations.

GAGE: A critical evaluation of genome assemblies and assembly algorithms.

An evaluation of several of the leading de novo assembly algorithms on four different short-read data sets generated by Illumina sequencers concludes that data quality, rather than the assembler itself, has a dramatic effect on the quality of an assembled genome.

BreakFusion: targeted assembly-based identification of gene fusions in whole transcriptome paired-end sequencing data

BreakFusion, a software package that combines the strengths of reference alignment followed by read-pair analysis and de novo assembly to achieve a good balance of sensitivity, specificity and computational efficiency, is presented.

Improving draft assemblies by iterative mapping and assembly of short reads to eliminate gaps

This work presents a practical approach that uses Illumina sequences to improve draft genome assemblies by aligning sequences against contig ends and performing local assemblies to produce gap-spanning contigs.

Scaffolding and validation of bacterial genome assemblies using optical restriction maps

The resulting assemblies contain a single scaffold covering a large fraction of the respective genomes, suggesting that the careful use of optical maps can provide a cost-effective framework for the assembly of genomes.

SPAdes: A New Genome Assembly Algorithm and Its Applications to Single-Cell Sequencing

SPAdes generates single-cell assemblies, providing information about genomes of uncultivatable bacteria that vastly exceeds what may be obtained via traditional metagenomics studies.

Evaluation of Methods for De Novo Genome Assembly from High-Throughput Sequencing Reads Reveals Dependencies That Affect the Quality of the Results

This extensive analysis demonstrates that, in addition to sequencing coverage, attributes such as the architecture of the target genome, the choice of assembly program, the average read length and the observed sequencing error rates are powerful variables that affect the best achievable assembly of the target sequence in terms of size and correctness.

De novo assembly of human genomes with massively parallel short read sequencing.

The development of this de novo short read assembly method creates new opportunities for building reference sequences and carrying out accurate analyses of unexplored genomes in a cost-effective way.

A Practical Comparison of De Novo Genome Assembly Software Tools for Next-Generation Sequencing Technologies

This study indicates that string-based assemblers and overlap-layout-consensus (OLC) assemblers are well suited to very short reads and to longer reads of small genomes, respectively, while graph-based assemblers would be more appropriate for large datasets of more than a hundred million short reads.