DIME: A Novel Framework for De Novo Metagenomic Sequence Assembly

@article{Guo2015DIMEAN,
  title={DIME: A Novel Framework for De Novo Metagenomic Sequence Assembly},
  author={Xuan Guo and Ning Yu and Xiaojun Ding and Jianxin Wang and Yi Pan},
  journal={Journal of computational biology : a journal of computational molecular cell biology},
  year={2015},
  volume={22},
  number={2},
  pages={159--177}
}
  • Xuan Guo, Ning Yu, Xiaojun Ding, Jianxin Wang, Yi Pan
  • Published 1 February 2015
  • Biology
  • Journal of computational biology : a journal of computational molecular cell biology
Recently developed next-generation sequencing platforms not only decrease the cost of metagenomic data analysis but also greatly enlarge the size of metagenomic sequence datasets. A common bottleneck of available assemblers is that the trade-off between the noise in the resulting contigs and the gain in sequence length for better annotation has not received enough attention in large-scale sequencing projects, especially for datasets with low coverage and a large number of nonoverlapping… 
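Gains in contig length, the quantity the abstract weighs against contig noise, are commonly summarized with the N50 statistic (the MetaVelvet entry among the references, for example, reports N50 scores). The following is a minimal sketch of the standard N50 definition, not DIME's evaluation code; the function name `n50` and the example lengths are illustrative assumptions.

```python
from typing import Iterable


def n50(contig_lengths: Iterable[int]) -> int:
    """Return the N50 of an assembly: the largest length L such that contigs
    of length >= L together cover at least half of all assembled bases."""
    lengths = sorted(contig_lengths, reverse=True)
    total = sum(lengths)
    running = 0
    for length in lengths:
        running += length
        # Stop at the first contig at which half of the total is reached.
        if 2 * running >= total:
            return length
    return 0  # only reached for an empty assembly


print(n50([100, 80, 50, 30, 20]))  # total 280; 100 + 80 = 180 >= 140, so 80
```

Note that N50 rewards length only: merging two contigs incorrectly can raise it, which is why length-based metrics are usually reported alongside error or chimera counts.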


Deconvolute individual genomes from metagenome sequences through read clustering
TLDR
This paper extends a previously developed scalable read clustering method on Apache Spark, SpaRC, by adding a new method to further cluster small clusters that exploits statistics derived from multiple samples in a dataset to reduce the under-clustering problem.
SpaRC: Scalable Sequence Clustering using Apache Spark
TLDR
An Apache Spark-based scalable sequence clustering application, SparkReadClust (SpaRC), that partitions reads based on their molecule of origin to enable downstream assembly optimization; the results suggest that SpaRC provides a scalable solution for clustering billions of reads from next-generation sequencing experiments.
SpaRC: scalable sequence clustering using Apache Spark
TLDR
An Apache Spark-based scalable sequence clustering application, SparkReadClust (SpaRC), that partitions reads based on their molecule of origin to enable downstream assembly optimization and produces high clustering performance on transcriptomes and metagenomes from both short and long read sequencing technologies.
Deconvolute individual genomes from metagenome sequences through short read clustering
TLDR
This work extends the authors' previous read clustering software, SpaRC, by exploiting statistics derived from multiple samples in a dataset to reduce the under-clustering problem, and demonstrates that this method has the potential to cluster almost all of the short reads from genomes with sufficient sequencing coverage.
Current challenges and solutions of de novo assembly
TLDR
This review not only gives an overview of the latest methods and developments in assembly algorithms but also provides guidelines for choosing the optimal assembly algorithm for a given input sequencing data type.
Assessing the Impact of Assemblers on Virus Detection in a De Novo Metagenomic Analysis Pipeline
TLDR
The results show that if the output of only one assembler is considered, biologically important reads can easily be overlooked; the implications of these findings for pathogen discovery are discussed.
Combining Hadoop with MPI to Solve Metagenomics Problems that are both Data- and Compute-intensive
TLDR
The results suggest that integrating heterogeneous technologies such as Hadoop and MPI is an efficient way to solve large genomics problems that are both data-intensive and compute-intensive.
ISEA: Iterative Seed-Extension Algorithm for De Novo Assembly Using Paired-End Information and Insert Size Distribution
TLDR
ISEA, an iterative seed-extension algorithm for de novo assembly, uses an elaborately designed score function based on paired-end information and the insert-size distribution to resolve repeat regions, and can effectively obtain longer and more accurate scaffolds.
Combining Hadoop with MPI to Solve Metagenomics Problems that are both Data- and Compute-intensive
TLDR
Results from this case study suggest the combined Hadoop with MPI approach has great potential in large genomics applications that are both data-intensive and compute-intensive.
...
...
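SpaRC, per the summaries above, partitions reads by their molecule of origin. A common simplification of that idea is to link reads that share a k-mer and report connected components. The sketch below does exactly that with a union-find structure; it is a swapped-in toy technique, not SpaRC's Spark-based pipeline, and it ignores reverse complements and sequencing errors. The names `kmers`, `cluster_reads`, the value of `k`, and the example reads are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterator, List


def kmers(seq: str, k: int) -> Iterator[str]:
    """Yield every length-k substring of seq (forward strand only)."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def cluster_reads(reads: List[str], k: int) -> List[List[int]]:
    """Group read indices so that reads sharing any k-mer end up together."""
    parent = list(range(len(reads)))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a: int, b: int) -> None:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    first_seen: Dict[str, int] = {}  # k-mer -> first read that contained it
    for idx, read in enumerate(reads):
        for km in kmers(read, k):
            if km in first_seen:
                union(first_seen[km], idx)
            else:
                first_seen[km] = idx

    groups: Dict[int, List[int]] = defaultdict(list)
    for idx in range(len(reads)):
        groups[find(idx)].append(idx)
    return sorted(groups.values())


reads = ["ACGTACGT", "TACGTTTT", "GGGGCCCC", "CCCCAAAA"]
print(cluster_reads(reads, 4))  # [[0, 1], [2, 3]]
```

Linking each read only to the first read seen with a given k-mer is enough, because union-find makes the relation transitive; real tools additionally cap very frequent k-mers, which would otherwise merge unrelated genomes into one cluster.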

References

Showing 1-10 of 41 references
Cloud Computing for De Novo Metagenomic Sequence Assembly
TLDR
A parallel strategy to accelerate computation and boost accuracy when assembling sequenced reads from an environmental sample; its ability to reconstruct bases outperformed other tools in both speed and several assembly evaluation metrics.
Genovo: De Novo Assembly for Metagenomes
TLDR
Genovo, a novel de novo sequence assembler, discovers likely sequence reconstructions under a probabilistic model; its reconstructions cover more bases and recover more genes than the other methods, even for low-abundance sequences, and yield a higher assembly score.
Meta-IDBA: a de Novo assembler for metagenomic data
TLDR
A comparison of Meta-IDBA with existing assemblers, such as Velvet and ABySS, on different metagenomic datasets shows that Meta-IDBA can reconstruct longer contigs with similar accuracy.
De novo assembly methods for next generation sequencing data
TLDR
This paper compares seed-extension and graph-based methods that use the overlap/layout/consensus approach and the de Bruijn graph approach for assembly, and discusses future directions of genome assembly.
SOAPdenovo2: an empirically improved memory-efficient short-read de novo assembler
TLDR
This work provides an updated assembly of the 2008 Asian genome using SOAPdenovo2, a new algorithm design that reduces memory consumption in graph construction, resolves more repeat regions in contig assembly, increases coverage and length in scaffold construction, improves gap closing, and is optimized for large genomes.
MetaVelvet: an extension of Velvet assembler to de novo metagenome assembly from short sequence reads
TLDR
MetaVelvet succeeded in generating higher N50 scores and smaller chimeric scaffolds than any of the compared single-genome assemblers, produced scaffolds of quality comparable to separate Velvet assemblies of reads from isolated species, and reconstructed even relatively low-coverage genome sequences as scaffolds.
The metagenomics RAST server – a public resource for the automatic phylogenetic and functional analysis of metagenomes
TLDR
The open-source metagenomics RAST service provides a new paradigm for the annotation and analysis of metagenomes that is stable, extensible, and freely available to all researchers.
Evaluating the Fidelity of De Novo Short Read Metagenomic Assembly Using Simulated Data
TLDR
A critical assessment of current de novo short-read assembly tools in multi-genome scenarios using complex simulated metagenomic data; it shows that the assembly process reduces the accuracy of the functional classification of metagenomic data and that these errors can be overcome by raising the coverage of the studied metagenome.
Assembling Single-Cell Genomes and Mini-Metagenomes From Chimeric MDA Products
TLDR
Applies the single-cell assembler SPAdes to a new approach for capturing and sequencing "microbial dark matter" that forms small pools of randomly selected single cells and then sequences all genomes from the resulting mini-metagenome at once.
GAGE: A critical evaluation of genome assemblies and assembly algorithms
TLDR
An evaluation of several leading de novo assembly algorithms on four different short-read data sets generated by Illumina sequencers, concluding that data quality, rather than the assembler itself, has a dramatic effect on the quality of an assembled genome.
...
...
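Several of the referenced assemblers (Meta-IDBA, MetaVelvet, SOAPdenovo2) follow the de Bruijn graph approach that "De novo assembly methods for next generation sequencing data" contrasts with overlap/layout/consensus. The toy sketch below shows the core idea under simplifying assumptions (error-free reads, forward strand only, no coverage tracking): nodes are (k-1)-mers, each k-mer adds an edge, and a contig is extended while the path is unambiguous. It is a textbook illustration, not the method of DIME or of any listed tool; the function names and example reads are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set


def de_bruijn_graph(reads: List[str], k: int) -> Dict[str, Set[str]]:
    """Nodes are (k-1)-mers; each k-mer in a read adds an edge prefix -> suffix."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def walk_unambiguous(graph: Dict[str, Set[str]], start: str) -> str:
    """Extend a contig from `start` while the current node has exactly one
    successor and that successor has exactly one predecessor."""
    indegree: Dict[str, int] = defaultdict(int)
    for successors in graph.values():
        for s in successors:
            indegree[s] += 1
    contig, node, visited = start, start, {start}
    while len(graph.get(node, ())) == 1:
        (nxt,) = graph[node]
        if indegree[nxt] != 1 or nxt in visited:  # branch point or cycle
            break
        contig += nxt[-1]  # each step adds one new base
        visited.add(nxt)
        node = nxt
    return contig


g = de_bruijn_graph(["ACGTC", "GTCAG"], 3)
print(walk_unambiguous(g, "AC"))  # ACGTCAG
```

Branches in the graph (a node with several successors) mark repeats or sequencing errors; stopping there is what keeps such simple walks from producing chimeric contigs, at the cost of shorter output.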