Corpus ID: 14913979

Illumina Sequencing Artifacts Revealed by Connectivity Analysis of Metagenomic Datasets

Adina Howe, Jason Pell, Rosangela Canino-Koning, Rachel Mackelprang, Susannah G. Tringe, Janet K. Jansson, James M. Tiedje, C. Titus Brown
arXiv: Genomics
Sequencing errors and biases in metagenomic datasets affect coverage-based assemblies and are often ignored during analysis. Here, we analyze read connectivity in metagenomes and identify the presence of problematic and likely a-biological connectivity within metagenome assembly graphs. Specifically, we identify highly connected sequences which join a large proportion of reads within each real metagenome. These sequences show position-specific bias in shotgun reads, suggestive of sequencing…
Assembling large, complex environmental metagenomes
Two pre-assembly filtering approaches, digital normalization and partitioning, are applied to make large metagenome assemblies more tractable, and it is demonstrated that these methods result in assemblies nearly identical to assemblies from unprocessed data.
Meta-Pangenome: At the Crossroad of Pangenomics and Metagenomics
This chapter argues that the notion of a pangenome can be applied beyond the available genome sequences by leveraging metagenome-assembled genomes to form a comprehensive representation of the genetic content of a taxonomic group in a particular environment.
Comparing Memory-Efficient Genome Assemblers on Stand-Alone and Cloud Infrastructures
This study reveals the minimum memory requirements for different assembly programs even when data volume exceeds memory capacity by orders of magnitude, and proposes two general assembly strategies that can improve short-read assembly approaches and result in reduction of the memory footprint.
Dynamics of Microeukaryotes and Archaea in the Mammalian Gut Microbiome
The research performed in this dissertation will aid researchers looking to study all three domains of life and take into account the effects of commonly used antibiotics in future microbiome studies.
These Are Not the K-mers You Are Looking For: Efficient Online K-mer Counting Using a Probabilistic Data Structure
The speed, memory usage, and miscount rate of khmer for generating k-mer frequency distributions and retrieving k-mer counts for individual k-mers are analyzed.
Assemblage adaptatif de génomes et de méta-génomes par passage de messages
A line of products is presented that includes RayPlatform, Ray (which includes workflows called Ray Meta and Ray Communities for metagenomics), and Ray Cloud Browser; its main application is scalable (adaptive) assembly and profiling of genomes using message passing.
Novel computational approaches to investigate microbial diversity


Assessment of Metagenomic Assembly Using Simulated Next Generation Sequencing Data
This research developed metagenomic read simulators with platform-specific (Sanger, pyrosequencing, Illumina) base-error models, and simulated metagenomes of differing community complexities, and examined the effect of rigorous quality control on Illumina data.
Evaluating the Fidelity of De Novo Short Read Metagenomic Assembly Using Simulated Data
A critical assessment of current de novo short-read assembly tools in multi-genome scenarios using complex simulated metagenomic data. It shows that the assembly process reduces the accuracy of the functional classification of the metagenomic data and that these errors can be overcome by raising the coverage of the studied metagenome.
Assembly algorithms for next-generation sequencing data.
This review summarizes and compares the published descriptions of packages named SSAKE, SHARCGS, VCAKE, Newbler, Celera Assembler, Euler, Velvet, ABySS, AllPaths, and SOAPdenovo to compare the two standard methods known as the de Bruijn graph approach and the overlap/layout/consensus approach to assembly.
MetaVelvet: an extension of Velvet assembler to de novo metagenome assembly from short sequence reads
MetaVelvet succeeded in generating higher N50 scores and smaller chimeric scaffolds than any of the compared single-genome assemblers, produced scaffolds of quality comparable to separate Velvet assemblies of sequence reads from isolated species, and reconstructed even relatively low-coverage genome sequences as scaffolds.
Use of simulated data sets to evaluate the fidelity of metagenomic processing methods
Three simulated data sets were designed to model real metagenomes in terms of complexity and phylogenetic composition, and the effects of the simulated community structure and method combinations on the fidelity of each processing step were explored by comparison to the corresponding isolate genomes.
A Platform-Independent Method for Detecting Errors in Metagenomic Sequencing Data: DRISEE
This work presents DRISEE as an open-source, platform-independent method to assess sequencing error in shotgun metagenomic data, and utilizes it to discover previously uncharacterized error in de novo sequence data from the 454 and Illumina sequencing platforms.
Genome assembly reborn: recent computational challenges
  • M. Pop
  • Medicine, Computer Science
  • Briefings Bioinform.
  • 2009
The major algorithmic approaches for genome assembly are outlined and recent developments in this domain are described.
De novo assembly of human genomes with massively parallel short read sequencing.
The development of this de novo short read assembly method creates new opportunities for building reference sequences and carrying out accurate analyses of unexplored genomes in a cost-effective way.
Scaling metagenome sequence assembly with probabilistic de Bruijn graphs
A memory-efficient graph representation based on a probabilistic data structure, a Bloom filter, is introduced. It stores assembly graphs in as little as 4 bits per k-mer, albeit inexactly, and reduces the overall memory requirements for de novo assembly of metagenomes.
Systematic artifacts in metagenomes from complex microbial communities
A systematic error is found in metagenomes generated by 454-based pyrosequencing that leads to an overestimation of gene and taxon abundance; between 11% and 35% of sequences in a typical metagenome are artificial replicates.