Eu-Detect: An algorithm for detecting eukaryotic sequences in metagenomic data sets

Authors: Monzoorul Haque Mohammed, Sudha Chadaram, Dinakar Komanduri, Tarini Shankar Ghosh, Sharmila S. Mande
Journal: Journal of Biosciences

Physical partitioning techniques are routinely employed during sample preparation to segregate the prokaryotic and eukaryotic fractions of metagenomic samples. In spite of these efforts, several metagenomic studies focusing on bacterial and archaeal populations have reported contaminating eukaryotic sequences in their data sets. Contaminating sequences originate not only from genomes of micro-eukaryotic species but also from genomes of (higher) eukaryotic host…

Classification of metagenomic sequences: methods and challenges

The premise, methodologies, advantages, limitations and challenges of various methods available for binning of metagenomic datasets obtained using the shotgun sequencing approach are discussed.

Expression of eukaryotic‐like protein in the microbiome of sponges

This study shows that ELP genes in sponge symbionts represent actively expressed functions that could mediate molecular interaction between symbiosis partners.

Inferring Intra-Community Microbial Interaction Patterns from Metagenomic Datasets Using Associative Rule Mining Techniques

The efficiency/utility of this rule mining approach in deciphering multiple (biologically meaningful) association patterns between 'subsets/subgroups' of microbes (constituting microbiome samples) is demonstrated.
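As a rough illustration of what association rule mining measures, the sketch below computes the two standard rule metrics, support and confidence, over presence/absence data. It is a generic minimal example, not the method used in the paper summarized above. The function names, the set-per-sample data layout, and the placeholder taxon labels are all assumptions made for illustration.

```python
def support(samples, taxa):
    """Fraction of samples in which every taxon in `taxa` is present."""
    if not samples:
        raise ValueError("at least one sample is required")
    wanted = set(taxa)
    hits = sum(1 for sample in samples if wanted <= set(sample))
    return hits / len(samples)


def confidence(samples, antecedent, consequent):
    """Estimate P(consequent present | antecedent present) for a rule."""
    antecedent_support = support(samples, antecedent)
    if antecedent_support == 0:
        # The antecedent never occurs, so the rule has no evidence behind it.
        return 0.0
    both = set(antecedent) | set(consequent)
    return support(samples, both) / antecedent_support


# Hypothetical presence/absence data: one set of detected taxa per sample.
samples = [
    {"taxonA", "taxonB"},
    {"taxonA", "taxonB", "taxonC"},
    {"taxonA"},
    {"taxonC"},
]

rule_support = support(samples, {"taxonA", "taxonB"})
rule_confidence = confidence(samples, {"taxonA"}, {"taxonB"})
```

A rule such as "taxonA implies taxonB" is usually kept only if both values exceed user-chosen thresholds. Suitable thresholds depend on the data set and are not specified here.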

Prevention, diagnosis and treatment of high‐throughput sequencing data pathologies

The authors argue that careful QC of HTS data is an important, yet often neglected, aspect of their application in molecular ecology, and lay the groundwork for developing an HTS data QC ‘best practices’ guide.

Clostridium difficile Colonization and Infection in the Elderly and Associations with the Aging Intestinal Microbiome

Exposure to antibiotics and acid-reducing medications was associated with an increased risk of rCDI among community-dwelling elders; however, corticosteroid exposure reduced the risk of recurrence by 39%.

PhyloSift: phylogenetic analysis of genomes and metagenomes

This work presents an approach to leverage phylogenetic analysis of metagenomic sequence data to conduct phylogeny-driven Bayesian hypothesis tests for the presence of an organism in a sample and applies new tools to analyze the phylogenetic diversity of microbial communities.



Fast Identification and Removal of Sequence Contamination from Genomic and Metagenomic Datasets

DeconSeq is a robust framework for the rapid, automated identification and removal of sequence contamination in longer-read datasets (150 bp mean read length) and allows scientists to automatically detect and efficiently remove unwanted sequence contamination from their datasets while eliminating critical limitations of current methods.

Application of tetranucleotide frequencies for the assignment of genomic fragments.

The results of this systematic study show that the discriminatory power of correlations of tetranucleotide-derived z-scores is by far superior to that of differences in (G + C)-content and provides reasonable assignment probabilities when applied to metagenome libraries of small diversity.
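To make the comparison concrete, the sketch below builds a tetranucleotide frequency profile for each fragment and compares two fragments by Pearson correlation, alongside the simpler (G + C)-content difference. It is a simplified stand-in: it correlates raw frequencies and omits the z-score transformation that the study summarized above applies. All function names are hypothetical.

```python
from collections import Counter
from itertools import product
from math import sqrt

# All 256 tetranucleotides in a fixed order, so that profiles are comparable.
TETRAMERS = ["".join(p) for p in product("ACGT", repeat=4)]


def tetra_profile(seq):
    """Relative frequency of each tetranucleotide (overlapping windows)."""
    seq = seq.upper()
    counts = Counter(seq[i:i + 4] for i in range(len(seq) - 3))
    # Windows containing ambiguous bases (e.g. N) are ignored.
    total = sum(counts[t] for t in TETRAMERS)
    if total == 0:
        raise ValueError("sequence has no unambiguous tetranucleotides")
    return [counts[t] / total for t in TETRAMERS]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / sqrt(var_x * var_y)


def gc_content(seq):
    """Fraction of G and C among the unambiguous bases of `seq`."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence has no unambiguous bases")
    return (seq.count("G") + seq.count("C")) / acgt


def compare_fragments(seq_a, seq_b):
    """Return (tetranucleotide correlation, absolute GC difference)."""
    correlation = pearson(tetra_profile(seq_a), tetra_profile(seq_b))
    gc_difference = abs(gc_content(seq_a) - gc_content(seq_b))
    return correlation, gc_difference
```

In this sketch, a high correlation and a small GC difference would both suggest that two fragments share an origin. Real fragments should be much longer than a few hundred bases for the 256-dimensional profile to be stable.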

Evolutionary implications of microbial genome tetranucleotide frequency biases.

Grouping prokaryotes based on TUD profiles resulted in relationships with important differences from those based on 16S rRNA phylogenies, which may reflect unequal rates of evolution of nucleotide usage patterns following divergence of particular organisms from a common ancestor.

The Sorcerer II Global Ocean Sampling Expedition: Expanding the Universe of Protein Families

This work used sequence similarity clustering to explore proteins with a comprehensive dataset consisting of sequences from available databases together with 6.12 million proteins predicted from an assembly of 7.7 million Global Ocean Sampling sequences to add a great deal of diversity to known protein families and shed light on their evolution.

Oceanic 18S rDNA sequences from picoplankton reveal unsuspected eukaryotic diversity

35 full sequences of the small-subunit (18S) ribosomal RNA gene derived from a picoplanktonic assemblage collected at a depth of 75 m in the equatorial Pacific Ocean are analysed and show that there is a high diversity of picoeukaryotes.

MetaSim—A Sequencing Simulator for Genomics and Metagenomics

A sequencing simulator called MetaSim is introduced that allows the user to simulate individual read datasets that can be used as standardized test scenarios for planning sequencing projects or for benchmarking metagenomic software.

Unexpected diversity of small eukaryotes in deep-sea Antarctic plankton

Using ribosomal RNA genes from marine picoplankton, several new groups of bacteria and archaea have been identified, some of which are abundant and related to dinoflagellates that are found at all studied depths and suggest a radiation early in the evolution of alveolates.

Micro-eukaryotic diversity of the human distal gut microbiota: qualitative assessment using culture-dependent and -independent analysis of faeces

Eukaryotic diversity of the human gut is low, largely temporally stable and predominated by different subtypes of Blastocystis, and specific analyses of the fungal populations indicate that a disparity exists between the cultivable fraction, which is dominated by Candida sp., and culture-independent analysis.

Environmental Genome Shotgun Sequencing of the Sargasso Sea

Over 1.2 million previously unknown genes are identified in these samples, including more than 782 new rhodopsin-like photoreceptors, suggesting substantial oceanic microbial diversity.

The Sorcerer II Global Ocean Sampling Expedition: Northwest Atlantic through Eastern Tropical Pacific

A metagenomic study of the marine planktonic microbiota in which surface (mostly marine) water samples were analyzed as part of the Sorcerer II Global Ocean Sampling expedition, which yielded an extensive dataset consisting of 7.7 million sequencing reads.