Eu-Detect: An algorithm for detecting eukaryotic sequences in metagenomic data sets

@article{Mohammed2011EuDetectAA,
  title={Eu-Detect: An algorithm for detecting eukaryotic sequences in metagenomic data sets},
  author={Monzoorul Haque Mohammed and Sudha Chadaram and Dinakar Komanduri and Tarini Shankar Ghosh and Sharmila S. Mande},
  journal={Journal of Biosciences},
  year={2011},
  volume={36},
  pages={709--717}
}
Physical partitioning techniques are routinely employed (during sample preparation stage) for segregating the prokaryotic and eukaryotic fractions of metagenomic samples. In spite of these efforts, several metagenomic studies focusing on bacterial and archaeal populations have reported the presence of contaminating eukaryotic sequences in metagenomic data sets. Contaminating sequences originate not only from genomes of micro-eukaryotic species but also from genomes of (higher) eukaryotic host… 
CS-SCORE: Rapid identification and removal of human genome contaminants from metagenomic datasets.
TLDR
This study presents CS-SCORE, a novel algorithm that rapidly identifies host sequences contaminating metagenomic datasets; it achieves its efficiency by incorporating a heuristic pre-filtering mechanism and a directed-mapping approach that uses a novel sequence composition metric (cs-score).
PhyloSift: phylogenetic analysis of genomes and metagenomes
TLDR
This work presents an approach to leverage phylogenetic analysis of metagenomic sequence data to conduct phylogeny-driven Bayesian hypothesis tests for the presence of an organism in a sample and applies new tools to analyze the phylogenetic diversity of microbial communities.
Classification of metagenomic sequences: methods and challenges
TLDR
The premise, methodologies, advantages, limitations and challenges of various methods available for binning of metagenomic datasets obtained using the shotgun sequencing approach are discussed.
Expression of eukaryotic‐like protein in the microbiome of sponges
TLDR
This study shows that ELP genes in sponge symbionts represent actively expressed functions that could mediate molecular interaction between symbiosis partners.
Inferring Intra-Community Microbial Interaction Patterns from Metagenomic Datasets Using Associative Rule Mining Techniques
TLDR
The efficiency/utility of this rule mining approach in deciphering multiple (biologically meaningful) association patterns between 'subsets/subgroups' of microbes (constituting microbiome samples) is demonstrated.
Prevention, diagnosis and treatment of high‐throughput sequencing data pathologies
TLDR
It is argued that careful QC of HTS data is an important – yet often neglected – aspect of their application in molecular ecology, and the groundwork is laid for developing an HTS data QC ‘best practices’ guide.

References

Fast Identification and Removal of Sequence Contamination from Genomic and Metagenomic Datasets
TLDR
DeconSeq is a robust framework for the rapid, automated identification and removal of sequence contamination in longer-read datasets (150 bp mean read length) and allows scientists to automatically detect and efficiently remove unwanted sequence contamination from their datasets while eliminating critical limitations of current methods.
Application of tetranucleotide frequencies for the assignment of genomic fragments.
TLDR
The results of this systematic study show that the discriminatory power of correlations of tetranucleotide-derived z-scores is by far superior to that of differences in (G + C)-content and provides reasonable assignment probabilities when applied to metagenome libraries of small diversity.
Picoeukaryotic sequences in the Sargasso Sea metagenome
TLDR
Despite similar cell size, eukaryotic sequences of the Sargasso Sea metagenome have higher GC content, suggesting that different environmental pressures affect the evolution of their base composition.
Evolutionary implications of microbial genome tetranucleotide frequency biases.
TLDR
Grouping prokaryotes based on TUD profiles resulted in relationships with important differences from those based on 16S rRNA phylogenies, which may reflect unequal rates of evolution of nucleotide usage patterns following divergence of particular organisms from a common ancestor.
The Sorcerer II Global Ocean Sampling Expedition: Expanding the Universe of Protein Families
TLDR
This work used sequence similarity clustering to explore proteins with a comprehensive dataset consisting of sequences from available databases together with 6.12 million proteins predicted from an assembly of 7.7 million Global Ocean Sampling sequences to add a great deal of diversity to known protein families and shed light on their evolution.
Oceanic 18S rDNA sequences from picoplankton reveal unsuspected eukaryotic diversity
TLDR
35 full sequences of the small-subunit (18S) ribosomal RNA gene derived from a picoplanktonic assemblage collected at a depth of 75 m in the equatorial Pacific Ocean are analysed and show that there is a high diversity of picoeukaryotes.
MetaSim—A Sequencing Simulator for Genomics and Metagenomics
TLDR
A sequencing simulator called MetaSim is introduced that allows the user to simulate individual read datasets that can be used as standardized test scenarios for planning sequencing projects or for benchmarking metagenomic software.
Unexpected diversity of small eukaryotes in deep-sea Antarctic plankton
TLDR
Using ribosomal RNA genes from marine picoplankton, several new groups of bacteria and archaea have been identified, some of which are abundant and related to dinoflagellates that are found at all studied depths and suggest a radiation early in the evolution of alveolates.
Micro-eukaryotic diversity of the human distal gut microbiota: qualitative assessment using culture-dependent and -independent analysis of faeces
TLDR
Eukaryotic diversity of the human gut is low, largely temporally stable and predominated by different subtypes of Blastocystis, and specific analyses of the fungal populations indicate that a disparity exists between the cultivable fraction, which is dominated by Candida sp., and culture-independent analysis.
Environmental Genome Shotgun Sequencing of the Sargasso Sea
TLDR
Over 1.2 million previously unknown genes represented in these samples, including more than 782 new rhodopsin-like photoreceptors, are identified, suggesting substantial oceanic microbial diversity.