Nontargeted virus sequence discovery pipeline and virus clustering for metagenomic data

@article{PezEspino2017NontargetedVS,
  title={Nontargeted virus sequence discovery pipeline and virus clustering for metagenomic data},
  author={David P{\'a}ez-Espino and Georgios A. Pavlopoulos and Natalia N. Ivanova and Nikos C. Kyrpides},
  journal={Nature Protocols},
  year={2017},
  volume={12},
  pages={1673--1682}
}
The analysis of large microbiome data sets holds great promise for the delineation of the biological and metabolic functioning of living organisms and their role in the environment. In the midst of this genomic puzzle, viruses, especially those that infect microbial communities, represent a major reservoir of genetic diversity with great impact on biogeochemical cycles and organismal health. Overcoming the limitations associated with virus detection directly from microbiomes can provide key… 
Identifying viruses from metagenomic data by deep learning.
TLDR
A reference-free and alignment-free machine learning method, DeepVirFinder, for predicting viral sequences in metagenomic data using deep learning techniques that will significantly accelerate the discovery rate of viruses.
Mini‐Metagenomics and Nucleotide Composition Aid the Identification and Host Association of Novel Bacteriophage Sequences
TLDR
A computational approach that uses supervised learning to classify metagenomic contigs as phage or non‐phage as well as assigning phage taxonomy based on tetranucleotide frequencies is described, demonstrating the value of combining viral sequence identification with mini‐metagenomic experimental methods to understand the microbial ecosystem.
Expanding standards in viromics: in silico evaluation of dsDNA viral genome identification, classification, and auxiliary metabolic gene curation
Background Viruses influence global patterns of microbial diversity and nutrient cycles. Though viral metagenomics (viromics), specifically targeting dsDNA viruses, has been critical for revealing
IMG/VR v3: an integrated ecological and evolutionary framework for interrogating genomes of uncultivated viruses
TLDR
The third version of IMG/VR is presented, composed of 18 373 cultivated and 2 314 329 uncultivated viral genomes (UViGs), nearly tripling the total number of sequences compared to the previous version, and annotated with a new standardized pipeline including genome quality estimation using CheckV and expanded host taxonomy prediction.
Identifying viruses from metagenomic data using deep learning
TLDR
Powered by deep learning and high throughput sequencing metagenomic data, DeepVirFinder significantly improved the accuracy of viral identification and will assist the study of viruses in the era of metagenomics.
Ecology and molecular targets of hypermutation in the global microbiome
TLDR
It is determined that diversity-generating retroelements (DGRs) have a single evolutionary origin and a universal bias towards adenine mutations, are consistently and broadly active, and are responsible for >10% of all amino acid changes in some organisms at a conservative estimate.
Ecology and molecular targets of hypermutation in the global microbiome
TLDR
A dataset of >30,000 DGRs from public metagenomes is analyzed, six major lineages are established, and several distinct roles these elements play in natural communities are elucidated.
TAR-VIR: a pipeline for TARgeted VIRal strain reconstruction from metagenomic data
TLDR
A hybrid pipeline named TAR-VIR is developed that reconstructs viral strains without relying on complete or high-quality reference genomes and can be used standalone for viral strain reconstruction from metagenomic data.
Investigation of recombination-intense viral groups and their genes in the Earth’s virome
TLDR
This study systematically examined signatures of recombination in every gene from 211 species-level viral groups in a recently obtained dataset of the Earth’s virome that contain corresponding information on the host bacterial species to identify recombination-intense genes that are significantly enriched for encoding phage morphogenesis proteins.
Cenote-Taker 2 Democratizes Virus Discovery and Sequence Annotation
TLDR
Cenote-Taker2, a virus discovery and annotation tool available on command line and with a graphical user interface with free high-performance computation access, utilizes highly sensitive models of hallmark virus genes to discover familiar or divergent viral sequences from user-input contigs.

References

IMG/VR: a database of cultured and uncultured DNA Viruses and retroviruses
TLDR
IMG/VR is presented, the largest publicly available database of 3908 isolate reference DNA viruses with 264 413 computationally identified viral contigs from >6000 ecologically diverse metagenomic samples, serving as an essential resource in the viral genomics community.
VirSorter: mining viral signal from microbial genomic data
TLDR
VirSorter is a tool designed to detect viral signal in these different types of microbial sequence data in both a reference-dependent and reference-independent manner, leveraging probabilistic models and extensive virome data to maximize detection of novel viruses.
Computational approaches to predict bacteriophage–host relationships
TLDR
Analysis of 820 phages with annotated hosts shows how current knowledge and insights about the interaction mechanisms and ecology of coevolving phages and bacteria can be exploited to predict phage–host relationships, with potential relevance for medical and industrial applications.
Uncovering Earth’s virome
TLDR
Analysis of viral distribution across diverse ecosystems revealed strong habitat-type specificity for the vast majority of viruses, but also identified some cosmopolitan groups, and detailed insight into viral habitat distribution and host–virus interactions is provided.
Prokaryotic Virus Orthologous Groups (pVOGs): a resource for comparative genomics and protein family annotation
TLDR
The pVOGs database represents a comprehensive set of orthologous gene families shared across multiple complete genomes of viruses that infect bacterial or archaeal hosts (viruses of eukaryotes will be added at a future date).
Functional metagenomic profiling of nine biomes
TLDR
The magnitude of the microbial metabolic capabilities encoded by the viromes was extensive, suggesting that they serve as a repository for storing and sharing genes among their microbial hosts and influence global evolutionary and metabolic processes.
Ecogenomics and potential biogeochemical impacts of globally abundant ocean viruses
TLDR
A global map of abundant, double-stranded DNA viruses, complete with genomic and ecological contexts, is presented, providing a necessary foundation for the meaningful integration of viruses into ecosystem models where they act as key players in nutrient cycling and trophic networks.
Expanding the Marine Virosphere Using Metagenomics
TLDR
A direct approach to viral population genomics is enabled, confirming the remarkable mosaicism of phage genomes.
Community-wide analysis of microbial genome sequence signatures
TLDR
It is found that shared environmental pressures and interactions among coevolving organisms do not obscure genome signatures in acid mine drainage communities and genome signatures can be used to assign sequence fragments to populations, an essential prerequisite if metagenomics is to provide ecological and biochemical insights into the functioning of microbial communities.
A highly abundant bacteriophage discovered in the unknown sequences of human faecal metagenomes
TLDR
A previously unidentified bacteriophage, referred to as crAssphage, is discovered in the majority of published human faecal metagenomes and is predicted to have a Bacteroides host, consistent with Bacteroides-related protein homologues and a unique carbohydrate-binding domain encoded in the phage genome.