MetaPhOrs: orthology and paralogy predictions from multiple phylogenetic evidence using a consistency-based confidence score

Leszek P. Pryszcz, Jaime Huerta-Cepas, Toni Gabaldón
Nucleic Acids Research, e32
Reliable prediction of orthology is central to comparative genomics. Approaches based on phylogenetic analyses closely resemble the original definition of orthology and paralogy and are known to be highly accurate. However, the large computational cost associated with these analyses is a limiting factor that often prevents their use at genomic scales. Recently, several projects have addressed the reconstruction of large collections of high-quality phylogenetic trees from which orthology and… 
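
The abstract above is truncated, so the exact definition of the consistency-based confidence score is not given here. The following is a minimal sketch under a stated assumption: that the score for a pair of sequences is the fraction of trees containing both sequences in which the pair is called orthologous. The names `TreeCall`, `consistency_score`, `classify`, and the 0.5 cutoff are hypothetical and chosen only for illustration; they are not taken from the MetaPhOrs software.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class TreeCall:
    """One tree's verdict on a pair of sequences (hypothetical structure)."""
    source: str        # repository the tree came from (label only)
    is_ortholog: bool  # True if this tree places the pair as orthologs


def consistency_score(calls: Sequence[TreeCall]) -> Optional[float]:
    """Fraction of trees calling the pair orthologous (assumed definition).

    Returns None when no tree contains both sequences.
    """
    if not calls:
        return None
    supporting = sum(1 for call in calls if call.is_ortholog)
    return supporting / len(calls)


def classify(calls: Sequence[TreeCall], threshold: float = 0.5) -> str:
    """Label a pair using an illustrative cutoff; 0.5 is an assumption."""
    score = consistency_score(calls)
    if score is None:
        return "unknown"
    return "orthologs" if score >= threshold else "paralogs"


# Example: three trees from different (illustrative) sources, two agreeing.
example_calls = [
    TreeCall("source_a", True),
    TreeCall("source_b", True),
    TreeCall("source_c", False),
]
example_score = consistency_score(example_calls)
example_label = classify(example_calls)
```

A stricter cutoff trades recall for precision, since fewer pairs reach the required level of agreement across trees.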

MetaPhOrs 2.0: integrative, phylogeny-based inference of orthology and paralogy across the tree of life

The latest version of the MetaPhOrs web server is described, which includes major new implementations and provides orthology and paralogy relationships derived from ∼8.2 million gene family trees from 13 different source repositories across ∼4000 species with sequenced genomes.

Benchmarking orthology methods using phylogenetic patterns defined at the base of Eukaryotes

It is found that most orthology methods reconstruct a large last eukaryotic common ancestor, with substantial gene loss, and can predict interacting proteins reasonably well when applying phylogenetic co-occurrence, and the obtained orthologous groups differ vastly from one another.

Computational methods for Gene Orthology inference

Comparisons of tree-based, sequence similarity-based, and synteny-based approaches, which can be combined into flexible hybrid methods, show that, despite conceptual differences, they produce similar sets of orthologs, especially at short evolutionary distances.

Comparing orthology methods and their performance by recapitulating patterns of eukaryotic genome evolution

It is found that most orthology methods reconstruct a large Last Eukaryotic Common Ancestor, with substantial gene loss, and can predict interacting proteins reasonably well when applying phylogenetic co-occurrence, but there are large differences within the orthologies themselves, arising from how a method can differentiate between distant homology, recent duplications, or classifying orthologous groups.

PhylomeDB v3.0: an expanding repository of genome-wide collections of trees, alignments and phylogeny-based orthology and paralogy predictions

The third version of PhylomeDB, a public database for genome-wide collections of gene phylogenies (phylomes), is presented, which is the largest phylogenetic repository and hosts 17 phylomes, comprising 416 093 trees and 165 840 alignments.

PhylomeDB v4: zooming into the plurality of evolutionary histories of a genome

A benchmark of the orthology predictions provided by the database, the impact of proteome updates, and the use of the phylome approach in the analysis of newly sequenced genomes and transcriptomes are discussed.

QuartetS: a fast and accurate algorithm for large-scale orthology detection

QuartetS is a novel orthology detection method that exploits evolutionary evidence in a computationally efficient manner and is aimed at applications where high accuracy and high throughput are required.

Phylogenetic Method for High-Throughput Ortholog Detection

A phylogenetic tree-based approach is used to identify orthologous proteins; a distance threshold controls the stringency of predictions, so that the closeness between the protein of interest and its orthologs can be adjusted.

Orthonome – a new pipeline for predicting high quality orthologue gene sets applicable to complete and draft genomes

It is demonstrated that Orthonome provides a superior combination of orthologue capture rates and accuracy on complete and draft drosophilid genomes when tested alongside previously published pipelines, and highlights a greater degree of evolutionary conservation across drosophilid species than previously thought.

Broccoli: combining phylogenetic and network analyses for orthology assignment

Broccoli is a user-friendly pipeline designed to infer, with high precision, orthologous groups and pairs of proteins using a phylogeny-based approach and is scalable, with runtimes similar to those of recent distance-based pipelines.

Phylogenetic and Functional Assessment of Orthologs Inference Projects and Methods

Accurate genome-wide identification of orthologs is a central problem in comparative genomics, a fact reflected by the numerous orthology identification projects developed in recent years. However,

Automatic genome-wide reconstruction of phylogenetic gene trees

A novel and scalable algorithm that uses sequence similarity and a given species phylogeny to reconstruct the underlying evolutionary history of all genes in a large group of species, which opens the way to systematic studies of the evolution of individual genes, molecular systems and whole genomes.

The quest for orthologs: finding the corresponding gene across genomes.

eggNOG v2.0: extending the evolutionary genealogy of genes with enhanced non-supervised orthologous groups, species and functional annotations

The second version of the eggNOG database is presented, which contains orthologous groups (OGs) constructed through identification of reciprocal best BLAST matches and triangular linkage clustering, and provides a broad functional description for at least 1 966 709 of them.

The human phylome

The human phylome is reconstructed, which includes the evolutionary relationships of all human proteins and their homologs among 39 fully sequenced eukaryotes, and orthology and paralogy relationships of human proteins among eukaryotic genomes are derived.

PhylomeDB: a database for genome-wide collections of gene phylogenies

PhylomeDB is a database of complete phylomes derived for different genomes within a specific taxonomic range and provides the alignments, phylogenetic trees and tree-based orthology predictions for every single encoded protein.

OrthoMCL-DB: querying a comprehensive multi-species collection of ortholog groups

The OrthoMCL-DB provides a centralized warehouse for orthology prediction among multiple species, and will be updated and expanded as additional genome sequence data become available.

Berkeley PHOG: PhyloFacts orthology group prediction web server

Results on a benchmark dataset from the TreeFam-A manually curated orthology database show that PHOG provides a combination of high recall and precision competitive with both InParanoid and OrthoMCL, and allows users to target different taxonomic distances and precision levels through the use of tree-distance thresholds.

The COG database: an updated version includes eukaryotes

A major update of the previously developed system for delineation of Clusters of Orthologous Groups of proteins (COGs) from the sequenced genomes of prokaryotes and unicellular eukaryotes is described and is expected to be a useful platform for functional annotation of newly sequenced genomes, including those of complex eukaryotes, and genome-wide evolutionary studies.

Large-scale assignment of orthology: back to phylogenetics?

Reliable orthology prediction is central to comparative genomics. Although orthology is defined by phylogenetic criteria, most automated prediction methods are based on pairwise sequence comparisons.