Benchmarking ortholog identification methods using functional genomics data

Tim Hulsen, Martijn A. Huynen, Jacob de Vlieg, Peter M. A. Groenen
Genome Biology, R31
Background: The transfer of functional annotations from model organism proteins to human proteins is one of the main applications of comparative genomics. Various methods are used to analyze cross-species orthologous relationships according to an operational definition of orthology. Often the definition of orthology is incorrectly interpreted as a prediction of proteins that are functionally equivalent across species, while in fact it only defines the existence of a common ancestor for a gene in… 

Orthology inference: Methods, benchmarking and applications

An algorithm to infer hierarchical orthologous groups based solely on pairwise orthologous gene relations is introduced; it is the first algorithm of its kind to be based on graph-theoretic properties derived from perfect orthology graphs.

Phylogenetic and Functional Assessment of Orthologs Inference Projects and Methods

Accurate genome-wide identification of orthologs is a central problem in comparative genomics, a fact reflected by the numerous orthology identification projects developed in recent years. However,

Testing the Ortholog Conjecture with Comparative Functional Genomic Data from Mammals

It is concluded that the most important factor in the evolution of function is not amino acid sequence, but rather the cellular context in which proteins act, and light is shed on the relationship between sequence divergence and functional divergence.

An integrative approach to ortholog prediction for disease-focused and other functional studies

DIOPT is used to predict high-confidence orthologs of disease genes in Online Mendelian Inheritance in Man (OMIM) and genes in genome-wide association study (GWAS) data sets and to facilitate using model organisms for functional analysis of human disease-associated genes.

orthofind: a novel method for identifying functional orthologues

Motivation: There is a need for easy identification of functionally equivalent orthologues in different species that is not currently met directly by protein sequence databases (e.g.,

Orthology confers intron position conservation

It is concluded that orthologous genes tend to have more conserved intron positions compared to non-orthologous genes, which implies a connection between shifts in intronic structure and the origin of multicellularity.

Transcriptologs: A Transcriptome-Based Approach to Predict Orthology Relationships

Results from a test case on Arabidopsis thaliana and Sorghum bicolor transcript collections showed that Transcriptologs in some cases outperformed a classical protein-based analysis in terms of alignment quality, revealing similarities otherwise not detectable.

Phylogenomics of plant genomes: a methodology for genome-wide searches for orthologs in plants

A procedure for ortholog prediction between Oryza sativa and Arabidopsis thaliana is described, and an optimized phylogenomics pipeline for ortholog inference is developed that outperforms similarly based methods in predicting ortholog and paralog relationships.

ProtPhylo: identification of protein–phenotype and protein–protein functional associations via phylogenetic profiling

ProtPhylo infers functional associations by comparing protein phylogenetic profiles for more than 9.7 million non-redundant protein sequences from all three domains of life by ranking phylogenetic neighbors of query proteins or phenotypic properties using the Hamming distance.
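The ranking idea in the summary above can be illustrated with a minimal sketch. It assumes each protein is represented by a binary presence/absence profile across a fixed set of genomes and ranks candidates by Hamming distance to a query profile. The function names, protein names, and profile values are hypothetical and chosen only for illustration; this is not ProtPhylo's actual implementation.

```python
def hamming(a, b):
    """Number of positions at which two equal-length profiles differ."""
    if len(a) != len(b):
        raise ValueError("profiles must have the same length")
    return sum(x != y for x, y in zip(a, b))


def rank_neighbors(query, profiles):
    """Return (name, distance) pairs sorted by increasing Hamming distance.

    Ties are broken alphabetically by name so the order is deterministic.
    """
    scored = [(name, hamming(query, prof)) for name, prof in profiles.items()]
    return sorted(scored, key=lambda item: (item[1], item[0]))


# Hypothetical presence (1) / absence (0) profiles over five genomes.
profiles = {
    "protA": (1, 1, 0, 1, 0),
    "protB": (1, 0, 0, 1, 0),
    "protC": (0, 0, 1, 0, 1),
}
query = (1, 1, 0, 1, 1)

ranked = rank_neighbors(query, profiles)
for name, dist in ranked:
    print(name, dist)
```

Proteins whose profiles differ from the query in the fewest genomes appear first; under the phylogenetic profiling assumption, these are the candidates most likely to be functionally associated with the query.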


A methodology that predicts orthologs between two species by sequence similarity searches based on mRNA sequences is presented, and the features of a web-accessible database on paralog and singleton genes of the model plant Arabidopsis thaliana, developed in the lab, are described.



Automatic clustering of orthologs and in-paralogs from pairwise species comparisons.

This study led to the identification with a high degree of confidence of over a dozen novel worm-mammalian ortholog assignments that were previously undetected because of shortcomings of phylogenetic methods.

Using orthologous and paralogous proteins to identify specificity determining residues

While sets of orthologous and paralogous proteins can be easily derived from complete genomic sequences, the method can identify putative specificity determinants in such proteins.

OrthoMCL: identification of ortholog groups for eukaryotic genomes.

OrthoMCL provides a scalable method for constructing orthologous groups across multiple eukaryotic taxa, using a Markov Cluster algorithm to group (putative) orthologs and paralogs.

Phylogenomic inference of protein molecular function: advances and challenges

An overview is presented of the motivations and fundamental principles of phylogenomic analysis, new methods developed for the key tasks, benchmark datasets for these tasks, and suggested procedures to increase accuracy.

OrthoMCL-DB: querying a comprehensive multi-species collection of ortholog groups

The OrthoMCL-DB provides a centralized warehouse for orthology prediction among multiple species, and will be updated and expanded as additional genome sequence data become available.

RIO: Analyzing proteomes by automated phylogenomics using resampled inference of orthologs

The RIO procedure is particularly useful for the automated detection of first representatives of novel protein subfamilies, and it is described how some orthologies can be misleading for functional inference.

An efficient algorithm for large-scale detection of protein families.

This work presents a novel approach called TRIBE-MCL for rapid and accurate clustering of protein sequences into families based on precomputed sequence similarity information that has been rigorously tested and validated on a number of very large databases.

The COG database: an updated version includes eukaryotes

A major update of the previously developed system for delineation of Clusters of Orthologous Groups of proteins (COGs) from the sequenced genomes of prokaryotes and unicellular eukaryotes is described and is expected to be a useful platform for functional annotation of newly sequenced genomes, including those of complex eukaryotes, and genome-wide evolutionary studies.

Measuring genome evolution.

  • M. Huynen, P. Bork
  • Biology
    Proceedings of the National Academy of Sciences of the United States of America
  • 1998
A hierarchy of rates at which genomes have changed during evolution is established and it is shown that some genomes are more highly organized than others: they show a higher degree of the clustering of genes that have orthologs in other genomes.