Benchmarking ortholog identification methods using functional genomics data

@article{Hulsen2005BenchmarkingOI,
  title={Benchmarking ortholog identification methods using functional genomics data},
  author={Tim Hulsen and Martijn A. Huynen and Jacob de Vlieg and Peter M. A. Groenen},
  journal={Genome Biology},
  year={2005},
  volume={7},
  pages={R31 - R31}
}
Background: The transfer of functional annotations from model organism proteins to human proteins is one of the main applications of comparative genomics. Various methods are used to analyze cross-species orthologous relationships according to an operational definition of orthology. Often the definition of orthology is incorrectly interpreted as a prediction of proteins that are functionally equivalent across species, while in fact it only defines the existence of a common ancestor for a gene in…

Orthology inference: Methods, benchmarking and applications
TLDR
An algorithm to infer hierarchical orthologous groups based solely on pairwise orthologous gene relations is introduced, the first algorithm of its kind which is based on graph-theoretic properties derived from perfect orthology graphs.
Phylogenetic and Functional Assessment of Orthologs Inference Projects and Methods
Accurate genome-wide identification of orthologs is a central problem in comparative genomics, a fact reflected by the numerous orthology identification projects developed in recent years. However,…
Testing the Ortholog Conjecture with Comparative Functional Genomic Data from Mammals
TLDR
It is concluded that the most important factor in the evolution of function is not amino acid sequence, but rather the cellular context in which proteins act, and shed light on the relationship between sequence divergence and functional divergence.
An integrative approach to ortholog prediction for disease-focused and other functional studies
TLDR
DIOPT is used to predict high-confidence orthologs of disease genes in Online Mendelian Inheritance in Man (OMIM) and genes in genome-wide association study (GWAS) data sets and to facilitate using model organisms for functional analysis of human disease-associated genes.
orthofind: a novel method for identifying functional orthologues
Motivation: There is a need for easy identification of functionally equivalent orthologues in different species that is not currently met directly by protein sequence databases (e.g.,…
Orthology confers intron position conservation
TLDR
It is concluded that orthologous genes tend to have more conserved intron positions compared to non-orthologous genes, which implies a connection between shifts in intronic structure and the origin of multicellularity.
Transcriptologs: A Transcriptome-Based Approach to Predict Orthology Relationships
TLDR
Results from a test case on Arabidopsis thaliana and Sorghum bicolor transcript collections showed that Transcriptologs in some cases outperformed a classical protein-based analysis in terms of alignment quality, revealing similarities otherwise not detectable.
Evaluating Ortholog Prediction Algorithms in a Yeast Model Clade
TLDR
The results suggest that simple algorithms, like cRBH, may be better ortholog predictors than more complex ones for evolutionary and functional genomics studies where the objective is the accurate inference of single-copy orthologs.
Phylogenomics of plant genomes: a methodology for genome-wide searches for orthologs in plants
TLDR
A procedure for ortholog prediction between Oryza sativa and Arabidopsis thaliana and an optimized phylogenomics pipeline for ortholog inference is developed that outperforms similarly based methods in predicting ortholog and paralog relationships.
ProtPhylo: identification of protein–phenotype and protein–protein functional associations via phylogenetic profiling
TLDR
ProtPhylo infers functional associations by comparing protein phylogenetic profiles for more than 9.7 million non-redundant protein sequences from all three domains of life by ranking phylogenetic neighbors of query proteins or phenotypic properties using the Hamming distance.

References

SHOWING 1-10 OF 132 REFERENCES
Automatic clustering of orthologs and in-paralogs from pairwise species comparisons.
TLDR
This study led to the identification with a high degree of confidence of over a dozen novel worm-mammalian ortholog assignments that were previously undetected because of shortcomings of phylogenetic methods.
Using orthologous and paralogous proteins to identify specificity determining residues
TLDR
While sets of orthologous and paralogous proteins can be easily derived from complete genomic sequences, the method can identify putative specificity determinants in such proteins.
OrthoMCL: identification of ortholog groups for eukaryotic genomes.
TLDR
OrthoMCL provides a scalable method for constructing orthologous groups across multiple eukaryotic taxa, using a Markov Cluster algorithm to group (putative) orthologs and paralogs.
Phylogenomic inference of protein molecular function: advances and challenges
TLDR
An overview of the motivations and fundamental principles of phylogenomic analysis, new methods developed for the key tasks, benchmark datasets for these tasks, and suggested procedures to increase accuracy are presented.
OrthoMCL-DB: querying a comprehensive multi-species collection of ortholog groups
TLDR
The OrthoMCL-DB provides a centralized warehouse for orthology prediction among multiple species, and will be updated and expanded as additional genome sequence data become available.
Comparative genomics for reliable protein-function prediction from genomic data.
TLDR
These methods are discussed, illustrated by comparing yeast two-hybrid data from Saccharomyces cerevisiae with Y2H data from Drosophila melanogaster, and illustrating vertical comparative genomics by comparing RNA expression data with proteomic data from Plasmodium falciparum.
RIO: Analyzing proteomes by automated phylogenomics using resampled inference of orthologs
TLDR
The RIO procedure is particularly useful for the automated detection of first representatives of novel protein subfamilies, and it is described how some orthologies can be misleading for functional inference.
An efficient algorithm for large-scale detection of protein families.
TLDR
This work presents a novel approach called TRIBE-MCL for rapid and accurate clustering of protein sequences into families based on precomputed sequence similarity information that has been rigorously tested and validated on a number of very large databases.
The COG database: a tool for genome-scale analysis of protein functions and evolution
TLDR
The database of Clusters of Orthologous Groups of proteins (COGs) is an attempt at a phylogenetic classification of the proteins encoded in 21 complete genomes of bacteria, archaea and eukaryotes.
The COG database: an updated version includes eukaryotes
TLDR
A major update of the previously developed system for delineation of Clusters of Orthologous Groups of proteins (COGs) from the sequenced genomes of prokaryotes and unicellular eukaryotes is described and is expected to be a useful platform for functional annotation of newly sequenced genomes, including those of complex eukaryotes, and genome-wide evolutionary studies.