OrthoMCL-DB: querying a comprehensive multi-species collection of ortholog groups

Authors: F. Chen, Aaron J. Mackey, Christian J. Stoeckert, David S. Roos
Journal: Nucleic Acids Research
Pages: D363-D368
The OrthoMCL database houses ortholog group predictions for 55 species, including 16 bacterial and 4 archaeal genomes representing phylogenetically diverse lineages, and most currently available complete eukaryotic genomes: 24 unikonts (12 animals, 9 fungi, microsporidium, Dictyostelium, Entamoeba), 4 plants/algae and 7 apicomplexan parasites. OrthoMCL software was used to cluster proteins based on sequence similarity, using an all-against-all BLAST search of each species' proteome, followed… 

Using OrthoMCL to assign proteins to OrthoMCL-DB groups or to cluster proteomes into new ortholog groups.

This work describes how you can group your proteins of interest into ortholog clusters using two different means provided by the OrthoMCL system.

ProGMap: an integrated annotation resource for protein orthology

ProGMap is a web-tool designed to help researchers and database annotators to assess the coherence of protein groups defined in various databases and thereby facilitate the annotation of newly sequenced proteins.

PhyloPat: phylogenetic pattern analysis of eukaryotic genes

PhyloPat, presented as the first tool to combine complete genome information with phylogenetic pattern querying, allows the complete Ensembl gene database to be queried using phylogenetic patterns.

orthofind: a novel method for identifying functional orthologues

Motivation: There is a need for easy identification of functionally equivalent orthologues in different species that is not currently met directly by protein sequence databases (e.g.,

Immunity genes and their orthologs: a multi-species database.

All the immunity genes and their evidence of immune function, orthologs and ortholog groups have been combined into an open access database, ImmunomeBase, which is publicly available at http://bioinf.fi/ImmunomeBase.

BLASTO: a tool for searching orthologous groups

BLASTO incorporates the best-known multispecies ortholog databases, including NCBI Clusters of Orthologous Group, NCBI euKaryotic OrthOLOGous Group database, OrthoMCL, MultiParanoid and TIGR Eukaryotic Gene Orthologues database, and offers a useful platform to integrate orthology information into functional inference and evolutionary studies of individual sequences.

YOGY: a web-based, integrated database to retrieve protein orthologs and associated Gene Ontology terms

YOGY is a web-based resource for orthologous proteins from nine eukaryotic organisms that provides comprehensive, combined information on orthologs in other species using data from five independent resources: KOGs, Inparanoid, HomoloGene, OrthoMCL and a table of curated fission and budding yeast orthology.

eggNOG v3.0: orthologous groups covering 1133 organisms at 41 different taxonomic ranges

The third version of the eggNOG database contains non-supervised orthologous groups constructed from 1133 organisms, doubling the number of genes with orthology assignment compared to eggNOG v2. The newly designed web page is considerably faster and offers more functionality.

OrtholugeDB: a bacterial and archaeal orthology resource for improved comparative genomic analysis

OrtholugeDB facilitates rapid and more accurate bacterial and archaeal comparative genomic analysis and large-scale ortholog prediction. It is compared with similar methods, showing how it may more consistently identify orthologs with conserved features across a wide range of taxonomic distances.

The Princeton Protein Orthology Database (P-POD): A Comparative Genomics Analysis Tool for Biologists

This work describes the Princeton Protein Orthology Database, a user-friendly database system that allows users to find and visualize the phylogenetic relationships among predicted orthologs, and that links to relevant human disease and gene information via the OMIM, model organism, and sequence databases.



OrthoMCL: identification of ortholog groups for eukaryotic genomes.

OrthoMCL provides a scalable method for constructing orthologous groups across multiple eukaryotic taxa, using a Markov Cluster algorithm to group (putative) orthologs and paralogs.
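
As an illustration of the Markov Cluster idea mentioned above, the following is a minimal pure-Python sketch of the generic MCL iteration (expansion by matrix squaring, then inflation by element-wise powering and column renormalization). It is not the OrthoMCL implementation or the `mcl` program: the function names, the inflation value of 2.0, the tolerance, and the toy similarity matrix are all assumptions chosen for the example, and a real pipeline would derive edge weights from BLAST results.

```python
def normalize_columns(m):
    """Scale each column so that it sums to 1 (column-stochastic matrix)."""
    n = len(m)
    out = [row[:] for row in m]
    for j in range(n):
        s = sum(m[i][j] for i in range(n))
        for i in range(n):
            out[i][j] = m[i][j] / s if s else 0.0
    return out


def expand(m):
    """Expansion step: square the matrix (two-step random walks)."""
    n = len(m)
    return [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def inflate(m, r):
    """Inflation step: raise entries to the power r, then renormalize columns."""
    return normalize_columns([[x ** r for x in row] for row in m])


def mcl(weights, inflation=2.0, max_iter=100, tol=1e-9):
    """Cluster nodes of a weighted similarity graph (hypothetical helper).

    weights: symmetric n x n list of lists of non-negative edge weights.
    Returns a sorted list of clusters, each a sorted list of node indices.
    """
    n = len(weights)
    # Add self-loops, as is commonly done before running MCL.
    m = [[weights[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    m = normalize_columns(m)
    for _ in range(max_iter):
        nxt = inflate(expand(m), inflation)
        delta = max(abs(nxt[i][j] - m[i][j])
                    for i in range(n) for j in range(n))
        m = nxt
        if delta < tol:
            break
    # Rows that keep non-negligible mass act as attractors; the columns
    # they attract form one cluster.
    clusters = set()
    for i in range(n):
        members = frozenset(j for j in range(n) if m[i][j] > 1e-6)
        if members:
            clusters.add(members)
    return sorted(sorted(c) for c in clusters)


# Toy graph: proteins 0-2 are mutually similar, as are proteins 3-5,
# with no similarity between the two sets.
toy = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
print(mcl(toy))
```

On this toy input the two disconnected groups are recovered as separate clusters. Higher inflation values generally yield finer clusters, so the parameter would need tuning on real similarity data.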

The COG database: an updated version includes eukaryotes

A major update of the previously developed system for delineation of Clusters of Orthologous Groups of proteins (COGs) from the sequenced genomes of prokaryotes and unicellular eukaryotes is described. It is expected to be a useful platform for functional annotation of newly sequenced genomes, including those of complex eukaryotes, and for genome-wide evolutionary studies.

Automatic clustering of orthologs and in-paralogs from pairwise species comparisons.

This study led to the identification with a high degree of confidence of over a dozen novel worm-mammalian ortholog assignments that were previously undetected because of shortcomings of phylogenetic methods.

A genomic perspective on protein families.

Comparison of proteins encoded in seven complete genomes from five major phylogenetic lineages and elucidation of consistent patterns of sequence similarities allowed the delineation of 720 clusters of orthologous groups (COGs), which comprise a framework for functional and evolutionary genome analysis.

An efficient algorithm for large-scale detection of protein families.

This work presents a novel approach called TRIBE-MCL for rapid and accurate clustering of protein sequences into families based on precomputed sequence similarity information that has been rigorously tested and validated on a number of very large databases.

Genome phylogeny based on gene content

This comprehensive genome phylogeny is independent of phylogenies based on the level of sequence identity of individual genes, and correlates with the standard reference of prokaryotic phylogeny based on sequence similarity of 16S rRNA (ref. 4).

SHOT: a web server for the construction of genome phylogenies.

Pfam: the protein families database

Pfam, available via servers in the UK (http://pfam.sanger.ac.uk/) and the USA (http://pfam.janelia.org/), is a widely used database of protein families, containing 14 831 manually curated entries in

Ensembl 2004

The Ensembl (http://www.ensembl.org/) database project provides a bioinformatics framework to organize biology around the sequences of large genomes. It is a comprehensive and integrated source of

The tree of eukaryotes.