COCO-CL: hierarchical clustering of homology relations based on evolutionary correlations

@article{Jothi2006COCOCLHC,
  title={COCO-CL: hierarchical clustering of homology relations based on evolutionary correlations},
  author={Raja Jothi and Elena Zotenko and Asba Tasneem and Teresa M. Przytycka},
  journal={Bioinformatics},
  year={2006},
  volume={22},
  number={7},
  pages={779--788}
}
MOTIVATION: Determining orthology relations among genes across multiple genomes is an important problem in the post-genomic era. Identifying orthologous genes can not only help predict functional annotations for newly sequenced or poorly characterized genomes, but can also help predict new protein-protein interactions. Unfortunately, determining orthology relations through computational methods is not straightforward due to the presence of paralogs. Traditional approaches have relied on pairwise…

Inferring Hierarchical Orthologous Groups from Orthologous Gene Pairs
TLDR
GetHOGs (“Graph-based Efficient Technique for Hierarchical Orthologous Groups”), a novel algorithm to infer hierarchical groups directly from the orthology graph, without needing gene tree inference or gene/species tree reconciliation, is devised.
Computational methods for Gene Orthology inference
TLDR
Comparisons of tree-based, sequence-similarity-based and synteny-based approaches, which can be combined into flexible hybrid methods, show that, despite conceptual differences, they produce similar sets of orthologs, especially at short evolutionary distances.
Quantitative synteny scoring improves homology inference and partitioning of gene families
TLDR
GenFamClust is presented, a pipeline that combines the network properties of sequence similarity and synteny to assess homology relationships and merge known homologs into groups of gene families, and that improves on the Neighborhood Correlation method.
Predicting Protein Function with Hierarchical Phylogenetic Profiles: The Gene3D Phylo-Tuner Method Applied to Eukaryotic Genomes
TLDR
It is demonstrated that eukaryotic genomes have a high proportion of multigene families whose phylogenetic profile distributions are poor in presence–absence information content, which makes them prone to orthology mis-assignment and unsuited to standard profile-based prediction methods.
Orthology prediction at scalable resolution by phylogenetic tree analysis
TLDR
A benchmark for orthology prediction that takes into account the varying levels of orthology between genes shows that the phylogeny-based high-resolution orthology assignments made by LOFT are reliable.
Identification of conserved gene clusters in multiple genomes based on synteny and homology
TLDR
A local sliding-window SYNS (SYNtenic teamS) algorithm that refines an existing family structure into orthologous sub-families by analyzing the neighborhoods around the members of a given family with a locally sliding window is presented.
Orthology inference: Methods, benchmarking and applications
TLDR
An algorithm to infer hierarchical orthologous groups based solely on pairwise orthologous gene relations is introduced, the first algorithm of its kind to be based on graph-theoretic properties derived from perfect orthology graphs.
Graph-based methods for large-scale protein classification and orthology inference
TLDR
It is argued that establishing true orthologous relationships requires a phylogenetic approach which combines both trees and graphs (networks), reliable species phylogeny, genomic data for more than two species, and an insight into the processes of molecular evolution.
DODO: an efficient orthologous genes assignment tool based on domain architectures. Domain based ortholog detection
TLDR
DODO is shown to detect orthologs between two genomes in a considerably shorter period of time than the traditional reciprocal-best-hits method, and the difference is more significant when a large number of genomes is analyzed.
OrthoVenn: a web server for genome wide comparison and annotation of orthologous clusters across multiple species
TLDR
A web platform named OrthoVenn that is useful for genome wide comparisons and visualization of orthologous clusters and allows for a customized search of clusters of specific genes through key words or BLAST is reported.

References

SHOWING 1-10 OF 52 REFERENCES
OrthoMCL: identification of ortholog groups for eukaryotic genomes.
TLDR
OrthoMCL provides a scalable method for constructing orthologous groups across multiple eukaryotic taxa, using a Markov Cluster algorithm to group (putative) orthologs and paralogs.
Automatic clustering of orthologs and in-paralogs from pairwise species comparisons.
TLDR
This study led to the identification with a high degree of confidence of over a dozen novel worm-mammalian ortholog assignments that were previously undetected because of shortcomings of phylogenetic methods.
Automated ortholog inference from phylogenetic trees and calculation of orthology reliability
TLDR
A novel method is presented that resolves the problem of finding orthologs by analyzing a set of bootstrap trees instead of the optimal tree and calculates orthology support levels for all pairwise combinations of homologous sequences of two species.
Cross-referencing eukaryotic genomes: TIGR Orthologous Gene Alignments (TOGA).
TLDR
The TIGR Orthologous Gene Alignment database is developed to provide a cross-reference between fully and partially sequenced eukaryotic transcribed sequences to identify putative orthologs and paralogs for known genes, as well as those that exist only as uncharacterized ESTs.
Towards detection of orthologues in sequence databases
TLDR
A method is presented that attempts to construct a reconciled tree from a gene tree of selected sequences and the corresponding phylogenetic tree of the species involved (species tree), and a Web interface is developed to enable users to analyse the BLAST result.
Similarity of phylogenetic trees as indicator of protein-protein interaction.
TLDR
A new way of discovering possible protein-protein interactions based on the comparison of the evolutionary distances between the sequences of the associated protein families is proposed, an idea based on previous observations of correspondence between the phylogenetic trees of associated proteins in systems such as ligands and receptors.
Annotation transfer between genomes: protein-protein interologs and protein-DNA regulogs.
TLDR
This work quantitatively assesses the degree to which interologs can be reliably transferred between species as a function of the sequence similarity of the corresponding interacting proteins and introduces the concept of a "regulog"--a conserved regulatory relationship between proteins across different species.
RIO: Analyzing proteomes by automated phylogenomics using resampled inference of orthologs
TLDR
The RIO procedure is particularly useful for the automated detection of first representatives of novel protein subfamilies, and it is described how some orthologies can be misleading for functional inference.
A genomic perspective on protein families.
TLDR
Comparison of proteins encoded in seven complete genomes from five major phylogenetic lineages and elucidation of consistent patterns of sequence similarities allowed the delineation of 720 clusters of orthologous groups (COGs), which comprise a framework for functional and evolutionary genome analysis.
The COG database: an updated version includes eukaryotes
TLDR
A major update of the previously developed system for delineation of Clusters of Orthologous Groups of proteins (COGs) from the sequenced genomes of prokaryotes and unicellular eukaryotes is described and is expected to be a useful platform for functional annotation of newly sequenced genomes, including those of complex eukaryotes, and genome-wide evolutionary studies.