QuartetS: a fast and accurate algorithm for large-scale orthology detection

Abstract

The unparalleled growth in the availability of genomic data offers both a challenge to develop orthology detection methods that are simultaneously accurate and high throughput and an opportunity to improve orthology detection by leveraging evolutionary evidence in the accumulated sequenced genomes. Here, we report a novel orthology detection method, termed QuartetS, that exploits evolutionary evidence in a computationally efficient manner. Based on the well-established evolutionary concept that gene duplication events can be used to discriminate homologous genes, QuartetS uses an approximate phylogenetic analysis of quartet gene trees to infer the occurrence of duplication events and discriminate paralogous from orthologous genes. We used function- and phylogeny-based metrics to perform a large-scale, systematic comparison of the orthology predictions of QuartetS with those of four other methods [bi-directional best hit (BBH), outgroup, OMA and QuartetS-C (QuartetS followed by clustering)], involving 624 bacterial genomes and >2 million genes. We found that QuartetS slightly, but consistently, outperformed the highly specific OMA method and that, while consuming only 0.5% additional computational time, QuartetS predicted 50% more orthologs with a 50% lower false positive rate than the widely used BBH method. We conclude that, for large-scale phylogenetic and functional analysis, QuartetS should be preferred in applications where high accuracy is required and QuartetS-C in applications where high throughput is required.
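For readers unfamiliar with the BBH baseline named in the abstract, the following is a minimal Python sketch of bi-directional best hit selection between two genomes. It is an illustration only, not the QuartetS procedure or the authors' implementation. The function names, the gene identifiers and the similarity scores are all hypothetical; in practice the scores would come from a sequence similarity search.

```python
def best_hits(scores):
    """Return, for each query gene, its highest-scoring target gene.

    `scores` maps (query, target) pairs to a similarity score
    (hypothetical values here; higher means more similar).
    """
    best = {}
    for (query, target), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (target, score)
    return {query: target for query, (target, _) in best.items()}


def bidirectional_best_hits(a_to_b, b_to_a):
    """Return gene pairs (a, b) that are each other's best hit."""
    best_ab = best_hits(a_to_b)
    best_ba = best_hits(b_to_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Toy scores for two genomes A and B (made up for illustration).
a_to_b = {("a1", "b1"): 200, ("a1", "b2"): 150,
          ("a2", "b1"): 120, ("a2", "b2"): 90}
b_to_a = {("b1", "a1"): 198, ("b1", "a2"): 118,
          ("b2", "a1"): 149, ("b2", "a2"): 91}

pairs = bidirectional_best_hits(a_to_b, b_to_a)
print(pairs)
```

In this toy input, only a1 and b1 pick each other as best hit, so a2 and b2 are left unpaired. BBH by itself uses no evidence about duplication events; according to the abstract, that evidence is what QuartetS adds through its quartet gene tree analysis.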

DOI: 10.1093/nar/gkr308

Nucleic Acids Research, 2011, e88