The Closest BLAST Hit Is Often Not the Nearest Neighbor

@article{Koski2001TheCB,
  title={The Closest BLAST Hit Is Often Not the Nearest Neighbor},
  author={Liisa B. Koski and Geoffrey Brian Golding},
  journal={Journal of Molecular Evolution},
  year={2001},
  volume={52},
  pages={540--542}
}
It is well known that basing phylogenetic reconstructions on uncorrected genetic distances can lead to errors. Nevertheless, it is common practice to report simply the most similar BLAST (Altschul et al. 1997) hit in genomic reports that discuss many genes (Ruepp et al. 2000; Freiberg et al. 1997). This is because BLAST hits can provide a rapid, efficient, and concise analysis of many genes at once. These hits are often interpreted to imply that the gene is most…
Phylogenetic analysis of BLAST results
TLDR
Viewing essentially one-dimensional BLAST analysis from the perspective of a two-dimensional phylogenetic analysis has a number of benefits including more accurate identification of the true “top hit”, delineation of gene families, identification of true homologs, and improved functional assignment of orthologs and paralogs.
Ortholog detection using the reciprocal smallest distance algorithm.
TLDR
This chapter details the reciprocal smallest distance (RSD) algorithm, which improves upon the common procedure of taking reciprocal best BLAST hits (RBH) to identify orthologs by using global sequence alignment and maximum likelihood estimation of evolutionary distances to detect orthologs between two genomes.
FastBLAST: Homology Relationships for Millions of Proteins
TLDR
FastBLAST is a heuristic replacement for all-versus-all BLAST that relies on alignments of proteins to known families, obtained from tools such as PSI-BLAST and HMMer, and enables research groups that do not have supercomputers to analyze large protein sequence data sets.
Improving the specificity of high-throughput ortholog prediction
TLDR
The Ortholuge method appears to significantly improve the specificity (precision) of high-throughput ortholog prediction for both bacterial and eukaryotic species and will aid those performing various comparative genomics-based analyses.
Outlier Detection in BLAST Hits
TLDR
A method to detect outliers among BLAST hits in order to separate the phylogenetically most closely related matches from matches to sequences from more distantly related organisms is developed.
RoundUp : a repository of orthologs and corresponding evolutionary distances
TLDR
This work provides pre-computed orthologs for 213 genomes, generated with the reciprocal smallest distance algorithm, in a repository described as the most comprehensive of its kind.
Computational methods for Gene Orthology inference
TLDR
Comparisons of tree-based, sequence similarity-based, and synteny-based approaches, which can be combined into flexible hybrid methods, show that, despite conceptual differences, they produce similar sets of orthologs, especially at short evolutionary distances.
Outlier detection in BLAST hits
TLDR
A method to detect outliers among BLAST hits in order to separate the phylogenetically most closely related matches from matches to sequences from more distantly related organisms is developed.
A Tight Link between Orthologs and Bidirectional Best Hits in Bacterial and Archaeal Genomes
TLDR
It is concluded that, at least in prokaryotes, genes for which independent evidence of orthology is available typically form BBH and, conversely, BBH can serve as a strong indication of gene orthology.
Phylometrics: a pipeline for inferring phylogenetic trees from a sequence relationship network perspective
TLDR
Phylometrics provides a novel data mining method to screen supplied DNA sequences and to identify sequences that are of significant phylogenetic interest using powerful analytical tools.

References

Phylogenomics: improving functional predictions for uncharacterized genes by evolutionary analysis.
TLDR
It is suggested that functional predictions can be greatly improved by focusing on how the genes became similar in sequence (i.e., evolution) rather than on the sequence similarity itself.
Gapped BLAST and PSI-BLAST: a new generation of protein database search programs.
TLDR
A new criterion for triggering the extension of word hits, combined with a new heuristic for generating gapped alignments, yields a gapped BLAST program that runs at approximately three times the speed of the original.
Estimates of DNA and protein sequence divergence: an examination of some assumptions.
TLDR
Some of the assumptions underlying estimates of DNA and protein sequence divergence are examined and it is shown that these conditions do not strongly affect estimates of divergence, and the binomial variance that is usually assumed for these estimates is safely conservative.
A phylogenomic approach to microbial evolution.
TLDR
The phylome is defined as the complete set of phylogenetic trees derived from the proteome of an organism, and the term phylogenetic connection is introduced as a concept that describes the relative relationships between taxa in a tree.
Accounting for evolutionary rate variation among sequence sites consistently changes universal phylogenies deduced from rRNA and protein-coding genes.
TLDR
It is shown that universal phylogenies of ribosomal RNAs and RNA polymerases built by ignoring variation are biased toward the archaebacterial tree because of attraction between long branches, while taking among-site rate variability into account gives support for the eocyte tree.
A phylogenomic study of DNA repair genes, proteins, and processes.
Quartet Puzzling: A Quartet Maximum-Likelihood Method for Reconstructing Tree Topologies
TLDR
A versatile method, quartet puzzling, is introduced to reconstruct the topology (branching pattern) of a phylogenetic tree based on DNA or amino acid sequence data and outperforms neighbor joining in some cases with high transition/transversion bias.
The mosaic nature of the eukaryotic nucleus.
TLDR
The phylogenies for each of the protein-coding genes from the Methanococcus jannaschii genome were surveyed to determine the history of the major groups of life, and results indicate that support for the early history of life is not unequivocal.
Evidence for lateral gene transfer between Archaea and Bacteria from genome sequence of Thermotoga maritima
TLDR
Genome analysis reveals numerous pathways involved in degradation of sugars and plant polysaccharides, and 108 genes that have orthologues only in the genomes of other thermophilic Eubacteria and Archaea.
Molecular archaeology of the Escherichia coli genome.
  • J. Lawrence, H. Ochman
  • Biology
    Proceedings of the National Academy of Sciences of the United States of America
  • 1998
TLDR
It is found that 755 of 4,288 ORFs have been introduced into the E. coli genome in at least 234 lateral transfer events since this species diverged from the Salmonella lineage 100 million years ago.