Eugene V. Koonin

The availability of multiple, essentially complete genome sequences of prokaryotes and eukaryotes spurred both the demand and the opportunity for the construction of an evolutionary classification of genes from these genomes. Such a classification system based on orthologous relationships between genes appears to be a natural framework for comparative…
In order to extract the maximum amount of information from the rapidly accumulating genome sequences, all conserved genes need to be classified according to their homologous relationships. Comparison of proteins encoded in seven complete genomes from five major phylogenetic lineages and elucidation of consistent patterns of sequence similarities allowed the…
Rational classification of proteins encoded in sequenced genomes is critical for making the genome sequences maximally useful for functional and evolutionary studies. The database of Clusters of Orthologous Groups of proteins (COGs) is an attempt at a phylogenetic classification of the proteins encoded in 21 complete genomes of bacteria, archaea and…
PSI-BLAST is an iterative program to search a database for proteins with distant similarity to a query sequence. We investigated over a dozen modifications to the methods used in PSI-BLAST, with the goal of improving accuracy in finding true positive matches. To evaluate performance we used a set of 103 queries for which the true positives in yeast had been…
The database of Clusters of Orthologous Groups of proteins (COGs), which represents an attempt at a phylogenetic classification of the proteins encoded in complete genomes, currently consists of 2791 COGs including 45 350 proteins from 30 genomes of bacteria, archaea and the yeast Saccharomyces cerevisiae (http://www.ncbi.nlm.nih.gov/COG). In addition, a…
The CRISPR-Cas (clustered regularly interspaced short palindromic repeats-CRISPR-associated proteins) modules are adaptive immunity systems that are present in many archaea and bacteria. These defence systems are encoded by operons that have an extraordinarily diverse architecture and a high rate of evolution for both the cas genes and the unique spacer…
Despite the rapid mutational change that is typical of positive-strand RNA viruses, enzymes mediating the replication and expression of virus genomes contain arrays of conserved sequence motifs. Proteins with such motifs include RNA-dependent RNA polymerase, putative RNA helicase, chymotrypsin-like and papain-like proteases, and methyltransferases. The…
Analysis of the 1,042,519-base pair Chlamydia trachomatis genome revealed unexpected features related to the complex biology of chlamydiae. Although chlamydiae lack many biosynthetic capabilities, they retain functions for performing key steps and interconversions of metabolites obtained from their mammalian host cells. Numerous potential…
Lactic acid-producing bacteria are associated with various plant and animal niches and play a key role in the production of fermented foods and beverages. We report nine genome sequences representing the phylogenetic and functional diversity of these bacteria. The small genomes of lactic acid bacteria encode a broad repertoire of transporters for efficient…
Using a combination of computer methods for iterative database searches and multiple sequence alignment, we show that protein sequences related to the AAA family of ATPases are far more prevalent than reported previously. Among these are regulatory components of Lon and Clp proteases, proteins involved in DNA replication, recombination, and restriction…