Comparison of archaeal and bacterial genomes: computer analysis of protein sequences predicts novel functions and suggests a chimeric origin for the archaea

  • Eugene V. Koonin, Arcady R. Mushegian, Michael Y. Galperin, D. Roland Walker
  • Molecular Microbiology
Protein sequences encoded in three complete bacterial genomes, those of Haemophilus influenzae, Mycoplasma genitalium and Synechocystis sp., and the first available archaeal genome sequence, that of Methanococcus jannaschii, were analysed using the blast2 algorithm and methods for amino acid motif detection. Between 75% and 90% of the predicted proteins encoded in each of the bacterial genomes and 73% of the M. jannaschii proteins showed significant sequence similarity to proteins from other… 
The COG database: a tool for genome-scale analysis of protein functions and evolution
The database of Clusters of Orthologous Groups of proteins (COGs) is an attempt at a phylogenetic classification of the proteins encoded in 21 complete genomes of bacteria, archaea and eukaryotes.
Comparative genomics of the Archaea (Euryarchaeota): evolution of conserved protein families, the stable core, and the variable shell.
Comparative analysis of the protein sequences encoded in the four euryarchaeal species whose genomes have been sequenced completely revealed 1326 orthologous sets, of which 543 are represented in all four species, and previously undetected orthologs in bacteria and eukaryotes were identified.
Protein Phylogenies and Signature Sequences: A Reappraisal of Evolutionary Relationships among Archaebacteria, Eubacteria, and Eukaryotes
Evidence from indels supports the view that the archaebacteria probably evolved from gram-positive bacteria and suggests that this evolution occurred in response to antibiotic selection pressures, and an alternative model of microbial evolution based on the use of indels of conserved proteins and the morphological features of prokaryotic organisms is proposed.
Clusters of orthologous genes for 41 archaeal genomes and implications for evolutionary genomics of archaea
The arCOGs provide a convenient, flexible framework for functional annotation of archaeal genomes, comparative genomics and evolutionary reconstructions, and suggest that the last common ancestor of archaea might have been (nearly) as advanced as the modern archaeal hyperthermophiles.
The Deep Archaeal Roots of Eukaryotes
A comprehensive set of 355 eukaryotic genes of apparent archaeal origin, identified through ortholog detection and phylogenetic analysis, is described; for the majority of these genes, the preferred tree topology is one with the eukaryotic branch placed outside the extant diversity of archaea.
What are archaebacteria: life's third domain or monoderm prokaryotes related to Gram‐positive bacteria? A new proposal for the classification of prokaryotic organisms
The hypothesis that archaebacteria and eukaryotes shared a common ancestor exclusive of eubacteria is not supported and evidence is provided for an alternate view of the evolutionary relationship among living organisms that is different from the currently popular three‐domain proposal.
Horizontal Transfer of Archaeal Genes into the Deinococcaceae: Detection by Molecular and Computer-Based Approaches
Compared to the total number of ORFs in the genome, those that can be identified as having been acquired from Archaea or Eukaryotes are relatively few (approximately 1%), suggesting that interdomain transfer is rare.
Towards understanding the first genome sequence of a crenarchaeon by genome annotation using clusters of orthologous groups of proteins (COGs)
Special-purpose databases organized on the basis of phylogenetic analysis and carefully curated with respect to known and predicted protein functions provide for a significant improvement in genome annotation.
Genomics of bacteria and archaea: the emerging dynamic view of the prokaryotic world
The prokaryotic genome space is a tightly connected, although compartmentalized, network, a novel notion that undermines the ‘Tree of Life’ model of evolution and requires a new conceptual framework and tools for the study of prokaryotic evolution.


Novel protein families in archaean genomes.
It is shown that the putative laminin receptor family of eukaryotes and an archaeal homologue belong to the previously characterized ribosomal protein family S2 from eubacteria, suggesting that archaea have a mode of expression of genetic information rather similar to that of eukaryotes, while eubacteria may have evolved unique modes of transcription and translation.
Sequence similarity analysis of Escherichia coli proteins: functional and evolutionary implications.
It is concluded that bacterial protein sequences are generally highly conserved in evolution, with about 50% of all protein families containing ancient conserved regions (ACRs) represented among the E. coli gene products.
Complete sequence analysis of the genome of the bacterium Mycoplasma pneumoniae.
The entire genome of the bacterium Mycoplasma pneumoniae M129 has been sequenced; a tentative functional classification has been assigned to a large number of ORFs, and the biochemical and physiological properties of this bacterium have been deduced.
Metabolism and evolution of Haemophilus influenzae deduced from a whole-genome comparison with Escherichia coli
Protein-based phylogenies support a chimeric origin for the eukaryotic genome.
The hypothesis of a chimeric origin for the eukaryotic cell nucleus, formed from the fusion of an archaebacterium and a gram-negative bacterium, is supported.
Sequencing and analysis of bacterial genomes
Sequence analysis of the genome of the unicellular cyanobacterium Synechocystis sp. strain PCC6803. II. Sequence determination of the entire genome and assignment of potential protein-coding regions.
  • T. Kaneko, S. Sato, S. Tabata
  • DNA research: an international journal for rapid publication of reports on genes and genomes
  • 1996
The sequence determination of the entire genome of the Synechocystis sp. strain PCC6803 was completed. The total length of the genome finally confirmed was 3,573,470 bp, including the previously…
Computer analysis of bacterial haloacid dehalogenases defines a large superfamily of hydrolases with diverse specificity. Application of an iterative approach to database search.
It is shown that bacterial haloacid dehalogenases (HADs) belong to a large superfamily of hydrolases with diverse substrate specificity and many of the proteins with known enzymatic activities in the HAD superfamily are involved in detoxification of xenobiotics or metabolic by-products.