Domain combinations in archaeal, eubacterial and eukaryotic proteomes.

@article{Apic2001DomainCI,
  title={Domain combinations in archaeal, eubacterial and eukaryotic proteomes.},
  author={Gordana Apic and Julian Gough and Sarah A. Teichmann},
  journal={Journal of molecular biology},
  year={2001},
  volume={310},
  number={2},
  pages={311--325}
}
There is a limited repertoire of domain families that are duplicated and combined in different ways to form the set of proteins in a genome. Proteins are gene products, and at the level of genes, duplication, recombination, fusion and fission are the processes that produce new genes. We attempt to gain an overview of these processes by studying the evolutionary units in proteins, domains, in the protein sequences of 40 genomes. The domain and superfamily definitions in the Structural…
Genomic and structural aspects of protein evolution.
TLDR
This review discusses the number of currently known superfamilies, their size and distribution, and superfamily expansions related to biological complexity and to specific lineages, and the extraordinary variety of the domain combinations found in different genomes.
Global phylogeny determined by the combination of protein domains in proteomes.
TLDR
This study surveys the combination of protein domains defined at fold and fold superfamily levels in 185 genomes belonging to organisms that have been fully sequenced and introduces a method that reconstructs rooted phylogenomic trees from the content and arrangement of domains in proteins at a genomic level.
Protein families and their evolution-a structural perspective.
TLDR
It is shown that about two thirds of the sequences from completed genomes can be assigned to as few as 1400 domain families for which structures are known and thus more ancient evolutionary relationships established.
Multi-domain protein families and domain pairs: comparison with known structures and a random model of domain recombination
TLDR
It is established here that all the domain families with more than three members in genomes are duplicated more frequently than would be expected by chance considering their number of neighbouring domains.
Analysis of Domain Combinations in Eukaryotic Genomes
TLDR
Here, whole protein sets from completely sequenced and semi-completely sequenced genomes including draft eukaryotic genomes are collected, and the domain combinations are analyzed to obtain an overview of eukaryotic genomes.
This Déjà Vu Feeling—Analysis of Multidomain Protein Evolution in Eukaryotic Genomes
TLDR
This work assembled a collection of 172 complete eukaryotic genomes that is not only the largest, but also the most phylogenetically complete set of genomes analyzed so far, and shows that independent evolution of domain combinations is significantly more prevalent than previously thought.
Domain rearrangements in protein evolution.
TLDR
A novel measure, domain distance, is defined, which is calculated as the number of domains that differ between two domain architectures, and it is found that indels are more common than internal repetition and that the exchange of a domain is rare.
Evolution of the PWWP-domain encoding genes in the plant and animal lineages
TLDR
It is found that as a single module the PWWP domain occurs only in proteins with a limited, mainly species-specific distribution, and models wherein more complex protein architectures involving the PWWP domain occur with the appearance of more evolutionarily advanced life forms do not support these results.
Protein Family Expansions and Biological Complexity
TLDR
The identity of those superfamilies whose relative sizes in different organisms are highly correlated to the complexity of the organisms is determined and one explanation of the discrepancy between the total number of genes and the apparent physiological complexity of eukaryotic organisms is provided.
Comprehensive analysis of co-occurring domain sets in yeast proteins
TLDR
This work designs a novel representation of proteins and their constituent domains as a protein-domain network, and provides a comprehensive list of co-occurring domain sets in yeast, and sheds light on their function and evolution.

References

SHOWING 1-10 OF 32 REFERENCES
Structural assignments to the Mycoplasma genitalium proteins show extensive gene duplications and domain rearrangements.
TLDR
It is shown that the domains in the matched M. genitalium sequences come from 114 superfamilies and that 58% of them have arisen by gene duplication, more than twice that found by using pairwise sequence comparisons.
Immunoglobulin superfamily proteins in Caenorhabditis elegans.
TLDR
This study describes the repertoire of proteins that are members of the immunoglobulin superfamily (IgSF) in Caenorhabditis elegans, a framework for refinement and extension of the repertoire as gene and protein definitions improve, and the basis for investigations of their function and for comparisons with the repertoires of other organisms.
Genome‐wide analysis of integral membrane proteins from eubacterial, archaean, and eukaryotic organisms
  • E. Wallin, G. von Heijne
  • Biology, Medicine
  • Protein science : a publication of the Protein Society
  • 1998
TLDR
Detailed statistical analyses of integral membrane proteins of the helix‐bundle class from eubacterial, archaean, and eukaryotic organisms for which genome‐wide sequence data are available suggest that unicellular organisms appear to prefer proteins with 6 and 12 transmembrane segments, whereas Caenorhabditis elegans and Homo sapiens have a slight preference for proteins with seven transmembrane segments.
Patterns of protein‐fold usage in eight microbial genomes: A comprehensive structural census
TLDR
Eight microbial genomes are compared in terms of protein structure and patterns of fold usage—whether a given fold occurs in a particular organism and all the genomes appear to have similar usage patterns for these folds, according to a “Zipf‐like” law.
Protein evolution viewed through Escherichia coli protein sequences: introducing the notion of a structural segment of homology, the module.
TLDR
It is confirmed that E. coli contains a very high proportion of paralogous proteins and found that the segments of homology fell into 352 sequence-related groups or families, which strongly suggests that the 1404 present-day modules and proteins derive from a minimal set of 352 ancestral modules.
Cadherin superfamily proteins in Caenorhabditis elegans and Drosophila melanogaster.
TLDR
The identification and analysis of the cadherin repertoires in the genomes of Caenorhabditis elegans and Drosophila melanogaster are presented and it is shown that three pairs of genes, and two triplets, should be merged to form five single genes.
Estimating the number of protein folds and families from complete genome data.
Using the data on proteins encoded in complete genomes, combined with a rigorous theory of the sampling process, we estimate the total number of protein folds and families, as well as the number of…
CATH--a hierarchic classification of protein domain structures.
TLDR
Analysis of the structural families generated by CATH reveals the prominent features of protein structure space and a database of well-characterised protein structure families will facilitate the assignment of structure-function/evolution relationships to both known and newly determined protein structures.
How representative are the known structures of the proteins in a complete genome? A comprehensive structural census.
TLDR
The proteins encoded by the genomes are significantly different from those in the structure databank, and their sequence lengths, which follow an extreme value distribution, are longer than the PDB proteins and much shorter than the biophysical proteins.
Sequence comparisons using multiple sequences detect three times as many remote homologues as pairwise methods.
TLDR
The extent to which the SAM-T98 implementation of a hidden Markov model procedure; PSI-BLAST; and the intermediate sequence search (ISS) procedure can detect evolutionary relationships between the members of the sequence database PDBD40-J is determined.