One thousand families for the molecular biologist

@article{Chothia1992OneTF,
  title={One thousand families for the molecular biologist},
  author={Cyrus Chothia},
  journal={Nature},
  year={1992},
  volume={357},
  pages={543--544}
}
Bayesian Local False Discovery Rate for Sparse Count Data with Application to the Discovery of Hotspots in Protein Domains
Biostatistics Group, Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology, Thuwal, Kingdom of Saudi Arabia; Department of ...
Tracing Evolution Through Protein Structures: Nature Captured in a Few Thousand Folds
TLDR
The paper discusses how the expansion and integration of protein sequence data into these structural families helps reveal the dark matter of function space and can inform the emergence of novel functions in Metazoa.
Structural biology and genome evolution: An introduction.
TLDR
It has taken a few decades of large-scale genome sequencing as well as many tens of thousands of 3D structural determinations of proteins to realize the full reach of Chothia's structural insights.
Constructive Induction and Protein Tertiary Structure Prediction
TLDR
The paper presents a novel constructive induction approach that learns better representations of amino acid sequences in terms of physical and chemical properties. The learning method combines knowledge and search to shift the representation of sequences so that semantic similarity is more easily recognized by syntactic matching.
Do Microorganisms Have A Macroevolutionary History?
  • C. Kurland
  • Biology
  • The Quarterly Review of Biology
  • 2016
Les mécanismes de repliement des protéines solubles
The function of a protein is carried by its three-dimensional structure. The rules for translating the information encoded in DNA into a sequence of amino acids are known. By contrast, the ...
Methodologies for target selection in structural genomics.
  • M. Linial, G. Yona
  • Biology, Medicine
  • Progress in biophysics and molecular biology
  • 2000
TLDR
This review focuses on current approaches in structural genomics aimed at selecting representative proteins as targets for structure determination, the concept of representative structures/folds, the current methodologies for identifying those proteins, and computational techniques for identifying proteins which are expected to adopt new structural folds.
Estimating the number of protein folds.
TLDR
It is shown that the number of known non-transmembrane protein folds is approximately one half of the total that exist, and that certain superfolds should exist, which accommodate dozens of non-homologous sequence families.
Protein families in the metazoan genome.
  • C. Chothia
  • Biology, Medicine
  • Development (Cambridge, England). Supplement
  • 1994
TLDR
The rates of progress in the genome sequencing projects, and in protein structure analyses, mean that in a few years a fairly complete outline description of the molecules responsible for the structure and function of organisms at several different levels of developmental complexity should be available, and this should make a major contribution to the understanding of the evolution of development.
Abundance Imparts Evolutionary Constraints of Similar Magnitude on the Buried, Surface, and Disordered Regions of Proteins
TLDR
Surprisingly, the conservation of disordered and structured regions is found to increase in equal proportion with abundance, which implies that either abundance-related constraints are structure-independent, or multiple constraints apply to different regions and perfectly balance each other.

References

A data bank merging related protein structures and sequences.
TLDR
Wedding the primary and tertiary structural data resulted in an 8-fold increase of data bank sequence entries over those associated with the known three-dimensional architectures alone.
A survey of expressed genes in Caenorhabditis elegans
TLDR
The result is the identification of about 1,200 of the estimated 15,000 genes of C. elegans, providing a more accurate estimate of the total number of genes in the organism than has hitherto been available.
Caenorhabditis elegans expressed sequence tags identify gene families and potential disease gene homologues
A database containing mapped partial cDNA sequences from Caenorhabditis elegans will provide a ready starting point for identifying nematode homologues of important human genes and determining their ...
Sequence identification of 2,375 human brain genes
TLDR
2,672 new, independent cDNA clones isolated from four human brain cDNA libraries are partially sequenced to generate 2,375 expressed sequence tags to nuclear-encoded genes. These represent an approximate doubling of the number of human genes identified by DNA sequencing and may represent as many as 5% of the genes in the human genome.
The C. elegans genome sequencing project: a beginning
TLDR
The long-term goal of this project is the elucidation of the complete sequence of the Caenorhabditis elegans genome, and a strategy has been implemented that is amenable to large-scale sequencing.
The complete DNA sequence of yeast chromosome III
The entire DNA sequence of chromosome III of the yeast Saccharomyces cerevisiae has been determined. This is the first complete sequence analysis of an entire chromosome from any organism. The ...
Database of homology‐derived protein structures and the structural meaning of sequence alignment
TLDR
A database of homology‐derived secondary structure of proteins (HSSP) is produced by aligning to each protein of known structure all sequences deemed homologous on the basis of the threshold curve, effectively increasing the number of known protein structures by a factor of five to more than 1800.
Modular exchange principles in proteins
TLDR
Analysis of the new data helps clarify the mechanism, evolutionary significance and history of exon-shuffling in modular exchange of proteins.
Similarity of the three-dimensional structures of actin and the ATPase fragment of a 70-kDa heat shock cognate protein.
TLDR
A local sequence "fingerprint," which may be diagnostic of the adenine nucleotide beta-phosphate-binding pocket, has been derived and identifies members of the glycerol kinase family as candidates likely to have a similar structure in their nucleotide-binding domains.
The classification and origins of protein folding patterns.
TLDR
Stability and Accessible Surface Area of Protein Folds, Secondary Structures and their Packings, and Chain Topology in Helical Proteins.