One thousand families for the molecular biologist

Cyrus Chothia

Bayesian local false discovery rate for sparse count data with application to the discovery of hotspots in protein domains

Biostatistics Group, Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology, Thuwal, Kingdom of Saudi Arabia ∗, Department of

Tracing Evolution Through Protein Structures: Nature Captured in a Few Thousand Folds

The authors discuss how expanding and integrating protein sequence data into these structural families helps reveal the dark matter of function space and can inform the emergence of novel functions in Metazoa.

The protein folds as platonic forms: new support for the pre-Darwinian conception of evolution by natural law.

The authors speculate that the folds are unlikely to be the only case in nature where a set of complex organic forms is determined by natural law, and suggest that natural law may have played a far greater role in the origin and evolution of life than is currently assumed.

Designability of protein structures: A lattice‐model study using the Miyazawa‐Jernigan matrix

Highly designable structures in the HP model are also highly designable in the MJ model—and vice versa—with the associated sequences having enhanced thermodynamic stability.

Understanding hierarchical protein evolution from first principles.

A "profile" solution to the model is provided that explains the hierarchical organization of proteins in fold families, and agreement is found between predicted patterns of conserved amino acids and those actually observed in nature.

Global optimum protein threading with gapped alignment and empirical pair score functions.

A branch-and-bound search algorithm is presented for finding the exact global optimum gapped sequence-structure alignment between a protein sequence and a protein core or structural model, using an arbitrary amino acid pair score function. It should prove useful for structure prediction and for critical evaluation of new pair score functions.

Constructive Induction and Protein Tertiary Structure Prediction

A novel constructive induction approach is presented that learns better representations of amino acid sequences in terms of physical and chemical properties. The learning method combines knowledge and search to shift the representation of sequences so that semantic similarity is more easily recognized by syntactic matching.

Protein families in the metazoan genome.

  • C. Chothia
  • Biology
    Development (Cambridge, England). Supplement
  • 1994
The rates of progress in the genome sequencing projects, and in protein structure analyses, mean that in a few years a fairly complete outline description of the molecules responsible for the structure and function of organisms at several different levels of developmental complexity should be available. This should make a major contribution to the understanding of the evolution of development.

Do Microorganisms Have A Macroevolutionary History?

  • C. Kurland
  • Biology
    The Quarterly Review of Biology
  • 2016



Similarity of the three-dimensional structures of actin and the ATPase fragment of a 70-kDa heat shock cognate protein.

A local sequence "fingerprint," which may be diagnostic of the adenine nucleotide beta-phosphate-binding pocket, has been derived and identifies members of the glycerol kinase family as candidates likely to have a similar structure in their nucleotide-binding domains.

A data bank merging related protein structures and sequences.

Wedding the primary and tertiary structural data resulted in an 8-fold increase of data bank sequence entries over those associated with the known three-dimensional architectures alone.

The classification and origins of protein folding patterns.

Stability and Accessible Surface Area of Protein Folds, Secondary Structures and their Packings, and Chain Topology in Helical Proteins.

The complete DNA sequence of yeast chromosome III

The entire DNA sequence of chromosome III of the yeast Saccharomyces cerevisiae has been determined. This is the first complete sequence analysis of an entire chromosome from any organism. The

The C. elegans genome sequencing project: a beginning

The long-term goal of this project is to elucidate the complete sequence of the Caenorhabditis elegans genome; a strategy amenable to large-scale sequencing has been implemented.

Sequence identification of 2,375 human brain genes

2,672 new, independent cDNA clones isolated from four human brain cDNA libraries were partially sequenced to generate 2,375 expressed sequence tags to nuclear-encoded genes. These represent an approximate doubling of the number of human genes identified by DNA sequencing and may represent as many as 5% of the genes in the human genome.

Caenorhabditis elegans expressed sequence tags identify gene families and potential disease gene homologues

A database containing mapped partial cDNA sequences from Caenorhabditis elegans will provide a ready starting point for identifying nematode homologues of important human genes and determining their

A survey of expressed genes in Caenorhabditis elegans

The result is the identification of about 1,200 of the estimated 15,000 genes of C. elegans, providing a more accurate estimate of the total number of genes in the organism than has hitherto been available.

Database of homology‐derived protein structures and the structural meaning of sequence alignment

A database of homology‐derived secondary structure of proteins (HSSP) is produced by aligning to each protein of known structure all sequences deemed homologous on the basis of the threshold curve, effectively increasing the number of known protein structures by a factor of five to more than 1800.