We provide a unified overview of methods that are currently widely used to assess the accuracy of prediction algorithms, ranging from raw percentages, quadratic error measures and other distances, and correlation coefficients to information-theoretic measures such as relative entropy and mutual information. We briefly discuss the advantages and disadvantages…
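As a rough illustration of three of the measure families named above, the sketch below computes percentage accuracy, the Matthews correlation coefficient and the mutual information (in bits) from a binary confusion matrix. It is a minimal example assuming labels coded as 1 (positive) and 0 (negative); the function names are hypothetical and are not taken from the overview itself.

```python
import math

def confusion_counts(y_true, y_pred):
    """Count TP, FP, TN, FN for binary labels (1 = positive, 0 = negative)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn

def percent_accuracy(tp, fp, tn, fn):
    """Raw percentage of correctly predicted items."""
    return 100.0 * (tp + tn) / (tp + fp + tn + fn)

def matthews_cc(tp, fp, tn, fn):
    """Matthews correlation coefficient; returns 0 when a marginal is empty."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom

def mutual_information(tp, fp, tn, fn):
    """Mutual information (bits) between true and predicted labels."""
    n = tp + fp + tn + fn
    joint = {(1, 1): tp, (0, 1): fp, (0, 0): tn, (1, 0): fn}  # keys: (true, pred)
    p_true = {1: (tp + fn) / n, 0: (fp + tn) / n}
    p_pred = {1: (tp + fp) / n, 0: (tn + fn) / n}
    mi = 0.0
    for (t, p), count in joint.items():
        if count > 0:  # 0 * log 0 is taken as 0
            pj = count / n
            mi += pj * math.log2(pj / (p_true[t] * p_pred[p]))
    return mi

# Hypothetical toy predictions.
counts = confusion_counts([1, 1, 0, 0, 1, 0, 0, 1], [1, 0, 0, 0, 1, 1, 0, 1])
print(counts)                                  # (3, 1, 3, 1)
print(percent_accuracy(*counts))               # 75.0
print(matthews_cc(*counts))                    # 0.5
print(round(mutual_information(*counts), 3))   # 0.189
```

The example makes one of the trade-offs visible: 75% accuracy sounds high, while the same predictions carry only about 0.19 of the 1 bit of information in the balanced true labels.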
Correlations between sequence separation (in residues) and distance (in Å) for any pair of amino acids in polypeptide chains are investigated. For each sequence separation, we define a distance threshold. For pairs of amino acids whose Cα atoms are closer than this threshold, a characteristic sequence motif (shown as a sequence logo) is found.
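The pair-selection step described above can be sketched as follows. This is a minimal example, not the authors' code: the threshold for each separation is passed in by the caller (the values used here are hypothetical, since the abstract does not give them), and the helper names are made up for illustration.

```python
import math

def close_pairs(ca_coords, thresholds):
    """For each sequence separation s in `thresholds`, list residue pairs (i, i + s)
    whose C-alpha atoms are closer than thresholds[s] (in Angstrom)."""
    n = len(ca_coords)
    result = {}
    for sep, cutoff in thresholds.items():
        result[sep] = [
            (i, i + sep)
            for i in range(n - sep)
            if math.dist(ca_coords[i], ca_coords[i + sep]) < cutoff
        ]
    return result

def aligned_segments(sequence, pairs):
    """Return the sequence segment spanning each pair (i..j inclusive). For a fixed
    separation every segment has the same length, so the segments can be stacked
    column-wise as input to a sequence-logo tool."""
    return [sequence[i:j + 1] for i, j in pairs]

# Hypothetical toy chain: C-alpha atoms 3.8 A apart on a straight line.
coords = [(3.8 * k, 0.0, 0.0) for k in range(5)]
pairs = close_pairs(coords, {2: 8.0, 3: 10.0})
print(pairs)                                # {2: [(0, 2), (1, 3), (2, 4)], 3: []}
print(aligned_segments("ACDEF", pairs[2]))  # ['ACD', 'CDE', 'DEF']
```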
Predicting the secondary structure (alpha-helices, beta-sheets, coils) of proteins is an important step towards understanding their three-dimensional conformations. Unlike alpha-helices, which are built up from one contiguous region of the polypeptide chain, beta-sheets are more complex, resulting from a combination of two or more disjoint regions. The exact…
The DSSP program automatically assigns each residue to one of eight secondary structure states, based on the three-dimensional coordinates of a protein structure. However, discrete assignments are incomplete in that they cannot capture the continuum of thermal fluctuations. Therefore, DSSPcont (http://cubic.bioc.columbia.edu/services/DSSPcont)…
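One simple way to turn discrete assignments into a continuous per-residue profile is to average several of them, for example one DSSP string per model of the same chain. The sketch below does this after collapsing the eight DSSP states (H, G, I, E, B, T, S and blank) to three classes. It only illustrates the idea of a continuous assignment; it is not DSSPcont's procedure, which the truncated abstract does not describe, and the 8-to-3 mapping shown is one common convention among several.

```python
from collections import Counter

# One common (but not the only) reduction of the eight DSSP states to three classes.
# DSSP writes a blank for residues with no assigned structure; some tools use "-".
EIGHT_TO_THREE = {
    "H": "H", "G": "H", "I": "H",            # alpha-, 3-10 and pi-helix -> helix
    "E": "E", "B": "E",                      # strand and isolated bridge -> strand
    "T": "C", "S": "C", " ": "C", "-": "C",  # turn, bend, loop -> coil
}

def state_profile(assignments):
    """Per-residue fractions of H, E and C over several discrete assignments
    of the same chain (hypothetical helper for illustration)."""
    length = len(assignments[0])
    if any(len(a) != length for a in assignments):
        raise ValueError("all assignments must have the same length")
    profile = []
    for i in range(length):
        counts = Counter(EIGHT_TO_THREE[a[i]] for a in assignments)
        profile.append({s: counts[s] / len(assignments) for s in "HEC"})
    return profile

# Hypothetical assignments for a four-residue segment.
for row in state_profile(["HHHE", "HGTE", "HH E"]):
    print(row)
```

Residue 2 in this toy input is helical in one assignment and coil in the other two, so it receives a mixed profile instead of a single label.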
Within computational biology, algorithms are constructed with the aim of extracting knowledge from biological data, in particular data generated by the large genome projects, where gene and protein sequences are produced in high volume. In this article, we explore new ways of representing protein-sequence information, using machine learning strategies,…
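Many sequence-based predictors of this kind feed a neural network a window of residues using sparse (one-hot) encoding. The sketch below shows that conventional baseline, not the new representations explored in the article. The 21-bit-per-position layout (20 amino acids plus a flag for positions beyond the chain ends) and the choice to leave unknown residues such as X as all-zero are assumptions for illustration.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
INDEX = {aa: k for k, aa in enumerate(AMINO_ACIDS)}
BITS_PER_POSITION = len(AMINO_ACIDS) + 1  # 20 residues + 1 "outside chain" flag

def one_hot_window(sequence, center, half_width):
    """Sparse orthogonal encoding of the window sequence[center - half_width :
    center + half_width + 1], padded with the outside-chain flag at the ends."""
    vector = []
    for pos in range(center - half_width, center + half_width + 1):
        bits = [0] * BITS_PER_POSITION
        if 0 <= pos < len(sequence):
            aa = sequence[pos]
            if aa in INDEX:
                bits[INDEX[aa]] = 1
            # Unknown residues (e.g. "X") are left as all zeros.
        else:
            bits[-1] = 1  # position lies beyond the N- or C-terminus
        vector.extend(bits)
    return vector

# Hypothetical example: a 3-residue window centred on the first residue.
v = one_hot_window("ACD", 0, 1)
print(len(v), sum(v))  # 63 3
```

Each window becomes a fixed-length binary vector of (2 × half_width + 1) × 21 entries, which is what lets a standard feed-forward network consume sequences of any length one residue at a time.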