Using an unsupervised cluster analyzer, we have identified a local structural alphabet composed of 16 folding patterns of five consecutive Cα atoms ("protein blocks"). The dependence that exists between successive blocks is explicitly taken into account. A Bayesian approach based on the propensities relating protein blocks to amino acids is used for prediction …
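As a rough illustration of the Bayesian step described above, the sketch below scores each protein block for a sequence window by combining a prior with per-position amino acid propensities. It is a minimal sketch under stated assumptions, not the authors' method: positions are treated as independent, the dependence between successive blocks is ignored, and the function name `predict_block`, the two-letter amino acid alphabet, the block labels, and all probability values are invented for the example.

```python
import math

def predict_block(window, priors, likelihoods):
    """Return posterior P(block | window) under a naive position-independence assumption.

    priors:      {block: P(block)}
    likelihoods: {block: [ {amino_acid: P(aa | block, position)} for each position ]}
    """
    log_post = {}
    for block, prior in priors.items():
        lp = math.log(prior)
        for pos, aa in enumerate(window):
            lp += math.log(likelihoods[block][pos][aa])
        log_post[block] = lp
    # Normalize in log space for numerical stability.
    top = max(log_post.values())
    total = sum(math.exp(v - top) for v in log_post.values())
    return {b: math.exp(v - top) / total for b, v in log_post.items()}

# Toy, made-up numbers: two blocks, windows of two residues, alphabet {A, V}.
toy_priors = {"m": 0.5, "d": 0.5}
toy_likelihoods = {
    "m": [{"A": 0.8, "V": 0.2}, {"A": 0.7, "V": 0.3}],
    "d": [{"A": 0.2, "V": 0.8}, {"A": 0.3, "V": 0.7}],
}

posterior = predict_block("AA", toy_priors, toy_likelihoods)
best = max(posterior, key=posterior.get)
print(best, round(posterior[best], 3))
```

A real predictor would estimate the propensity tables from a structure database and use longer sequence windows; the toy tables here only show how prior and likelihood combine.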
Three-dimensional protein structures can be described with a library of 3D fragments that define a structural alphabet. We have previously proposed such an alphabet, composed of 16 patterns of five consecutive amino acids, called Protein Blocks (PBs). These PBs have been used to describe protein backbones and to predict local structures from protein …
A hidden Markov model (HMM) was used to identify recurrent short 3D structural building blocks (SBBs) describing protein backbones, independently of any a priori knowledge. Polypeptide chains are decomposed into a series of short segments defined by their inter-alpha-carbon distances. Basically, the model takes into account the sequentiality of the …
BACKGROUND: Microarray technologies produce large amounts of data. Hierarchical clustering is commonly used to identify clusters of co-expressed genes. However, microarray datasets often contain missing values (MVs), which are a major drawback for the use of clustering methods. Usually the MVs are left untreated, replaced by zero, or estimated by …
MOTIVATION: The aim of this study is to propose a new method to identify the small compact units that compose protein three-dimensional structures. These fragments, called 'protein units (PU)', offer a new level of description for better understanding and analyzing the organization of protein structures. The method relies only on the contact probability matrix, i.e. …
Protein Blocks (PBs) comprise a structural alphabet of 16 protein fragments, each 5 Cα long. They make it possible to approximate and correctly predict local protein three-dimensional (3D) structures. We have selected the 72 most frequent sequences of five PBs, which we call Structural Words (SWs). Analysis of four different protein data banks shows …
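Selecting the most frequent sequences of five PBs amounts to counting overlapping 5-letter substrings in structures that have already been encoded as PB strings. The sketch below shows that counting step only. It assumes PBs are written as one letter each, and the function name `most_frequent_words` and the two toy PB strings are hypothetical, not data from the study.

```python
from collections import Counter

def most_frequent_words(pb_sequences, word_len=5, top_n=72):
    """Count overlapping words of `word_len` PBs and return the `top_n` most frequent."""
    counts = Counter()
    for seq in pb_sequences:
        # Slide a window of word_len letters along each PB-encoded chain.
        for i in range(len(seq) - word_len + 1):
            counts[seq[i:i + word_len]] += 1
    return counts.most_common(top_n)

# Toy PB-encoded chains (invented for illustration).
toy_chains = ["mmmmmnopac", "ddfklmmmmm"]
print(most_frequent_words(toy_chains, top_n=1))  # [('mmmmm', 2)]
```

On a real data bank, `top_n=72` would return the candidate Structural Words; any further selection criteria the authors applied are not stated in the excerpt above.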
PredAcc is a tool for predicting the solvent accessibility of protein residues from the sequence at different relative accessibility levels (0-55%). The prediction rate varies between 70.7% (for 25% relative accessibility) and 85.7% (for 0% relative accessibility). Amino acids are predicted in four categories: almost certainly hidden and almost …
A statistical analysis of the PDB structures has led us to define a new set of small 3D structural prototypes called Protein Blocks (PBs). This structural alphabet includes 16 PBs, each defined by the (phi, psi) dihedral angles of 5 consecutive residues. The amino acid distributions observed in sequence windows encompassing these PBs are used to …
We developed a novel approach for predicting local protein structure from sequence. It relies on the Hybrid Protein Model (HPM), an unsupervised clustering method we previously developed. This model learns three-dimensional protein fragments encoded into a structural alphabet of 16 protein blocks (PBs). Here, we focused on 11-residue fragments encoded as a …