TreeFam is a database of phylogenetic trees of gene families found in animals. It aims to develop a curated resource that presents the accurate evolutionary history of all animal gene families, as well as reliable ortholog and paralog assignments. Curated families are being added progressively, based on seed alignments and trees in a similar fashion to…
SUMMARY: DRTF contains 2025 putative transcription factors (TFs) in Oryza sativa L. ssp. indica and 2384 in ssp. japonica, distributed in 63 families, identified by computational prediction and manual curation. It includes detailed annotations of each TF including sequence features, functional domains, Gene Ontology assignment, chromosomal localization, EST…
Fast, efficient, and reliable algorithms for pairwise alignment of protein structures are in ever-increasing demand for analyzing the rapidly growing data on protein structures. CLePAPS is a tool developed for this purpose. It distinguishes itself from other existing algorithms by the use of conformational letters, which are discretized states of 3D…
Window profiles of amino acids in protein sequences are used to describe the amino acid environment. The relative entropy or Kullback-Leibler distance derived from these profiles is used as a measure of dissimilarity for comparison of amino acids and secondary structure conformations. Distance matrices of amino acid pairs at different conformations are…
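As a minimal sketch of the dissimilarity measure named above, the following computes the relative entropy (Kullback-Leibler distance) between two discrete probability profiles. The function names, the use of base-2 logarithms, and the symmetrized variant are assumptions made for illustration; the abstract does not say how the profiles are built or whether the distance is symmetrized.

```python
import math


def relative_entropy(p, q):
    """Return D(p || q) = sum_i p_i * log2(p_i / q_i), in bits.

    p and q are sequences of probabilities over the same alphabet.
    Terms with p_i == 0 contribute nothing; if q_i == 0 while p_i > 0
    the divergence is infinite.
    """
    if len(p) != len(q):
        raise ValueError("profiles must have the same length")
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0:
            continue
        if qi == 0:
            return math.inf
        total += pi * math.log2(pi / qi)
    return total


def symmetric_distance(p, q):
    """Symmetrized form D(p || q) + D(q || p) (an illustrative choice)."""
    return relative_entropy(p, q) + relative_entropy(q, p)
```

Relative entropy is not symmetric in its arguments, which is why a symmetrized form is often preferred when filling a distance matrix; whether the original work does so is not stated in the text above.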
The primitive data for deducing the Miyazawa-Jernigan contact energy or blocks substitution matrix (BLOSUM) consists of pair frequency counts. Each amino acid corresponds to a conditional probability distribution. Based on the deviation of such a conditional probability from random background, a scheme for the reduction of the amino acid alphabet is…
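The step from pair frequency counts to one conditional distribution per amino acid can be sketched as a row normalization of a symmetric count matrix. The helper names and the toy two-letter counts below are hypothetical, and the measure of deviation from background used for the actual alphabet reduction is not given in the text above, so it is not reproduced here.

```python
def conditional_distributions(pair_counts):
    """Row-normalize a pair-count matrix: P(j | i) = n_ij / sum_k n_ik."""
    result = []
    for row in pair_counts:
        total = sum(row)
        if total == 0:
            raise ValueError("a letter with no observed pairs has no distribution")
        result.append([n / total for n in row])
    return result


def background_distribution(pair_counts):
    """Marginal frequency of each letter: its row total over the grand total."""
    row_totals = [sum(row) for row in pair_counts]
    grand_total = sum(row_totals)
    return [t / grand_total for t in row_totals]


# Toy symmetric counts for a hypothetical two-letter alphabet.
toy_counts = [[8, 2],
              [2, 4]]
toy_conditionals = conditional_distributions(toy_counts)
toy_background = background_distribution(toy_counts)
```

Each row of `toy_conditionals` is the distribution attached to one letter, and comparing it with `toy_background` is where a deviation measure of one's choosing would enter.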
CLEMAPS is a tool for multiple alignment of protein structures. It distinguishes itself from other existing algorithms for multiple structure alignment by the use of conformational letters, which are discretized 3D segmental structural states. A letter corresponds to a cluster of combinations of three angles formed by Cα pseudobonds of four…
BACKGROUND: Native structures of proteins are formed essentially due to the combined effects of local and distant (in the sense of sequence) interactions among residues. This interaction information is, explicitly or implicitly, encoded in the scoring function in protein structure prediction approaches: threading approaches usually measure an alignment…
By means of the technique of the imbedded Markov chain, an efficient algorithm is proposed to calculate exactly the first and second moments of word counts and the probability for a word to occur at least once in random texts generated by a Markov chain. A generating function is introduced directly from the imbedded Markov chain to derive asymptotic approximations…
By using a mixture model for the density distribution of the three pseudobond angles formed by Cα atoms of four consecutive residues, the local structural states are discretized as 17 conformational letters of a protein structural alphabet. This coarse-graining procedure converts a 3D structure to a 1D code sequence. A substitution matrix between these…
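To make the geometric input concrete, here is a minimal sketch that computes three pseudobond angles from the Cα coordinates of four consecutive residues. It assumes the three angles are the two bend angles at the middle atoms plus the torsion about the central pseudobond; that reading, the function names, and the sign convention of the torsion are assumptions, and the mixture-model fitting and letter assignment are not shown.

```python
import math


def _sub(u, v):
    return (u[0] - v[0], u[1] - v[1], u[2] - v[2])


def _dot(u, v):
    return u[0] * v[0] + u[1] * v[1] + u[2] * v[2]


def _cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def _norm(u):
    return math.sqrt(_dot(u, u))


def bend_angle(p, q, r):
    """Angle at q between the pseudobonds q->p and q->r, in radians."""
    u, v = _sub(p, q), _sub(r, q)
    cos_theta = _dot(u, v) / (_norm(u) * _norm(v))
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.acos(max(-1.0, min(1.0, cos_theta)))


def torsion_angle(a, b, c, d):
    """Dihedral angle about the b-c pseudobond, in radians in (-pi, pi]."""
    b1, b2, b3 = _sub(b, a), _sub(c, b), _sub(d, c)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    y = _norm(b2) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.atan2(y, x)


def pseudobond_angles(a, b, c, d):
    """Return (bend at b, torsion about b-c, bend at c) for four C-alpha atoms."""
    return bend_angle(a, b, c), torsion_angle(a, b, c, d), bend_angle(b, c, d)
```

Sliding this four-residue window along a chain of n residues yields n - 3 angle triples, which is the kind of local description a structural alphabet would then discretize.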