Currently the protein mutant database (PMD) contains over 81 000 mutants, including artificial as well as natural mutants of various proteins extracted from about 10 000 articles. We recently developed a powerful viewing and retrieving system (http://pmd.ddbj.nig.ac.jp), which is integrated with the sequence and tertiary structure databases. The system has…
The human genome sequence defines our inherent biological potential; the realization of the biology encoded therein requires knowledge of the function of each gene. Currently, our knowledge in this area is still limited. Several lines of investigation have been used to elucidate the structure and function of the genes in the human genome. Even so, gene…
A systematic survey of intrinsically disordered (ID) regions was carried out in 2109 human plasma membrane proteins with full assignment of the transmembrane topology with respect to the lipid bilayer. ID regions with 30 consecutive residues or more were detected in 41.0% of the human proteins, a much higher percentage than the corresponding figure (4.7%)…
The pattern of amino acid substitutions and sequence conservation over many structure-based alignments of protein sequences was analyzed as a function of percentage sequence identity. The statistics of the amino acid substitutions were converted into the form of log-odds amino acid substitution matrices to which eigenvalue decomposition was applied. It was…
The contact number of an amino acid residue in a protein structure is defined by the number of C(beta) atoms around the C(beta) atom of the given residue, a quantity similar to, but different from, solvent accessible surface area. We present a method to predict the contact numbers of a protein from its amino acid sequence. The method is based on a simple…
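The definition of the contact number given above can be illustrated with a minimal sketch that counts neighbouring C(beta) atoms from known coordinates. This is not the sequence-based prediction method of the abstract; the function name `contact_numbers` and the 12 Å cutoff radius are assumptions made for illustration, since the snippet does not state how "around" is defined.

```python
import math

def contact_numbers(cb_coords, cutoff=12.0):
    """For each residue, count the other C-beta atoms within `cutoff` angstroms.

    cb_coords: list of (x, y, z) tuples, one C-beta position per residue.
    cutoff:    assumed neighbourhood radius (hypothetical value, not from the source).
    """
    counts = []
    for i, a in enumerate(cb_coords):
        n = 0
        for j, b in enumerate(cb_coords):
            # Skip the residue itself; count every other C-beta inside the sphere.
            if i != j and math.dist(a, b) <= cutoff:
                n += 1
        counts.append(n)
    return counts

# Toy coordinates: the first two atoms are 5 angstroms apart, the third is far away.
example = contact_numbers([(0, 0, 0), (5, 0, 0), (20, 0, 0)], cutoff=12.0)
```

The double loop is quadratic in the number of residues, which is adequate for a single protein chain; a spatial index would only matter for much larger inputs.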
Human transcriptional regulation factors, such as activators, repressors, and enhancer-binding factors are quite different from their prokaryotic counterparts in two respects: the average sequence in human is more than twice as long as that in prokaryotes, while the fraction of sequence aligned to domains of known structure is 31% in human transcription…
It is known that in thermophiles the G+C content of ribosomal RNA linearly correlates with growth temperature, while that of genomic DNA does not. Although the G+C contents (singlet) of the genomic DNAs of thermophiles and mesophiles do not differ significantly, the dinucleotide (doublet) compositions of the two bacterial groups clearly do. The average…
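The distinction between singlet (G+C) and doublet (dinucleotide) composition can be sketched as follows. The function names `gc_content` and `dinucleotide_freqs` are hypothetical, and overlapping dinucleotides on a single strand are counted here as an assumption; the snippet does not say how the authors counted them.

```python
from collections import Counter

def gc_content(seq):
    """Fraction of G and C bases in a nucleotide sequence (singlet composition)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return (seq.count("G") + seq.count("C")) / len(seq)

def dinucleotide_freqs(seq):
    """Relative frequency of each overlapping dinucleotide (doublet composition)."""
    seq = seq.upper()
    if len(seq) < 2:
        raise ValueError("sequence needs at least two bases")
    pairs = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    total = sum(pairs.values())
    return {pair: count / total for pair, count in pairs.items()}

# Two toy sequences with the same G+C content but different doublet composition.
a, b = "GGCCAATT", "GACTGACT"
same_gc = gc_content(a) == gc_content(b)
different_doublets = dinucleotide_freqs(a) != dinucleotide_freqs(b)
```

The toy pair shows why the doublet statistic can separate groups that the singlet statistic cannot: both sequences are half G+C, yet they share few dinucleotides.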
The amino acid compositions of proteins from halophilic archaea were compared with those from non-halophilic mesophiles and thermophiles, in terms of the protein surface and interior, on a genome-wide scale. As we previously reported for proteins from thermophiles, a biased amino acid composition also exists in halophiles, in which an abundance of acidic…
Complementary DNAs of two kinesin-related genes, katB and katC, were isolated from Arabidopsis thaliana and sequenced. The carboxyl-terminal regions of the polypeptides encoded by these genes, especially the presumptive ATP-binding and microtubule-binding domains, share significant sequence homology with the mechanochemical motor domain of the kinesin heavy…
Pseudogenes are open reading frames (ORFs) encoding dysfunctional proteins with high homology to known protein-coding genes. Although pseudogenes were reported to exist in the genomes of many eukaryotes and bacteria, no systematic search for pseudogenes in the Escherichia coli genome has been carried out. Genome comparisons of E. coli strains K-12 and O157…