DNA Data Bank of Japan in the age of information biology

Yoshio Tateno and Takashi Gojobori
Nucleic Acids Research, volume 25, issue 1

DNA Data Bank of Japan (DDBJ) began its activities in 1986 in collaboration with EMBL in Europe and GenBank in the United States. DDBJ developed a data submission tool called Sakura, by which researchers can submit their newly sequenced data on WWW from every corner of the world. The data bank also built a database management system (Yamato II), incorporating the techniques and functions of the object-oriented database, in order to efficiently process the data it has collected. A number of… 

DNA Data Bank of Japan (DDBJ) in XML
The complete genome sequence database, Genome Information Broker (GIB), has been improved by incorporating XML and it is now possible to perform a more sophisticated database search against the new GIB than the ordinary BLAST or FASTA search.
Biological databases at DNA Data Bank of Japan in the era of next-generation sequencing technologies.
The public biological databases at CIB-DDBJ, EBI, and NCBI will together construct world-wide archives for biological data by data sharing to accelerate research in life sciences in the era of next generation sequencing technologies.
Biological Data Centres
Three main DNA sequence databases exist as an international consortium, the DDBJ (DNA Data Bank of Japan), the EBI (European Bioinformatics Institute) and the NCBI (National Centre for Biotechnology Information).
The GenBank sequence database.
GenBank, the National Institutes of Health's genetic sequence database, is an annotated collection of all publicly available nucleotide and protein sequences, which are grouped into divisions; some of these divisions are phylogenetically based, whereas others are based on the technical approach that was used to generate the sequence information.
[Development of information biology].
  • Y. Tateno
  • Chemistry
    Tanpakushitsu kakusan koso. Protein, nucleic acid, enzyme
  • 1995
The history of the discoveries of the gene and DNA is viewed with respect to people, time, and places, along with the elucidation of their coordinated biological functions in the 1950s and 1960s.
Formal design and implementation of an improved DDBJ DNA database with a new schema and object-oriented library
A new annotator's workbench named Yamato II and a World Wide Web submission system named Sakura have been successfully developed to drastically improve daily transactions in the DDBJ.
Assembly and Analysis of Extended Human Genomic Contig Regions
This work attempts to collate a definitive set of nonredundant extended segments of human genomic sequence by taking individual human entries in GenBank greater than 25 kilobases (kb) and extending them on either end.
Big Data in Bioinformatics
The advances in modern bioinformatics related to the emergence of high-performance sequencing platforms are discussed, which not only contributed to the expansion of capabilities of biology and related sciences, but also gave rise to the phenomenon of Big Data in biology.
Databases and Data Mining
An historical perspective on the origin of these resources is described, as well as how they are expected to change and grow in the future.
Issues in developing integrated genomic databases and application to the human X chromosome
IXDB, the Integrated X chromosome database, is developed, which fulfils a number of requirements and aims at providing a global view on genomic data at a chromosomal level and represents a conceptual framework based on identifying, storing and analysing relationships between biological objects.


Codon usage tabulated from the international DNA sequence databases
The list of codon usage of genes in organisms was made searchable by name of organism through a web site and the compilation has been synchronized with a major release of GenBank.
Rapid and sensitive protein similarity searches.
An algorithm was developed which facilitates the search for similarities between newly determined amino acid sequences and sequences already available in databases and increases sensitivity by giving high scores to those amino acid replacements which occur frequently in evolution.
Identification of protein coding regions by database similarity search
The computer program BLASTX performed conceptual translation of a nucleotide query sequence followed by a protein database search in one programmatic step and was characterized as appropriate for use in moderate and large scale sequencing projects at the earliest opportunity, when the data are most prone to containing errors.
Whole-genome random sequencing and assembly of Haemophilus influenzae Rd.
An approach for genome analysis based on sequencing and assembly of unselected pieces of DNA from the whole chromosome has been applied to obtain the complete nucleotide sequence (1,830,137 base
Improved tools for biological sequence comparison.
  • W. Pearson, D. Lipman
  • Biology, Computer Science
    Proceedings of the National Academy of Sciences of the United States of America
  • 1988
Three computer programs for comparisons of protein and DNA sequences can be used to search sequence data bases, evaluate similarity scores, and identify periodic structures based on local sequence similarity.
Complementary DNA sequencing: expressed sequence tags and human genome project
Automated partial DNA sequencing was conducted on more than 600 randomly selected human brain complementary DNA (cDNA) clones to generate expressed sequence tags (ESTs), which will facilitate the tagging of most human genes in a few years at a fraction of the cost of complete genomic sequencing.
The Minimal Gene Complement of Mycoplasma genitalium
Comparison of the Mycoplasma genitalium genome to that of Haemophilus influenzae suggests that differences in genome content are reflected as profound differences in physiology and metabolic capacity between these two organisms.
Unified approach to alignment and phylogenies.
  • J. Hein
  • Biology
    Methods in enzymology
  • 1990
Basic local alignment search tool.
CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice.
The sensitivity of the commonly used progressive multiple sequence alignment method has been greatly improved and modifications are incorporated into a new program, CLUSTAL W, which is freely available.