Complete sequencing and characterization of 21,243 full-length human cDNAs

@article{Ota2004CompleteSA,
  title={Complete sequencing and characterization of 21,243 full-length human cDNAs},
  author={Toshio Ota and Yutaka Suzuki and Tetsuo Nishikawa and Tetsuji Otsuki and Tomoyasu Sugiyama and Ryotaro Irie and Ai Wakamatsu and Koji Hayashi and Hiroyuki Sato and Keiichi Nagai and Kouichi Kimura and Hiroshi Makita and Mitsuo Sekine and Masaya Obayashi and Tatsunari Nishi and T. Shibahara and Toshihiro Tanaka and Shizuko Ishii and Jun-ichi Yamamoto and Kaoru Saito and Yuri Kawai and Yuko Isono and Yoshitaka Nakamura and Kenji Nagahari and Katsuhiko S. Murakami and Tomohiro Yasuda and Takao Iwayanagi and Masako Wagatsuma and Akiko Shiratori and Hiroaki Sudo and Takehiko Hosoiri and Yoshiko Kaku and Hiroyo Kodaira and Hiroshi Kondo and M Sugawara and Makiko Takahashi and Katsuhiro Kanda and Takahide Yokoi and Takako Furuya and Emiko Kikkawa and Yuhi Omura and Kumiko Abe and Kumiko Kamihara and Naoko Katsuta and Kazuo Sato and Machiko Tanikawa and Makoto Yamazaki and Kenji Ninomiya and Tadashi Ishibashi and Hiromichi Yamashita and Katsuji Murakawa and Kiyoshi Fujimori and Hiroyuki Tanai and Manabu Kimata and Motoji Watanabe and Susumu Hiraoka and Yoshiyuki Chiba and Shinichi Ishida and Yukio Ono and Sumiyo Takiguchi and Susumu Watanabe and Makoto Yosida and T. Hotuta and Junko Kusano and Keiichi Kanehori and Asako Takahashi-Fujii and Hiroto Hara and Tomo-o Tanase and Yoshiko Nomura and Sakae Togiya and Fukuyo Komai and Reiko Hara and Kazuha Takeuchi and M. Arita and Nobuyuki Imose and Kaoru Musashino and Hisatsugu Yuuki and Atsushi Oshima and Naokazu Sasaki and Satoshi Aotsuka and Yoko Yoshikawa and Hiroshi Matsunawa and Tatsuo Ichihara and Namiko Shiohata and Sanae Sano and Shogo Moriya and H Momiyama and Noriko Satoh and Sachiko Takami and Y. Terashima and Osamu Suzuki and Satoshi Nakagawa and Akihiro Senoh and Hiroshi Mizoguchi and Yoshihiro Goto and F. Shimizu and Hirokazu Wakebe and Haretsugu Hishigaki and Takeshi K. 
Watanabe and Akio Sugiyama and Makoto Takemoto and Bunsei Kawakami and Masaaki Yamazaki and Koji Watanabe and Ayako Kumagai and Shoko Itakura and Yasuhito Fukuzumi and Yoshifumi Fujimori and Megumi Komiyama and Hiroyuki Tashiro and Akira Tanigami and Tsutomu Fujiwara and Toshihide Ono and Koichi Yamada and Yuka Fujii and Kouichi Ozaki and Maasa Hirao and Yoshihiro Ohmori and Ayako Kawabata and Takeshi Hikiji and Naoko Kobatake and Hiromichi Inagaki and Yasuko Ikema and Sachiko Okamoto and Rie Okitani and Takuma Kawakami and Saori Noguchi and Tomoko Itoh and Keiko Shigeta and Tadashi Senba and Kyoka Matsumura and Yoshie Nakajima and T Mizuno and Misato Morinaga and Masahide Sasaki and Takushi Togashi and Masaaki Oyama and Hiroko Hata and Manabu Watanabe and Takami Komatsu and Junko Mizushima-Sugano and Tadashi Satoh and Yuko Shirai and Yukiko Y. Takahashi and Kiyomi Nakagawa and K Okumura and Takahiro Nagase and Nobuo Nomura and Hisashi Kikuchi and Yasuhiko Masuho and Riu Yamashita and Kenta Nakai and Tetsushi Yada and Yusuke Nakamura and Osamu Ohara and Takao Isogai and Sumio Sugano},
  journal={Nature Genetics},
  year={2004},
  volume={36},
  pages={40--45}
}
As a base for human transcriptome and functional genomics, we created the “full-length long Japan” (FLJ) collection of sequenced human cDNAs. We determined the entire sequence of 21,243 selected clones and found that 14,490 cDNAs (10,897 clusters) were unique to the FLJ collection. About half of them (5,416) seemed to be protein-coding. Of those, 1,999 clusters had not been predicted by computational methods. The distribution of GC content of nonpredicted cDNAs had a peak at ∼58% compared with… 

Classification and characterization of human full-length cDNA clones that are difficult to sequence

In the Full-length Human cDNA Sequencing Project, 30,160 cDNAs were sequenced. Among them, our group performed sequencing of 3,588 cDNAs, mainly using the primer walking method. The sequences achieved

Integrative Annotation of 21,037 Human Genes Validated by Full-Length cDNA Clones

TLDR
An exhaustive integrative characterization of 41,118 full-length cDNAs that capture the gene transcripts as complete functional cassettes is performed, providing an unequivocal report of structural and functional diversity at the gene level.

Preparation of a set of expression-ready clones of mammalian long cDNAs encoding large proteins by the ORF trap cloning method.

TLDR
A new ORF cloning method based on homologous recombination in Escherichia coli, which avoids laborious manipulations and the artificial introduction of mutations into ORFs, is developed and used to convert original cDNA clones to expression-ready forms for native and fusion proteins.

Identification and Functional Analyses of 11 769 Full-length Human cDNAs Focused on Alternative Splicing

TLDR
Results from the FLJ Human cDNA Database shed light on the mechanisms by which a single gene produces protein-coding transcripts suited to the situation and the environment.

Correction: Integrative Annotation of 21,037 Human Genes Validated by Full-Length cDNA Clones

TLDR
An exhaustive integrative characterization of 41,118 full-length cDNAs that capture the gene transcripts as complete functional cassettes is performed, providing an unequivocal report of structural and functional diversity at the gene level.

The status, quality, and expansion of the NIH full-length cDNA project: the Mammalian Gene Collection (MGC).

TLDR
Comparison of the sequence of the MGC clones to reference genome sequences reveals that most cDNA clones are of very high sequence quality, although it is likely that some cDNAs may carry missense variants as a consequence of experimental artifact, such as PCR, cloning, or reverse transcriptase errors.

Diversification of transcriptional modulation: large-scale identification and characterization of putative alternative promoters of human genes.

TLDR
The findings suggest that use of alternate promoters and consequent alternative use of first exons should play a pivotal role in generating the complexity required for the highly elaborated molecular systems in humans.

Fine Expression Profiling of Full-length Transcripts using a Size-unbiased cDNA Library Prepared with the Vector-capping Method

TLDR
The results suggest that the size-unbiased full-length cDNA library constructed using the vector-capping method will be an ideal resource for fine expression profiling of transcriptional variants with alternative TSSs and alternative splicing.

Database for chicken full-length cDNAs.

TLDR
The development of a chicken full-length cDNA database is introduced, which will facilitate future research work in this biological system and will be useful for animal science and veterinary researchers wishing to clone and confirm full-length chicken cDNAs of interest.
...

References


Toward a catalog of human genes and proteins: sequencing and analysis of 500 novel complete protein coding human cDNAs.

TLDR
The sequencing and analysis of 500 novel human cDNAs containing the complete protein coding frame are reported, concluding that full-length cDNA sequencing continues to be crucial also for the accurate identification of genes.

Analysis of the mouse transcriptome based on functional annotation of 60,770 full-length cDNAs

TLDR
The present work, completely supported by physical clones, provides the most comprehensive survey of a mammalian transcriptome so far, and is a valuable resource for functional genomics.

Prediction of the coding sequences of unidentified human genes. I. The coding sequences of 40 new genes (KIAA0001-KIAA0040) deduced by analysis of randomly sampled cDNA clones from human immature myeloid cell line KG-1 (supplement).

  • N. Nomura, N. Miyajima, S. Tabata
  • Biology
    DNA research: an international journal for rapid publication of reports on genes and genomes
  • 1994
We established a protocol for the prediction of the coding sequences of unidentified human genes based on the double selection and sequence analysis of cDNA clones with inserts carrying unreported

Prediction of the coding sequences of unidentified human genes. I. The coding sequences of 40 new genes (KIAA0001-KIAA0040) deduced by analysis of randomly sampled cDNA clones from human immature myeloid cell line KG-1.

  • N. Nomura, N. Miyajima, S. Tabata
  • Biology
    DNA research: an international journal for rapid publication of reports on genes and genomes
  • 1994
We established a protocol for the prediction of the coding sequences of unidentified human genes based on the double selection and sequence analysis of cDNA clones with inserts carrying unreported

Characterization of long cDNA clones from human adult spleen.

TLDR
The characterization of cDNA clones from human adult spleen indicated that spleen could be used as an additional source of human long cDNAs to complement the list of human genes.

The DNA sequence and comparative analysis of human chromosome 20

TLDR
Comparative analysis of the sequence of chromosome 20 to whole-genome shotgun-sequence data of two other vertebrates provides an independent measure of the efficiency of gene annotation, and indicates that this analysis may account for more than 95% of all coding exons and almost all genes.

HUNT: launch of a full-length cDNA database from the Helix Research Institute

TLDR
The Helix Research Institute (HRI) in Japan is releasing 4356 HUman Novel Transcripts and related information in the newly established HUNT database, which represents an essential bioinformatics contribution towards understanding gene function.

The DNA sequence of human chromosome 22

TLDR
The sequence of the euchromatic part of human chromosome 22 is reported, which consists of 12 contiguous segments spanning 33.4 megabases, contains at least 545 genes and 134 pseudogenes, and provides the first view of the complex chromosomal landscapes that will be found in the rest of the genome.

Evaluation of gene structure prediction programs.

TLDR
The results indicated that the predictive accuracy of the programs analyzed was lower than originally found, which indicates that the programs are overly dependent on the particularities of the examples they learn from.