Analysis of expressed sequence tags indicates 35,000 human genes

@article{Ewing2000AnalysisOE,
  title={Analysis of expressed sequence tags indicates 35,000 human genes},
  author={Brent Ewing and Phil Green},
  journal={Nature Genetics},
  year={2000},
  volume={25},
  pages={232--234}
}
The number of protein-coding genes in an organism provides a useful first measure of its molecular complexity. Single-celled prokaryotes and eukaryotes typically have a few thousand genes; for example, Escherichia coli has 4,300 and Saccharomyces cerevisiae has 6,000. Evolution of multicellularity appears to have been accompanied by a several-fold increase in gene number, the invertebrates Caenorhabditis elegans and Drosophila melanogaster having 19,000 and 13,600 genes, respectively. Here we… 
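The "several-fold increase" in the abstract can be checked against the four gene counts it quotes. The following is a minimal illustrative sketch that uses only those figures. The helper name `fold_range` is hypothetical and is not taken from the paper.

```python
# Protein-coding gene counts as quoted in the abstract.
gene_counts = {
    "Escherichia coli": 4_300,
    "Saccharomyces cerevisiae": 6_000,
    "Caenorhabditis elegans": 19_000,
    "Drosophila melanogaster": 13_600,
}

single_celled = ["Escherichia coli", "Saccharomyces cerevisiae"]
multicellular = ["Caenorhabditis elegans", "Drosophila melanogaster"]


def fold_range(numerators, denominators):
    """Hypothetical helper: min and max ratio over every pair of organisms."""
    ratios = [gene_counts[n] / gene_counts[d]
              for n in numerators for d in denominators]
    return min(ratios), max(ratios)


low, high = fold_range(multicellular, single_celled)
print(f"{low:.1f}x to {high:.1f}x")  # prints "2.3x to 4.4x"
```

The smallest ratio is D. melanogaster over S. cerevisiae (13,600 / 6,000). The largest is C. elegans over E. coli (19,000 / 4,300). Both fall within the roughly two- to four-fold range the abstract describes.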

Citations

Open-reading-frame sequence tags (OSTs) support the existence of at least 17,300 genes in C. elegans
TLDR
Experimental evidence is provided that supports the existence of at least 17,300 genes in C. elegans, suggesting that gene counts based primarily on ESTs may underestimate the number of genes in human and in other organisms.
Assessment of the total number of human transcription units.
TLDR
The analysis indicates that at least 5,000-9,000 additional human genes lacking similarity to known genes or proteins exist in the human genome, raising baseline gene estimates to approximately 41,000-45,000.
Genome-wide detection of alternative splicing in expressed sequences of human genes
TLDR
The data indicate that a large proportion of human genes, probably 42% or more, are alternatively spliced, but that alternative splicing appears to occur mainly in certain types of molecules (e.g., cell-surface receptors) and systemic functions, particularly the immune and nervous systems.
Estimating the Number of Mouse Genes and the Duplicated Regions within the Mouse Genome
TLDR
This study estimated the number of mouse genes using expressed sequence tags from a full-length cDNA library and a set of genes obtained by clustering mRNA sequences from DDBJ/EMBL/GenBank. It also identified duplicated chromosomal regions within the mouse genome using map information from the Mouse Genome Database and numerous homologous gene pairs.
Identification and analysis of over 2000 ribosomal protein pseudogenes in the human genome.
TLDR
The large-scale distribution of RP pseudogenes throughout the genome appears to result chiefly from random insertions. Consequently, the number on each chromosome is proportional to its size, with the highest density in GC-intermediate regions of the genome.
Protein-Coding and Noncoding RNA Genes
TLDR
This chapter describes the current view of human protein-coding and noncoding RNA genes, and illustrates how alternative splicing and other mechanisms diversify the human proteome.
Toward a Catalog of Human Genes and Proteins: Sequencing and Analysis of 500 Novel Complete Protein Coding Human cDNAs
TLDR
The sequencing and analysis of 500 novel human cDNAs containing the complete protein-coding frame are reported. A number of these genes had either been missed entirely in the analysis of the genomic sequences or had been wrongly predicted.
Toward a catalog of human genes and proteins: sequencing and analysis of 500 novel complete protein coding human cDNAs.
TLDR
The sequencing and analysis of 500 novel human cDNAs containing the complete protein-coding frame are reported, with the conclusion that full-length cDNA sequencing also remains crucial for the accurate identification of genes.
Most of the human genome is transcribed.
TLDR
It is argued that the newer, smaller gene counts are closer to being correct than the older, larger ones, because the mean gene size is so large that there is little room for intergenic DNA, and most of the human genome is transcribed.
The transcriptional activity of human Chromosome 22.
TLDR
A DNA microarray representing nearly all of the unique sequences of human Chromosome 22 was constructed and used to measure global transcriptional activity in placental poly(A)+ RNA. The measurements revealed twice as many transcribed bases as had been reported previously.

References

Showing 1-10 of 23 references
The genome sequence of Drosophila melanogaster.
TLDR
The nucleotide sequence of nearly all of the approximately 120-megabase euchromatic portion of the Drosophila genome is determined using a whole-genome shotgun sequencing strategy supported by extensive clone-based sequence and a high-quality bacterial artificial chromosome physical map.
Life with 6000 Genes
TLDR
The genome of the yeast Saccharomyces cerevisiae has been completely sequenced through a worldwide collaboration and provides information about the higher order organization of yeast's 16 chromosomes and allows some insight into their evolutionary history.
A survey of expressed genes in Caenorhabditis elegans
TLDR
The result is the identification of about 1,200 of the estimated 15,000 genes of C. elegans, providing a more accurate estimate of the total number of genes in the organism than has hitherto been available.
The complete genome sequence of Escherichia coli K-12.
TLDR
The 4,639,221-base pair sequence of Escherichia coli K-12 is presented and reveals ubiquitous as well as narrowly distributed gene families; many families of similar genes within E. coli are also evident.
Ancient conserved regions in new gene sequences and the protein databases.
TLDR
Nearly all of the ACRs identified were found to be homologous to sequences in the protein databases, suggesting that currently known proteins may already include representatives of most ACRs and that new sequences not similar to any database sequence are unlikely to contain ACRs.
A comparison of expressed sequence tags (ESTs) to human genomic sequences.
TLDR
An analysis of more than 1,000 ESTs generated by the WashU-Merck EST project finds that, in one gene, 73% of the ESTs derived from spliced or partially spliced transcripts either contain intron sequences or are spliced at previously unreported sites. This suggests that ESTs could provide researchers with novel information about alternative splicing in certain genes.
Number of CpG islands and genes in human and mouse.
F. Antequera, A. Bird. Proceedings of the National Academy of Sciences of the United States of America, 1993.
TLDR
Analysis of a selection of genes suggests that both human and mouse are losing CpG islands over evolutionary time through de novo methylation in the germ line followed by CpG loss through mutation, a process that appears to be more rapid in rodents.
Generation and analysis of 280,000 human expressed sequence tags.
TLDR
Comparisons of a subset of the data with nonredundant human mRNA and protein databases show that the ESTs represent many known sequences and contain many novel ones. This supports the contention that, although normalization significantly reduces the relative abundance of redundant cDNA clones, it does not completely remove members of gene families.
Genome sequence of the nematode C. elegans: a platform for investigating biology.
TLDR
The 97-megabase genomic sequence of the nematode Caenorhabditis elegans reveals over 19,000 genes and the distinctive distribution of some repeats and highly conserved genes provides evidence for a regional organization of the chromosomes.
The DNA sequence of human chromosome 22
TLDR
The sequence of the euchromatic part of human chromosome 22 is reported, which consists of 12 contiguous segments spanning 33.4 megabases, contains at least 545 genes and 134 pseudogenes, and provides the first view of the complex chromosomal landscapes that will be found in the rest of the genome.