The shrinking human protein coding complement: are there now fewer than 20,000 genes?

@article{Ezkurdia2013TheSH,
  title={The shrinking human protein coding complement: are there now fewer than 20,000 genes?},
  author={Iakes Ezkurdia and David de Juan and Jose Manuel Rodriguez and Adam Frankish and Mark E. Diekhans and Jennifer L. Harrow and Jes{\'u}s V{\'a}zquez and Alfonso Valencia and Michael L. Tress},
  journal={bioRxiv},
  year={2013}
}
Determining the full complement of protein-coding genes is a key goal of genome annotation. The most powerful approach for confirming protein-coding potential is the detection of cellular protein expression through peptide mass spectrometry experiments. Here we map the peptides detected in 7 large-scale proteomics studies to almost 60% of the protein-coding genes in the GENCODE annotation of the human genome. We find that conservation across vertebrate species and the age of the gene family are…
The contribution of alternative splicing probability to the coding expansion of the genome
TLDR
A splice-site-centric quantification method is developed that characterizes transcriptome-wide alternative splicing with a simple probabilistic model, enabling species-wide comparison and suggesting that dominant isoforms are co-expressed alongside many minor isoforms.
A Human "eFP" Browser for Generating Gene Expression Anatograms
TLDR
The Human eFP (“electronic Fluorescent Pictograph”) Browser presented here is a tool for intuitive visualization of large human gene expression data sets on pictographic representations of the human body as gene expression “anatograms”.
Long Non-Coding RNAs as Master Regulators in Cardiovascular Diseases
TLDR
The prevalence of circulating lncRNAs is described and their potential utilities as biomarkers for diagnosis and prognosis of heart disease are assessed.
Plant aquaporin regulation: Structural and functional studies using diffraction and scattering techniques
Water is the basis for life as we know it. It is only logical then that all organisms have evolved specialized proteins, aquaporins, that regulate water flow across their membranes. Plants, which are…
X-ray crystallography over the past decade for novel drug discovery – where are we heading next?
TLDR
This review describes how structural knowledge gained from X-ray crystallography has been used to advance other biophysical methods for structure determination and how a combination of structural and biochemical/biophysical methods may improve the understanding of biological processes and interactions.
Analysis of Gene Expression Time Series Data of Ebola Vaccine response using the NeuCube and Temporal Feature Selection
TLDR
A promising temporal feature selection method was tested using the NeuCube for classification against a set of previously identified genes, using a dataset from Ebola vaccine trials; the discovered gene markers and their corresponding gene interaction network (GIN) are new and have not been published before.
Transcriptome analysis of the human corneal endothelium.
TLDR
At least nine genes demonstrated significant differential expression between pediatric and adult HCEnC, defining specific functional properties distinct to each age group, and can be used to focus the search for the genetic basis of the corneal endothelial dystrophies for which the genetic basis remains unknown.
Proteomics in cancer research: Are we ready for clinical practice?
TLDR
An overview of the transition of oncoproteomics towards translational oncology is provided, describing which lessons have been learned from currently approved protein biomarkers and previous proteomic studies, what the pitfalls and challenges are in clinical proteomics applications, and how proteomic research can be successfully translated into medical practice.
The Role of Long Non-Coding RNAs in Atrial Fibrillation.
TLDR
The role of lncRNAs in atrial fibrillation and its pathogenesis is discussed, and the altered expression of lncRNAs offers genetic targets for the diagnosis and treatment of AF.
Computational Modelling and Pattern Recognition in Bioinformatics
This chapter explores the ability of SNN to capture changes in bioinformatics data for predicting events or classifying biological states from DNA, gene and protein data. It starts with a…

References

SHOWING 1-10 OF 71 REFERENCES
Distinguishing protein-coding and noncoding genes in the human genome
TLDR
It is shown that the vast majority of nonconserved ORFs present by chance in RNA transcripts are random occurrences, and the results indicate that there has been relatively little true innovation in mammalian protein-coding genes.
Improving gene annotation using peptide mass spectrometry.
TLDR
By searching a corpus of 18.5 million tandem mass spectra from human proteomic samples, this work validates 39,000 exons and 11,000 introns at the level of translation and demonstrates that proteomic profiling should play a role in any genome sequencing project.
Comparative Proteomics Reveals a Significant Bias Toward Alternative Protein Isoforms with Conserved Structure and Function
TLDR
This work identified peptides that cover 35% of the genes annotated by the GENCODE consortium for the human genome as part of a comprehensive analysis of experimental spectra from two large publicly available mass spectrometry databases.
Shotgun proteomics aids discovery of novel protein-coding genes, alternative splicing, and "resurrected" pseudogenes in the mouse genome.
TLDR
A novel pipeline is presented that integrates highly sensitive and statistically robust peptide spectrum matching with genome-wide protein-coding predictions to perform large-scale gene validation and discovery in the mouse genome for the first time.
GENCODE: the reference human genome annotation for The ENCODE Project.
TLDR
This work has examined the completeness of the transcript annotation and found that 35% of transcriptional start sites are supported by CAGE clusters, 62% of protein-coding genes have annotated polyA sites, and over one-third of GENCODE protein-coding genes are supported by peptide hits derived from mass spectrometry spectra submitted to Peptide Atlas.
The state of the human proteome in 2012 as viewed through PeptideAtlas.
TLDR
It is found that this latest PeptideAtlas build includes at least one peptide for each of ~12500 Swiss-Prot entries, leaving ~7500 gene products yet to be confidently cataloged, and characterize these "PA-unseen" proteins in terms of tissue localization, transcript abundance, and Gene Ontology enrichment.
Quantifying the mechanisms of domain gain in animal proteins
TLDR
The interplay of gene duplication and domain gain demonstrates an important mechanism for fast neofunctionalization of genes.
A high-resolution map of human evolutionary constraint using 29 mammals
The comparison of related genomes has emerged as a powerful lens for genome interpretation. Here we report the sequencing and comparative analysis of 29 eutherian genomes. We confirm that at least…
Deep proteome and transcriptome mapping of a human cancer cell line
TLDR
Comparisons of the proteome and the transcriptome, and analysis of protein complex databases and GO categories, suggest that deep coverage of the functional transcriptome and the proteome of a single cell type is achieved.
Comparative Proteomic Analysis of Eleven Common Cell Lines Reveals Ubiquitous but Varying Expression of Most Proteins*
TLDR
This work analyzes 11 human cell lines using an LTQ-Orbitrap family mass spectrometer with a “high field” Orbitrap mass analyzer with improved resolution and sequencing speed to construct a broad coverage “super-SILAC” quantification standard.