Mass-spectrometry-based draft of the human proteome

Mathias Wilhelm, Judith Schlegl, Hannes Hahne, Amin Moghaddas Gholami, M. Lieberenz, Mikhail M. Savitski, Emanuel Ziegler, Lars Butzmann, Siegfried Gessulat, Harald Marx, Toby Mathieson, Simone Lemeer, Karsten Schnatbaum, Ulf Reimer, Holger Wenschuh, Martin Mollenhauer, Julia B. Slotta-Huspenina, Joos-Hendrik Boese, Marcus Bantscheff, Anja Gerstmair, Franz Faerber, Bernhard Kuster

Proteomes are characterized by large protein-abundance differences, cell-type- and time-dependent expression patterns and post-translational modifications, all of which carry biological information that is not accessible by genomics or transcriptomics. Here we present a mass-spectrometry-based draft of the human proteome and a public, high-performance, in-memory database for real-time analysis of terabytes of big data, called ProteomicsDB. The information assembled from human tissues, cell… 

Systematic detection of functional proteoform groups from bottom-up proteomic datasets

A novel, data-driven strategy is presented that assigns peptides to unique functional proteoform groups based on peptide correlation patterns across large bottom-up proteomic datasets, enabling the systematic detection and evaluation of assembly-specific proteoforms at an unprecedented scale.

Proteomic Profiling of the Human Tissue and Biological Fluid Proteome.

This work used label-free liquid chromatography coupled to tandem MS (LC-MS/MS) to profile the normal human proteome, generating tandem mass spectra corresponding to 13,028 unique human protein-coding genes, which falls short of complete proteome coverage.

ProteomeGenerator: A framework for comprehensive proteomics based on de novo transcriptome assembly and high-accuracy peptide mass spectral matching

ProteomeGenerator, a framework for de novo and reference-assisted proteogenomic database construction and analysis based on sample-specific transcriptome sequencing and high-resolution and high-accuracy mass spectrometry proteomics, is reported, demonstrating high-confidence identification of non-canonical protein isoforms arising from alternative transcriptional start sites, intron retention, and cryptic exon splicing, and improved accuracy of genome-scale proteome discovery.

ProteomeGenerator: A Framework for Comprehensive Proteomics Based on de Novo Transcriptome Assembly and High-Accuracy Peptide Mass Spectral Matching.

ProteomeGenerator, a framework for de novo and reference-assisted proteogenomic database construction and analysis based on sample-specific transcriptome sequencing and high-accuracy mass spectrometry proteomics, is reported, demonstrating high-confidence identification of non-canonical protein isoforms arising from alternative transcriptional start sites, intron retention, and cryptic exon splicing as well as improved accuracy of genome-scale proteome discovery.

Mass spectrometry-based draft of the mouse proteome.

A quantitative draft of the mouse proteome and phosphoproteome constructed from 41 healthy tissues is presented and several lines of analyses exemplify which insights can be gleaned from the data.

Multiplexed Quantitative Proteomics for High-Throughput Comprehensive Proteome Comparisons of Human Cell Lines.

Multiplexed quantitative proteomics with isobaric tandem mass tag (TMT) labeling is described for the simultaneous quantitative analysis of five cancer cell proteomes in biological duplicates in one mass spectrometry experiment.

Proteomics beyond large-scale protein expression analysis.

Proteogenomics: Integrating Next-Generation Sequencing and Mass Spectrometry to Characterize Human Proteomic Variation.

The field of human proteogenomics is reviewed, with an emphasis on its history, current implementations, the types of proteomic variations it reveals, and several important applications.

The Human Protein Atlas – an important resource for basic and clinical research

The transcriptomics data generated by the Human Protein Atlas project were compared with other available human transcriptome resources focusing on protein-coding genes, and it is suggested that next generation sequencing of the transcriptome is an attractive tool for indirect measurements of protein expression.

Mass-spectrometry-based near-complete draft of the Saccharomyces cerevisiae proteome

This work generates the largest yeast proteome dataset, including 5610 identified proteins, using a strategy based on optimized sample preparation and high-resolution mass spectrometry that achieves near-complete coverage of the yeast ORFs.



Mapping Intact Protein Isoforms in Discovery Mode Using Top Down Proteomics

Identification of 1,043 gene products from human cells that are dispersed into more than 3,000 protein species created by post-translational modification, RNA splicing and proteolysis is shown, using a new four-dimensional separation system.

Initial Quantitative Proteomic Map of 28 Mouse Tissues Using the SILAC Mouse

A computational framework is described with which to correlate proteome profiles with physiological functions of the tissue and it is shown that physiologically related tissues clustered together and that highly expressed proteins represented the characteristic tissue functions.

Comparative Proteomic Analysis of Eleven Common Cell Lines Reveals Ubiquitous but Varying Expression of Most Proteins

This work analyzes 11 human cell lines using an LTQ-Orbitrap family mass spectrometer with a “high field” Orbitrap mass analyzer with improved resolution and sequencing speed to construct a broad coverage “super-SILAC” quantification standard.

Analysis of the Human Tissue-specific Expression by Genome-wide Integration of Transcriptomics and Antibody-based Proteomics

A quantitative transcriptomics analysis (RNA-Seq) is used to classify the tissue-specific expression of genes across a representative set of all major human organs and tissues, and this analysis is combined with antibody-based profiling of the same tissues.

Computational prediction of proteotypic peptides for quantitative proteomics

Characteristic physicochemical properties derived from >600,000 peptide identifications generated by four proteomic platforms were used to develop a computational tool that can predict proteotypic peptides for any protein from any organism, for a given platform, with >85% cumulative accuracy.

Identification of missing proteins in the neXtProt database and unregistered phosphopeptides in the PhosphoSitePlus database as part of the Chromosome-centric Human Proteome Project.

The in-depth phosphoproteomic study represents a significant contribution to C-HPP and identifies 3,033 "missing proteins", i.e., proteins that currently lack evidence by mass spectrometry, in the neXtProt database and 12,852 unknown phosphorylation sites not registered in the PhosphoSitePlus database.

PaxDb, a Database of Protein Abundance Averages Across All Three Domains of Life

This work introduces a meta-resource dedicated to integrating information on absolute protein abundance levels, and places particular emphasis on deep coverage, consistent post-processing and comparability across different organisms.

Protein Identification False Discovery Rates for Very Large Proteomics Data Sets Generated by Tandem Mass Spectrometry

The data show that the size of the data set has an important and previously underestimated impact on the reliability of protein identifications, and that protein false discovery rates are significantly elevated compared with those of peptide-spectrum matches.
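
The gap between peptide-spectrum match (PSM) and protein false discovery rates can be illustrated with a toy calculation. The sketch below is not the paper's method: it assumes a simple target-decoy estimate (decoy hits divided by target hits), and all counts and identifiers (`P0`, `DECOY_0`, the helper `fdr`) are hypothetical. It shows that when correct PSMs pile up on a limited set of proteins while random matches each land on a different protein, the protein-level rate ends up well above the PSM-level rate.

```python
def fdr(n_decoy: int, n_target: int) -> float:
    """Simple target-decoy FDR estimate: decoy hits / target hits."""
    return n_decoy / n_target if n_target else 0.0


# Hypothetical PSM list as (protein_id, is_decoy) pairs.
# 990 target PSMs map onto only 50 distinct proteins.
psms = [(f"P{i % 50}", False) for i in range(990)]
# 10 decoy PSMs, each hitting a different decoy protein.
psms += [(f"DECOY_{i}", True) for i in range(10)]

# PSM-level estimate: count individual spectrum matches.
psm_fdr = fdr(sum(d for _, d in psms), sum(not d for _, d in psms))

# Protein-level estimate: collapse PSMs to distinct proteins first.
proteins = set(psms)
protein_fdr = fdr(sum(d for _, d in proteins), sum(not d for _, d in proteins))

print(f"PSM FDR:     {psm_fdr:.3f}")
print(f"Protein FDR: {protein_fdr:.3f}")
```

With these assumed counts the same ten random matches account for about 1% of PSMs but 20% of proteins. Adding more spectra mostly adds PSMs to proteins already identified, while each new random match tends to add a new false protein, which is one way to read the data-set-size effect described above.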