Proteome Coverage Prediction for Integrated Proteomics Datasets

@article{Claassen2010ProteomeCP,
  title={Proteome Coverage Prediction for Integrated Proteomics Datasets},
  author={Manfred Claassen and Ruedi Aebersold and Joachim M. Buhmann},
  journal={Journal of Computational Biology},
  year={2010},
  volume={18},
  number={3},
  pages={283--293}
}
Comprehensive characterization of a proteome defines a fundamental goal in proteomics. In order to maximize proteome coverage for a complex protein mixture, i.e., to identify as many proteins as possible, various different fractionation experiments are typically performed and the individual fractions are subjected to mass spectrometric analysis. The resulting data are integrated into large and heterogeneous datasets. Proteome coverage prediction refers to the task of extrapolating the number of… 

Design and Validation of Proteome Measurements

  • M. Claassen
  • Biology
    Ausgezeichnete Informatikdissertationen
  • 2010
TLDR
This thesis introduces statistical concepts for optimally designing and validating shotgun proteomics experiments, thereby making it possible to achieve reliable and extensive proteome coverage efficiently.

Absolute quantification of microbial proteomes at different states by directed mass spectrometry

TLDR
This is the first study that describes the absolute quantitative behavior of any proteome over multiple states, and represents the most comprehensive proteome abundance pattern comparison for any organism to date.

Inference and Validation of Protein Identifications

TLDR
This review aims to survey the different conceptual approaches to the different tasks of inferring and statistically validating protein identifications and to discuss their implications on the scope of proteome exploration.

Generating and navigating proteome maps using mass spectrometry

TLDR
This paper presents a new generation of proteome maps and analytical strategies that use these maps as prior information and shows the potential to greatly enhance the impact of proteomics on biological and clinical research.

Generic Comparison of Protein Inference Engines

TLDR
This study describes an intuitive, generic and yet formal performance measure and demonstrates how it enables experimentalists to select an optimal protein inference strategy for a given collection of fragment ion spectra.

Machine learning applications in proteomics research: How the past can boost the future

TLDR
An overview is presented of the different applications of machine learning in proteomics, which together cover nearly the entire wet‐ and dry‐lab workflow and address key bottlenecks in experiment planning and design, as well as in data processing and analysis.

The quantitative proteome of a human cell line

TLDR
This work provides a quantitative description of the proteome of a commonly used human cell line in two functional states, interphase and mitosis, and shows that these human cultured cells express at least ∼10 000 proteins and that the quantified proteins span a concentration range of seven orders of magnitude up to 20 000 000 copies per cell.

References

Proteome coverage prediction with infinite Markov models

TLDR
An extended infinite Markov model DiriSim is proposed to extrapolate the progression of proteome coverage based on a small number of already performed LC-MS/MS experiments, which explicitly accounts for the uncertainty of peptide identifications.

Improving the success rate of proteome analysis by modeling protein-abundance distributions and experimental designs

TLDR
This approach demonstrates that simple changes in typical experimental designs can enhance the success rate of proteome analysis by five- to tenfold.

Protein Identification False Discovery Rates for Very Large Proteomics Data Sets Generated by Tandem Mass Spectrometry

TLDR
The data show that the size of the data set has an important and previously underestimated impact on the reliability of protein identifications, and found that protein false discovery rates are significantly elevated compared with those of peptide-spectrum matches.

Absolute quantification of microbial proteomes at different states by directed mass spectrometry

TLDR
This is the first study that describes the absolute quantitative behavior of any proteome over multiple states, and represents the most comprehensive proteome abundance pattern comparison for any organism to date.

A high-quality catalog of the Drosophila melanogaster proteome

TLDR
It is shown that high-quality proteomics data provide crucial information to amend genome annotation and to confirm many predicted gene models, and this library of proteotypic peptides should enable fast, targeted and quantitative proteomic studies to elucidate the systems biology of this model organism.

A statistical model for identifying proteins by tandem mass spectrometry.

TLDR
A statistical model is presented for computing probabilities that proteins are present in a sample on the basis of peptides assigned to tandem mass (MS/MS) spectra acquired from a proteolytic digest of the sample, and it is shown to produce probabilities that are accurate and have high power to discriminate correct from incorrect protein identifications.

Targeted Quantitative Analysis of Streptococcus pyogenes Virulence Factors by Multiple Reaction Monitoring

TLDR
Applying this approach, low abundance virulence factors from cultures of the human pathogen Streptococcus pyogenes exposed to increasing amounts of plasma were reliably quantified and clearly defined the subset of virulence proteins that is regulated upon plasma exposure.

An Integrated, Directed Mass Spectrometric Approach for In-depth Characterization of Complex Peptide Mixtures

TLDR
A directed LC-MS/MS approach that alleviates the limitations of DDA precursor ion selection by decoupling peak detection and sequencing of selected precursor ions is presented.

Target-decoy search strategy for increased confidence in large-scale protein identifications by mass spectrometry

TLDR
This work clarifies the preferred methodology by addressing four issues based on observed decoy hit frequencies: the major assumptions made with this database search strategy are reasonable, concatenated target-decoy database searches are preferable to separate target and decoy database searches, and the theoretical error associated with target-decoy false positive (FP) rate measurements can be estimated.

Analysis and validation of proteomic data generated by tandem mass spectrometry

TLDR
This review discusses critical issues related to data processing and analysis in proteomics and describes available methods and tools and places special emphasis on the elaboration of results that are supported by sound statistical arguments.