A common open representation of mass spectrometry data and its application to proteomics research

@article{Pedrioli2004ACO,
  title={A common open representation of mass spectrometry data and its application to proteomics research},
  author={Patrick G. A. Pedrioli and Jimmy K. Eng and Robert M. Hubley and Mathijs Vogelzang and Eric W. Deutsch and Brian Raught and Brian S. Pratt and Erik J. Nilsson and Ruth Hogue Angeletti and Rolf Apweiler and Kei-Hoi Cheung and Catherine E. Costello and Henning Hermjakob and Sequin Huang and Randall K. Julian and Eugene A. Kapp and Mark E. McComb and Stephen G. Oliver and Gilbert S. Omenn and Norman W. Paton and Richard Simpson and Richard D. Smith and Chris F. Taylor and Weimin Zhu and Ruedi Aebersold},
  journal={Nature Biotechnology},
  year={2004},
  volume={22},
  pages={1459--1466}
}
A broad range of mass spectrometers are used in mass spectrometry (MS)-based proteomics research. Each type of instrument possesses a unique design, data system and performance specifications, resulting in strengths and weaknesses for different types of experiments. Unfortunately, the native binary data formats produced by each type of mass spectrometer also differ and are usually proprietary. The diverse, nontransparent nature of the data structure complicates the integration of new… 
An efficient data format for mass spectrometry-based proteomics
Mass Spectrometer Output File Format mzML
  • E. Deutsch
  • Computer Science, Medicine
    Proteome Bioinformatics
  • 2010
TLDR
This chapter presents mzML, an open XML-based format for encoding mass spectrometer output files, covering its components and the information it holds, and explains how to write software that uses the format for archiving, sharing, and processing.
multiplierz: an extensible API based desktop environment for proteomics data analysis
Background: Efficient analysis of results from mass spectrometry-based proteomics experiments requires access to disparate data types, including native mass spectrometry files, output from algorithms
mzML—a Community Standard for Mass Spectrometry Data
TLDR
The resulting standard data format, mzML, is a well-tested open-source format for mass spectrometer output files that can be readily utilized by the community and easily adapted for incremental advances in mass spectrometry technology.
mzAPI: a new strategy for efficiently sharing mass spectrometry data
TLDR
This work proposes that a common and redistributable application programming interface (API) represents a more viable approach to data access in mass spectrometry and proposes to shift the burden of standards compliance to the manufacturers’ existing data access libraries.
Data management in mass spectrometry-based proteomics.
  • L. Martens
  • Computer Science, Medicine
    Methods in molecular biology
  • 2011
TLDR
Insight is provided into the specifics of a typical workflow, the data types and user roles involved, and a broad overview of available software solutions for proteomics discovery/validation, by looking at a typical workspace and the increasingly important link between a local data management system and the global, centralized dissemination of proteomics data.
A uniform proteomics MS/MS analysis platform utilizing open XML file formats
TLDR
The Trans‐Proteomic Pipeline is described, which makes use of open XML file formats for storage of data at the raw spectral data, peptide, and protein levels, and enables uniform analysis and exchange of MS/MS data generated from a variety of different instruments, and assigned peptides using a range of different database search programs.
Fast and Efficient XML Data Access for Next-Generation Mass Spectrometry
TLDR
A fast and versatile parsing library for mass spectrometric XML formats available in C++ and Python, based on the mature OpenMS software framework that implements an API for obtaining spectra and chromatograms under memory constraints using random access or sequential access functions, allowing users to process datasets that are much larger than system memory.
A Mass Spectrometry Proteomics Data Management Platform
TLDR
A relational database architecture and accompanying web application are presented that address the failings of the file-based mass spectrometry data management approach and allow the output of disparate software pipelines to be imported into a core set of unified tables.

References

Protein Identification by Mass Spectrometry
  • M. Baldwin
  • Chemistry, Medicine
    Molecular & Cellular Proteomics
  • 2004
TLDR
The paper that follows attempts to highlight the strengths and weaknesses of the methods in current use, as well as establishing criteria for mass spectrometric identification of proteins that should be employed by researchers.
A tool to visualize and evaluate data obtained by liquid chromatography-electrospray ionization-mass spectrometry.
TLDR
A software tool for visualizing data obtained from analyzing complex peptide mixtures by liquid chromatography (LC) electrospray ionization (ESI) mass spectrometry (MS) that may have broad application in MS-based proteomics.
Global protein identification and quantification technology using two-dimensional liquid chromatography nanospray mass spectrometry.
TLDR
The difference in the concentrations of several phosphopeptides determined in the authors' studies suggests the possibility of several new targets involved in the EGF cell-signaling pathway.
A statistical model for identifying proteins by tandem mass spectrometry.
TLDR
A statistical model is presented for computing probabilities that proteins are present in a sample on the basis of peptides assigned to tandem mass (MS/MS) spectra acquired from a proteolytic digest of the sample, and it is shown to produce probabilities that are accurate and have high power to discriminate correct from incorrect protein identifications.
Automated statistical analysis of protein abundance ratios from data generated by stable-isotope dilution and tandem mass spectrometry.
TLDR
The utility of the ASAPRatio program was clearly demonstrated by its speed and the accuracy of the generated protein abundance ratios and by its capability to identify specific core components of the RNA polymerase II transcription complex within a high background of copurifying proteins.
Design and implementation of microarray gene expression markup language (MAGE-ML)
TLDR
MAGE will help microarray data producers and users to exchange information by providing a common platform for data exchange, and MAGE-STK will make the adoption of MAGE easier.
ProbID: A probabilistic algorithm to identify peptides through sequence database searching using tandem mass spectral data
TLDR
A novel probabilistic model and score function that ranks the quality of the match between tandem mass spectral data and a peptide sequence in a database and document the performance of the algorithm on a reference data set and in comparison with another sequence database search tool.
Probability‐based protein identification by searching sequence databases using mass spectrometry data
TLDR
A new computer program, Mascot, is presented, which integrates all three types of search for protein identification by searching a sequence database using mass spectrometry data, and the scoring algorithm is probability based.
Quantitative profiling of differentiation-induced microsomal proteins using isotope-coded affinity tags and mass spectrometry
TLDR
The method and the new software tools to support it are well suited to the large-scale, quantitative analysis of membrane proteins and other classes of proteins that have been refractory to standard proteomics technology.
The need for a public proteomics repository
TLDR
This work states that the availability of DNA microarray data, coupled with public genome sequence data, is arguably one of the primary forces driving computational research in functional genomics.