proBAMsuite, a Bioinformatics Framework for Genome-Based Representation and Analysis of Proteomics Data*

Xiaojing Wang, Robbert J. C. Slebos, Matthew Chambers, David L. Tabb, Daniel C. Liebler, Bing Zhang
Molecular & Cellular Proteomics: MCP, pages 1164-1175
To facilitate genome-based representation and analysis of proteomics data, we developed a new bioinformatics framework, proBAMsuite, in which a central component is the protein BAM (proBAM) file format for organizing peptide spectrum matches (PSMs) within the context of the genome. proBAMsuite also includes two R packages, proBAMr and proBAMtools, for generating and analyzing proBAM files, respectively. Applying proBAMsuite to three recently published proteomics datasets, we demonstrated its…

The proBAM and proBed standard formats: enabling a seamless integration of genomics and proteomics data

Two novel standard data formats, proBAM and proBed, are introduced to address the current challenges of integrating mass spectrometry-based proteomics data with genomics and transcriptomics information in proteogenomics studies.

A tool for integrating genetic and mass spectrometry‐based peptide data: Proteogenomics Viewer

Proteogenomics Viewer is a web-based tool that collects MS peptide identifications, indexes them to genomic sequence and structure, assigns exon usage, reports the identified protein isoforms with genomic alignments and, most importantly, allows the inspection of MS2 information for proper peptide identification.

An Accessible Proteogenomics Informatics Resource for Cancer Researchers.

This resource brings together software from several leading research groups to address two foundational aspects of proteogenomics: generation of customized, annotated protein sequence databases from RNA-Seq data; and accurate matching of tandem mass spectrometry data to putative variants, followed by filtering to confirm their novelty.

A Fast and Quantitative Method for Post-translational Modification and Variant Enabled Mapping of Peptides to Genomes

PoGo is developed to map peptides with associated post-translational modifications and quantification to reference genome annotation with publicly available datasets of quantitative and phosphoproteomics, as well as large-scale studies.

proBAMconvert: A Conversion Tool for proBAM/proBed.

proBAMconvert enables the conversion of common identification file formats (mzIdentML, mzTab, and pepXML) to proBAM/proBed through an intuitive graphical user interface and also offers a command-line interface.

Connecting Proteomics to Next‐Generation Sequencing: Proteogenomics and Its Current Applications in Biology

This review briefly discusses how genomics and transcriptomics data in different formats can be used to assist proteogenomics applications and how proteogenomics can be applied to tackle biological problems.

Methods, Tools and Current Perspectives in Proteogenomics *

This article systematically classifies published methods and tools into four major categories: (1) sequence-centric proteogenomics; (2) analysis of proteogenomic relationships; (3) integrative modeling of proteogenomic data; and (4) data sharing and visualization.

PGTools: A Software Suite for Proteogenomic Data Analysis and Visualization.

A subset of proteogenomic peptides in human PANC-1 cells is experimentally confirmed, and the utility of PGTools is demonstrated using a colorectal cancer data set that led to the identification of 203 novel protein coding regions missed by conventional proteomic approaches.

Tools to covisualize and coanalyze proteomic data with genomes and transcriptomes: validation of genes and alternative mRNA splicing.

The PG Nexus pipeline facilitates gene validation and supports the identification of mRNA splice junction boundaries and splice variants that are protein-coding, and other examples of relevance to the Chromosome-Centric Human Proteome Project.

Proteogenomics: concepts, applications and computational strategies

The current state of proteogenomic methods and applications is reviewed, including computational strategies for building and using customized protein sequence databases, and attention is drawn to the challenge of false-positive identifications.

Integrating Genomic, Transcriptomic, and Interactome Data to Improve Peptide and Protein Identification in Shotgun Proteomics

The major integrative bioinformatics approaches that have been developed during the past decade are surveyed and their merits and demerits are discussed.

ProteoAnnotator – Open source proteogenomics annotation software supporting PSI standards

A complete, open source pipeline for proteogenomics is introduced, called ProteoAnnotator, which incorporates a graphical user interface and implements the Proteomics Standards Initiative mzIdentML standard for each analysis stage, and new modules for pre‐processing and combining multiple search databases are developed.

BEDTools: a flexible suite of utilities for comparing genomic features

A new software suite is described for the comparison, manipulation and annotation of genomic features in Browser Extensible Data (BED) and General Feature Format (GFF) formats; it allows the user to compare large datasets (e.g. next-generation sequencing data) with both public and custom genome annotation tracks.

Whole human genome proteogenomic mapping for ENCODE cell line data: identifying protein-coding regions

Proteogenomic mapping is an approach that uses mass spectrometry data from proteins to directly map protein-coding genes and could aid in locating translational regions in the human genome.

customProDB: an R package to generate customized protein databases from RNA-Seq data for proteomics search

The R package customProDB is reported; it enables the easy generation of customized databases from RNA-Seq data for proteomics search, bridges genomics and proteomics studies, and facilitates cross-omics data integration.

Mass-spectrometry-based draft of the human proteome

A mass-spectrometry-based draft of the human proteome and a public, high-performance, in-memory database for real-time analysis of terabytes of big data, called ProteomicsDB are presented, which enables navigation of proteomes, provides biological insight and fosters the development of proteomic technology.

A Cross-platform Toolkit for Mass Spectrometry and Proteomics

The ProteoWizard Toolkit is a robust set of open-source software libraries and applications designed to facilitate proteomics research; it implements the first-ever non-commercial, unified data access interface for proteomics, bridging field-standard open formats and all common vendor formats.