Probability‐based protein identification by searching sequence databases using mass spectrometry data

David N. Perkins, Darryl J. Pappin, David M. Creasy, John S. Cottrell
Several algorithms have been described in the literature for protein identification by searching a sequence database using mass spectrometry data. In some approaches, the experimental data are peptide molecular weights from the digestion of a protein by an enzyme. Other approaches use tandem mass spectrometry (MS/MS) data from one or more peptides. Still others combine mass data with amino acid sequence data. We present results from a new computer program, Mascot, which integrates all three… 

Defining parameters for homology-tolerant database searching.

De novo interpretation of tandem mass spectrometry (MS/MS) spectra provides sequences for searching protein databases when limited sequence information is present in the database; 35% of de novo sequences completely matched the corresponding known amino acid sequence of the matching peptide.

A novel scoring schema for peptide identification by searching protein sequence databases using tandem mass spectrometry data

A reliable identification of proteins from the spectra promises a more efficient application of tandem mass spectrometry to proteomes with high complexity.

Mass Spectrometry-Based Protein Identification by Correlation with Sequence Database

The progress in genome sequencing projects of a large number of organisms and the advance in mass spectrometry of protein analysis have been significant driving forces in the formation of the field

Randomized sequence databases for tandem mass spectrometry peptide and protein identification.

The use of combined searches of a reshuffled database appended to a forward sequence database as a means of providing quantitative estimates of false positive identification rates of peptides and proteins will allow researchers to set criteria and thresholds to achieve a desired error rate.
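
As a rough illustration of how a search against a forward database with a reshuffled copy appended can yield an error-rate estimate, the sketch below counts decoy hits above a score threshold and uses them as a stand-in for false target hits. The estimator (decoys divided by targets), the function names, and the `(score, is_decoy)` input layout are assumptions made for this example; they are a common convention and not necessarily the procedure used in the paper.

```python
def estimate_fdr(psms, threshold):
    """Estimate the false discovery rate at a score threshold.

    psms: list of (score, is_decoy) pairs from a search against a
    forward database with a reshuffled (decoy) database appended.
    Assumption: each decoy hit above the threshold stands for roughly
    one false hit among the target identifications.
    """
    targets = sum(1 for score, is_decoy in psms
                  if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms
                 if score >= threshold and is_decoy)
    if targets == 0:
        # No target hits: the estimate is 0 if nothing passed, else 1.
        return 0.0 if decoys == 0 else 1.0
    return min(1.0, decoys / targets)


def threshold_for_fdr(psms, max_fdr):
    """Return the lowest observed score whose estimated FDR is at most
    max_fdr, or None if no threshold qualifies."""
    best = None
    for candidate in sorted({score for score, _ in psms}, reverse=True):
        if estimate_fdr(psms, candidate) <= max_fdr:
            best = candidate
    return best
```

Calling `threshold_for_fdr(psms, 0.01)` on a list of scored matches would then give a score cutoff aimed at roughly a 1% estimated error rate, under the assumptions stated above.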

ProbID: A probabilistic algorithm to identify peptides through sequence database searching using tandem mass spectral data

A novel probabilistic model and score function is presented that ranks the quality of the match between tandem mass spectral data and a peptide sequence in a database; the performance of the algorithm is documented on a reference data set and in comparison with another sequence database search tool.

Protein identification by tandem mass spectrometry and sequence database searching.

The process of inferring the identities of the sample proteins given the list of peptide identifications is outlined, and the limitations of shotgun proteomics with regard to discrimination between protein isoforms are discussed.

Overview of Tandem Mass Spectrometry (MS/MS) Database Search Algorithms

This unit focuses on the most widely used tandem MS peptide identification search algorithms (commercial and open source), their availability, ease of use, strengths, speed and scoring, as well as their relative sensitivity and specificity.

PEAKS: powerful software for peptide de novo sequencing by tandem mass spectrometry.

A new de novo sequencing software package, PEAKS, is described, to extract amino acid sequence information without the use of databases, using a new model and a new algorithm to efficiently compute the best peptide sequences whose fragment ions can best interpret the peaks in the MS/MS spectrum.



Database searching using mass spectrometry data

  • J. Yates
  • Biology, Chemistry
  • 1998
Computer algorithms have been developed to use the two different types of data generated by mass spectrometers to search sequence databases, including databases of translated protein sequences as well as nucleotide databases such as expressed sequence tag (EST) sequences.

Use of mass spectrometric molecular weight information to identify proteins in sequence databases.

As the size of DNA and protein sequence databases grows, protein identification by partial mass spectrometric peptide maps should become increasingly powerful and may become a general method to identify and characterize proteins.

Peptide mass maps: a highly informative approach to protein identification.

A computer searching algorithm has been used to identify protein sequences in the Protein Information Resource (PIR) database with peptide mass information (mass map) obtained from proteolytic

Identifying proteins from two-dimensional gels by molecular mass searching of peptide fragments in protein sequence databases.

A rapid method for identifying known proteins separated by two-dimensional gel electrophoresis is described, in which molecular masses of peptide fragments are used to search a protein sequence database. Each protein was uniquely identified from over 91,000 protein sequences.
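
The idea of searching a sequence database with peptide masses can be sketched as follows: digest each database sequence in silico, compute the peptide masses, and count how many observed masses fall within a tolerance of a predicted one. This is a minimal sketch, not the scoring used by any of the programs listed here. The trypsin rule (cleave after K or R unless followed by P), the plain match count, and all function names are assumptions for illustration; the residue masses are standard monoisotopic values.

```python
# Standard monoisotopic residue masses (Da).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056


def tryptic_peptides(sequence):
    """Split a protein after K or R, except when the next residue is P."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


def peptide_mass(peptide):
    """Monoisotopic mass of a neutral, unmodified peptide."""
    return sum(RESIDUE_MASS[r] for r in peptide) + WATER


def count_matches(observed_masses, sequence, tolerance=0.5):
    """Count observed masses within `tolerance` Da of a predicted peptide."""
    predicted = [peptide_mass(p) for p in tryptic_peptides(sequence)]
    return sum(
        1 for mass in observed_masses
        if any(abs(mass - m) <= tolerance for m in predicted)
    )


def rank_proteins(observed_masses, database, tolerance=0.5):
    """Rank database entries (name -> sequence) by number of matched masses."""
    scores = [(name, count_matches(observed_masses, seq, tolerance))
              for name, seq in database.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)
```

A raw match count favors long proteins, which yield many peptides; probability-based scoring of the kind described in the title paper is one way to address that bias.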

Chemistry, Mass Spectrometry and Peptide-Mass Databases: Evolution of Methods for the Rapid Identification and Mapping of Cellular Proteins

The sheer number of proteins is too great to permit large-scale characterization within any useful period of time, and the use of monoclonal antibodies, whilst both rapid and extremely sensitive, requires the ready availability of a large pool of antibody probes.

Method to correlate tandem mass spectra of modified peptides to amino acid sequences in the protein database.

The approach described in this paper provides a convenient method to match the nascent tandem mass spectra of modified peptides to sequences in a protein database and thereby identify previously unknown sites of modification.

Error-tolerant identification of peptides in sequence databases by peptide sequence tags.

A new approach to the identification of mass spectrometrically fragmented peptides is demonstrated. An algorithm that uses the sequence tag to find the peptide in a sequence database is up to 1 million-fold more discriminating than the partial sequence information alone.

Mining genomes: correlating tandem mass spectra of modified and unmodified peptides to sequences in nucleotide databases.

The correlation of uninterpreted tandem mass spectra of modified and unmodified peptides, produced under low-energy (10-50 eV) collision conditions, with nucleotide sequences is demonstrated, and specific sites of modification are identified even though no specific information relevant to sites of modification is contained in the character-based sequence information of nucleotide databases.

Multiple parameter cross‐species protein identification using MultiIdent ‐ a world‐wide web accessible tool

MultiIdent is a new program that uses multiple protein parameters, such as amino acid composition, peptide masses, sequence tags, and estimated protein pI and mass, to achieve cross‐species protein identification. The power of the approach is illustrated with the identification of a set of standard proteins and of proteins from dog heart separated by two‐dimensional gel electrophoresis.