The application of terminomics for the identification of protein start sites and proteoforms in bacteria.
In silico gene prediction is prone to errors, especially in the precise localization of start codons, and these errors propagate into subsequent biological studies. High-throughput characterization of protein N-termini is therefore an emerging challenge in proteomics, and especially in proteogenomics. The trimethoxyphenyl phosphonium (TMPP) labeling approach (N-TOP) is an efficient N-terminomic method that characterizes both N-terminal and internal peptides in a single experiment. Because TMPP carries a permanent positive charge, labeling strongly affects MS/MS fragmentation, so classical search engines score TMPP-derivatized peptide spectra poorly. As a result, TMPP-derivatized peptide identifications are difficult to validate with the usual score filters, and the number of identified N-termini is underestimated. We present a new strategy (dN-TOP) that overcomes this limitation by combining labeling with light and heavy TMPP reagents, allowing confident and automated validation of N-terminal peptides. We show that this double labeling increases the number of validated N-terminal peptides. This strategy considerably improves the well-established N-TOP method, with enhanced and accelerated data processing that makes it fully compatible with high-throughput proteogenomics studies.
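The light/heavy pairing idea behind double labeling can be illustrated with a minimal Python sketch. This is not the dN-TOP software. It assumes a heavy TMPP reagent carrying nine 13C atoms, so that one label shifts the neutral mass by about 9.03 Da. It also assumes a single TMPP label per peptide and uses arbitrary tolerances. `Feature` and `find_light_heavy_pairs` are hypothetical names. In this sketch, a co-eluting light/heavy doublet at the expected m/z spacing is treated as supporting evidence for a TMPP-labeled peptide.

```python
from dataclasses import dataclass

# Mass difference between one 13C and one 12C atom, in Da.
C13_C12_DELTA = 1.0033548


@dataclass(frozen=True)
class Feature:
    """A hypothetical MS1 precursor feature."""
    mz: float
    charge: int
    rt: float  # retention time in minutes


def find_light_heavy_pairs(features, n_heavy_atoms=9, ppm_tol=10.0, rt_tol=0.5):
    """Return (light, heavy) feature pairs separated by the heavy-label shift.

    Assumptions (not from the source): one TMPP label per peptide, a heavy
    reagent with `n_heavy_atoms` 13C atoms, and the given ppm/RT tolerances.
    """
    shift = n_heavy_atoms * C13_C12_DELTA  # neutral mass shift for one label
    pairs = []
    for light in features:
        # On the m/z scale the neutral shift is divided by the charge state.
        expected_mz = light.mz + shift / light.charge
        for heavy in features:
            if heavy is light or heavy.charge != light.charge:
                continue
            ppm_error = abs(heavy.mz - expected_mz) / expected_mz * 1e6
            # Light and heavy forms should co-elute in reversed-phase LC.
            if ppm_error <= ppm_tol and abs(heavy.rt - light.rt) <= rt_tol:
                pairs.append((light, heavy))
    return pairs


if __name__ == "__main__":
    features = [
        Feature(500.0, 2, 30.0),
        Feature(500.0 + 9 * C13_C12_DELTA / 2, 2, 30.1),
        Feature(700.0, 2, 45.0),
    ]
    for light, heavy in find_light_heavy_pairs(features):
        print(f"doublet: {light.mz:.4f} / {heavy.mz:.4f} (z={light.charge})")
```

In practice the pairing would also have to handle missing partners, overlapping isotope envelopes and variable labeling. The nested loop is kept here only for readability.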