Study on Preprocessing and Classifying Mass Spectral Raw Data Concerning Human Normal and Disease Cases

Xenofon E. Floros, George M. Spyrou, Konstantinos N. Vougas, George Th. Tsangaris, Konstantina S. Nikita
Mass spectrometry is becoming an important tool in the biological sciences. Tissue samples or easily obtained biological fluids (serum, plasma, urine) are analysed by a variety of mass spectrometry methods, producing spectra characterized by very high dimensionality and a high level of noise. Here we present a feature extraction method for mass spectra that consists of two main steps. In the first step, an algorithm for low-level preprocessing of mass spectra is applied, including denoising with… 
2 Citations

Automatic selection of preprocessing methods for improving predictions on mass spectrometry protein profiles.

A system for automatically determining a set of preprocessing methods among several candidates is developed, which relieves the analyst of the need to be knowledgeable about which methods to use on any given dataset.



Protocols for disease classification from mass spectrometry data

Results are reported for classifying protein matrix-assisted laser desorption/ionization time-of-flight (MALDI-TOF) mass spectra obtained from serum samples into diseased and healthy groups; for three of the four proteins used to classify lung cancer, closely matching masses were found in a database of protein expression in lung cancer.

High-resolution serum proteomic features for ovarian cancer detection.

It is concluded that high-resolution MS yields superior classification patterns compared with lower-resolution instrumentation, and that multiple distinct proteomic patterns comprising low-molecular-weight biomarkers, detected by high-resolution MS, achieve accuracies surpassing individual biomarker classifiers, warranting validation in a large clinical study.

Recursive SVM feature selection and sample classification for mass-spectrometry and microarray data

The proposed R-SVM method is suitable for analyzing noisy high-throughput proteomics and microarray data, and it outperforms SVM-RFE in robustness to noise and in the ability to recover informative features.
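To make the recursive-elimination idea concrete, here is a minimal Python sketch of SVM-RFE-style backward elimination: refit a linear model, drop the features with the smallest squared weights, and repeat. It swaps in a closed-form ridge classifier for the linear SVM to stay dependency-free, and it does not reproduce R-SVM's own ranking criterion. The function names, `drop_frac`, `lam` and the synthetic data are illustrative assumptions.

```python
import numpy as np

def ridge_weights(X, y, lam=1.0):
    """Closed-form ridge weights; a stand-in (assumption) for a linear SVM's weights."""
    Xc = X - X.mean(axis=0)             # centre features so no intercept is needed
    yc = y - y.mean()
    return np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ yc)

def recursive_elimination(X, y, n_keep, drop_frac=0.2, lam=1.0):
    """Refit, drop the features with the smallest w_j**2, repeat until n_keep remain."""
    active = np.arange(X.shape[1])
    while active.size > n_keep:
        w = ridge_weights(X[:, active], y, lam)
        n_drop = max(1, min(int(drop_frac * active.size), active.size - n_keep))
        order = np.argsort(w ** 2)      # ascending: least useful features first
        active = np.sort(active[order[n_drop:]])
    return active

# Hypothetical synthetic data: only columns 0, 1 and 2 carry class information.
rng = np.random.default_rng(0)
y = np.repeat([-1.0, 1.0], 50)
X = rng.normal(size=(100, 23))
X[:, :3] += 2.0 * y[:, None]
selected = recursive_elimination(X, y, n_keep=3)
```

Dropping a fraction of the remaining features per round, rather than one at a time, keeps the number of refits small when there are thousands of m/z variables.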

Improved peak detection and quantification of mass spectrometry data acquired from surface‐enhanced laser desorption and ionization by denoising spectra with the undecimated discrete wavelet transform

A novel algorithm is proposed that processes the spectra by denoising them with the undecimated discrete wavelet transform (UDWT); it is evaluated for consistency and reproducibility and provides improvements over existing methods.
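A minimal sketch of denoising with an undecimated wavelet transform, assuming the Haar wavelet, circular boundaries, and soft thresholding at the universal threshold, with the noise scale estimated from the finest detail level. The cited work's wavelet, threshold rule and parameters may differ, and all function names here are hypothetical.

```python
import numpy as np

def haar_udwt(x, levels):
    """Undecimated (à trous) Haar transform with circular boundaries."""
    approx = np.asarray(x, dtype=float)
    details = []
    for j in range(levels):
        s = 2 ** j                           # filter taps spread 2**j samples apart
        shifted = np.roll(approx, -s)        # shifted[n] == approx[(n + s) % N]
        details.append((approx - shifted) / np.sqrt(2))
        approx = (approx + shifted) / np.sqrt(2)
    return approx, details

def haar_iudwt(approx, details):
    """Inverse transform: average the two redundant reconstructions at each level."""
    a = np.asarray(approx, dtype=float)
    for j in reversed(range(len(details))):
        s = 2 ** j
        d = details[j]
        from_here = (a + d) / np.sqrt(2)                          # coefficients at n
        from_left = (np.roll(a, s) - np.roll(d, s)) / np.sqrt(2)  # coefficients at n - s
        a = 0.5 * (from_here + from_left)
    return a

def udwt_denoise(x, levels=4):
    """Soft-threshold every detail level at the universal threshold."""
    approx, details = haar_udwt(x, levels)
    sigma = np.median(np.abs(details[0])) / 0.6745   # robust noise scale, finest level
    thr = sigma * np.sqrt(2 * np.log(len(x)))
    shrunk = [np.sign(d) * np.maximum(np.abs(d) - thr, 0.0) for d in details]
    return haar_iudwt(approx, shrunk)

# Hypothetical "spectrum": two broad peaks plus unit-variance Gaussian noise.
rng = np.random.default_rng(1)
n = np.arange(1024)
clean = 10 * np.exp(-0.5 * ((n - 300) / 40) ** 2) + 6 * np.exp(-0.5 * ((n - 700) / 40) ** 2)
noisy = clean + rng.normal(scale=1.0, size=n.size)
smoothed = udwt_denoise(noisy)
```

Because the transform is not downsampled, every level has one coefficient per sample, which makes the result shift-invariant: moving a peak by one sample moves the denoised peak by one sample instead of changing its shape.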

Transformation and other factors of the peptide mass spectrometry pairwise peak-list comparison process

The analysis of variance provides insight into the relevance of the various factors that influence the outcome of the pairwise peak-list comparison, and gives a strong indication that the results presented here may be valid for many different types of peptide mass measurements.

Megavariate data analysis of mass spectrometric proteomics data using latent variable projection method

The wavelet transform and the latent variable projection method are particularly useful for spectroscopic and chromatographic data, and the simple two-component PLS-DA model obtained from the analysis performed well.
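A two-component PLS-DA model can be sketched as NIPALS PLS1 regression on a class-coded response, thresholded at zero. This is an assumed, minimal version (±1 class coding, mean-centring only, no wavelet preprocessing, synthetic data), not the analysis from the cited study.

```python
import numpy as np

def pls1_fit(X, y, n_comp=2):
    """NIPALS PLS1. Predict with (X_new - x_mean) @ coef + y_mean."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_comp):
        w = Xr.T @ yr
        w /= np.linalg.norm(w)           # unit weight vector
        t = Xr @ w                       # component scores
        tt = t @ t
        p = Xr.T @ t / tt                # X loadings
        c = (yr @ t) / tt                # y loading
        Xr = Xr - np.outer(t, p)         # deflate X and y before the next component
        yr = yr - c * t
        W.append(w)
        P.append(p)
        q.append(c)
    W, P, q = np.column_stack(W), np.column_stack(P), np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)   # regression coefficients in X space
    return x_mean, y_mean, coef

def plsda_predict(X, x_mean, y_mean, coef):
    """Classes are coded -1/+1 (assumption), so the sign of the prediction is the label."""
    return np.sign((X - x_mean) @ coef + y_mean)

# Hypothetical data: 80 samples, 200 variables, a block of 10 discriminating ones.
rng = np.random.default_rng(2)
labels = np.repeat([-1.0, 1.0], 40)
X = rng.normal(size=(80, 200))
X[:, 10:20] += 1.5 * labels[:, None]
model = pls1_fit(X, labels, n_comp=2)
train_acc = np.mean(plsda_predict(X, *model) == labels)
```

Training accuracy alone says little when variables outnumber samples; in practice the number of components would be chosen by cross-validation.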

Data Reduction Using a Discrete Wavelet Transform in Discriminant Analysis of Very High Dimensionality Data

We present a method of data reduction using a wavelet transform in discriminant analysis when the number of variables is much greater than the number of observations. The method is…
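One simple way to reduce dimensionality with a discrete wavelet transform is to keep only the coarse approximation coefficients of each spectrum. The truncated summary does not say which coefficients the cited method retains, so this Haar-based sketch, its function name and the array sizes are assumptions.

```python
import numpy as np

def haar_dwt_approx(spectra, level):
    """Level-`level` Haar approximation coefficients along the last axis."""
    a = np.asarray(spectra, dtype=float)
    for _ in range(level):
        if a.shape[-1] % 2:
            raise ValueError("signal length must be divisible by 2**level")
        # Orthonormal Haar low-pass filter followed by downsampling by 2.
        a = (a[..., 0::2] + a[..., 1::2]) / np.sqrt(2)
    return a

# Hypothetical data: 30 observations, 4096 variables (p much greater than n).
rng = np.random.default_rng(3)
spectra = rng.random((30, 4096))
features = haar_dwt_approx(spectra, level=5)   # 4096 / 2**5 = 128 features per sample
```

The reduced matrix can then be passed to a discriminant classifier in place of the raw spectra.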

Using AUC and accuracy in evaluating learning algorithms

  • Jin Huang, C. Ling
  • IEEE Transactions on Knowledge and Data Engineering, 2005
It is shown theoretically and empirically that AUC is a better measure than accuracy (in a precisely defined sense), and well-established claims in machine learning that were based on accuracy are reevaluated using AUC, yielding interesting and surprising new results.
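A toy illustration of how AUC can distinguish classifiers that accuracy cannot: the two made-up score vectors below give the same accuracy at a 0.5 threshold but very different AUC. AUC is computed here as the fraction of positive/negative pairs ranked correctly (ties count one half); the threshold and data are assumptions for illustration.

```python
def accuracy(labels, scores, threshold=0.5):
    """Fraction of samples whose thresholded score matches the 0/1 label."""
    return sum((s >= threshold) == (lab == 1) for lab, s in zip(labels, scores)) / len(labels)

def auc(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 1/2."""
    pos = [s for lab, s in zip(labels, scores) if lab == 1]
    neg = [s for lab, s in zip(labels, scores) if lab == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

labels   = [0, 0, 0, 0, 1, 1, 1, 1]
scores_a = [0.1, 0.2, 0.3, 0.6, 0.4, 0.7, 0.8, 0.9]
scores_b = [0.1, 0.2, 0.3, 0.95, 0.05, 0.6, 0.7, 0.8]
print(accuracy(labels, scores_a), accuracy(labels, scores_b))  # 0.75 0.75
print(auc(labels, scores_a), auc(labels, scores_b))            # 0.9375 0.5625
```

Both classifiers misclassify one positive and one negative, but classifier B ranks its misclassified positive below every negative, which accuracy ignores and AUC penalizes.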

Signal Background Estimation and Baseline Correction Algorithms for Accurate DNA Sequencing

A statistical learning formulation of the signal background estimation problem, solvable with an Expectation-Maximization-type algorithm, is proposed. An alternative method is also presented that estimates the background level of a signal in small windows using a recursive histogram computation.
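The intuition behind histogram-based background estimation is that, inside a small window, background samples outnumber signal samples, so the most populated intensity bin sits at the background level. The sketch below is a simplified, non-recursive illustration of that idea only; it is not the EM-type or recursive-histogram algorithm from the cited work, and the window size, bin count and names are assumptions.

```python
import numpy as np

def histogram_background(signal, window=100, bins=20):
    """Background level per window from its most populated intensity bin."""
    x = np.asarray(signal, dtype=float)
    centres, levels = [], []
    for start in range(0, x.size, window):
        chunk = x[start:start + window]
        counts, edges = np.histogram(chunk, bins=bins)
        k = np.argmax(counts)                                  # most populated bin
        in_bin = chunk[(chunk >= edges[k]) & (chunk <= edges[k + 1])]
        levels.append(np.median(in_bin))                       # refine inside that bin
        centres.append(start + (chunk.size - 1) / 2)
    # Interpolate linearly between window centres (held constant beyond the ends).
    return np.interp(np.arange(x.size), centres, levels)

# Hypothetical signal: drifting baseline, three narrow peaks, small ripple.
n = np.arange(1000)
baseline = 5 + 0.002 * n
peaks = sum(10 * np.exp(-0.5 * ((n - c) / 3) ** 2) for c in (150, 420, 730))
signal = baseline + peaks + 0.05 * np.sin(n)
estimate = histogram_background(signal)
corrected = signal - estimate
```

The window must be wide relative to a peak; if peaks fill most of a window, the most populated bin no longer reflects the background.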


This paper describes exact and explicit representations of the differential operators d^n/dx^n, n = 1, 2, …, in orthonormal bases of compactly supported wavelets, as well as the representations of…