Review and research on feature selection methods from NMR data in biological fluids. Presentation of an original ensemble method applied to atherosclerosis field.

Abstract

Metabolic pools of biological matrices can be extensively analyzed by NMR. The measured data consist of hundreds of NMR signals whose chemical shifts and intensities represent the types and levels of different metabolites, respectively. Relevant predictive NMR signals need to be extracted from this pool using variable selection methods. This paper presents both a review of and original research in this area of metabolomics. After reviewing the discriminant potential and statistical analysis of NMR data in biological fields, the paper presents an original approach for extracting a small number of NMR signals from a biological matrix A (BM-A) in order to predict metabolic levels in another biological matrix B (BM-B). Initially, the NMR dataset of BM-A was decomposed into several row-column homogeneous blocks using hierarchical cluster analysis (HCA). Each block was then subjected to a complete set of jackknifed correspondence analyses (CA), each obtained by removing one individual (row) in turn. Each CA condensed the numerous NMR signals into a few principal components (PCs). The PCs representing the (n - 1) active individuals were used as latent variables in a stepwise multiple linear regression to predict metabolic levels in BM-B. From the resulting regression model, the metabolite level of the left-out individual was predicted (for subsequent model validation). Across all the PC-based regression models resulting from the jackknifed CAs applied to all individuals, the most contributive NMR signals were identified by their highest absolute contributions to the PCs. Finally, these selected NMR signals (measured in BM-A) were used to build final population and sub-population regression models predicting metabolite levels in BM-B.
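
The jackknifed CA step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation, and it simplifies the procedure in several ways that are assumptions of this example: it treats a single block (the HCA decomposition is omitted), it fits an ordinary least-squares regression on a fixed number of PCs instead of a stepwise regression, and it ranks signals by their largest contribution over the retained axes, summed across jackknife runs. The function names, `n_axes`, `n_keep`, and the synthetic data are all hypothetical.

```python
import numpy as np


def correspondence_analysis(N, n_axes):
    """CA of a strictly positive table N (rows = individuals, columns = signals)."""
    P = N / N.sum()                       # correspondence matrix
    r = P.sum(axis=1)                     # row masses
    c = P.sum(axis=0)                     # column masses
    # Standardized residuals, decomposed by SVD
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    _, _, Vt = np.linalg.svd(S, full_matrices=False)
    V = Vt[:n_axes].T                     # right singular vectors (p x n_axes)
    # Row principal coordinates: centered row profiles projected on the axes
    row_coords = ((P / r[:, None] - c) / np.sqrt(c)) @ V
    contrib = V ** 2                      # column contributions, sum to 1 per axis
    return row_coords, contrib, c, V


def jackknife_ca_selection(X, y, n_axes=2, n_keep=3):
    """Leave-one-out CA + PC regression; rank signals by accumulated contribution."""
    n, p = X.shape
    scores = np.zeros(p)
    y_pred = np.empty(n)
    for i in range(n):
        active = np.delete(np.arange(n), i)          # the (n - 1) active individuals
        F, contrib, c, V = correspondence_analysis(X[active], n_axes)
        # Assumption: plain least squares on all retained PCs (not stepwise)
        A = np.column_stack([np.ones(n - 1), F])
        beta, *_ = np.linalg.lstsq(A, y[active], rcond=None)
        # Project the left-out individual as a supplementary row, then predict
        profile = X[i] / X[i].sum()
        f_out = ((profile - c) / np.sqrt(c)) @ V
        y_pred[i] = beta[0] + f_out @ beta[1:]
        # Assumption: score a signal by its largest contribution over the axes
        scores += contrib.max(axis=1)
    selected = np.argsort(scores)[::-1][:n_keep]
    return selected, y_pred


# Synthetic, illustrative data only: 12 individuals, 8 signals in "BM-A",
# and one metabolite level per individual in "BM-B".
rng = np.random.default_rng(0)
X = rng.gamma(shape=2.0, scale=1.0, size=(12, 8))
y = rng.normal(size=12)
selected, y_pred = jackknife_ca_selection(X, y)
```

Here `selected` holds the column indices of the retained signals and `y_pred` the leave-one-out predictions that could serve for model validation. A fuller treatment would repeat this within each HCA block and replace the least-squares fit with a stepwise variable selection over the PCs.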

Cite this paper

@article{Semmar2014ReviewAR, title={Review and research on feature selection methods from NMR data in biological fluids. Presentation of an original ensemble method applied to atherosclerosis field.}, author={Nabil Semmar and C{\'e}cile Canlet and Bernadette Delplanque and Pascale Le Ruyet and Alain Paris and Jean-Charles Martin}, journal={Current drug metabolism}, year={2014}, volume={15}, number={5}, pages={544-56} }