Classification and regression tree analysis for molecular descriptor selection and retention prediction in chromatographic quantitative structure-retention relationship studies.

Abstract

The use of the classification and regression tree (CART) methodology was studied in a quantitative structure-retention relationship (QSRR) context on a data set consisting of the retentions of 83 structurally diverse drugs on a Unisphere PBD column, using isocratic elutions at pH 11.7. The response (dependent variable) in the tree models was the predicted retention factor (log kw) of the solutes, while a set of 266 molecular descriptors was used as explanatory variables in the tree building. Out of these 266 descriptors, descriptors related to the hydrophobicity (log P and Hy) and the size (TPC) of the molecules were selected to describe and predict retention. Besides these, CART also selected hydrogen-bonding and molecular complexity descriptors. Since the selection of these variables is consistent with existing QSRR knowledge, this demonstrates the potential of CART as a methodology for understanding retention in chromatographic systems. The potential of CART to predict retention, and thus possibly to select an appropriate system for a given mixture, was also evaluated. Reasonably good prediction was observed, with only 9% serious misclassification. Moreover, some of the misclassifications are probably inherent to the data set used.
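The abstract does not describe the implementation, so the following is only a minimal sketch of how a CART-style regression tree can act as a descriptor selector. It assumes least-squares splitting (each split minimizes the summed squared deviation of the response in the two child nodes) and a fixed depth and leaf-size limit in place of the cost-complexity pruning and cross-validation a full CART analysis would use. The descriptor names (`logP`, `TPC`, `noise`) and all values are hypothetical toy data, not the 83-drug data set; `grow`, `predict` and `used_features` are illustrative helper names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    value: float                       # mean response (e.g. log kw) of samples in this node
    feature: Optional[str] = None      # descriptor used for the split (None for a leaf)
    threshold: Optional[float] = None
    left: Optional["Node"] = None      # samples with x[feature] <= threshold
    right: Optional["Node"] = None     # samples with x[feature] > threshold


def sse(ys):
    """Sum of squared deviations from the mean: the regression impurity of a node."""
    if not ys:
        return 0.0
    m = sum(ys) / len(ys)
    return sum((v - m) ** 2 for v in ys)


def best_split(X, y, min_leaf):
    """Search every descriptor and midpoint threshold for the lowest summed child SSE."""
    best = None  # (total_sse, feature, threshold)
    for feat in X[0]:
        values = sorted(set(row[feat] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [yi for row, yi in zip(X, y) if row[feat] <= t]
            right = [yi for row, yi in zip(X, y) if row[feat] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue  # reject splits that leave a child node too small
            total = sse(left) + sse(right)
            if best is None or total < best[0]:
                best = (total, feat, t)
    return best


def grow(X, y, depth=0, max_depth=2, min_leaf=2):
    """Recursively split until the depth limit or no admissible, impurity-reducing split."""
    node = Node(value=sum(y) / len(y))
    if depth >= max_depth:
        return node
    split = best_split(X, y, min_leaf)
    if split is None or split[0] >= sse(y):
        return node
    _, feat, t = split
    idx_l = [i for i, row in enumerate(X) if row[feat] <= t]
    idx_r = [i for i, row in enumerate(X) if row[feat] > t]
    node.feature, node.threshold = feat, t
    node.left = grow([X[i] for i in idx_l], [y[i] for i in idx_l], depth + 1, max_depth, min_leaf)
    node.right = grow([X[i] for i in idx_r], [y[i] for i in idx_r], depth + 1, max_depth, min_leaf)
    return node


def predict(node, row):
    """Follow the splits down to a leaf and return its mean response."""
    while node.feature is not None:
        node = node.left if row[node.feature] <= node.threshold else node.right
    return node.value


def used_features(node):
    """Descriptors that appear in any split: the tree's implicit variable selection."""
    if node.feature is None:
        return set()
    return {node.feature} | used_features(node.left) | used_features(node.right)


# Hypothetical toy data: the response tracks logP, while TPC and noise do not.
X = [
    {"logP": 0.5, "TPC": 10, "noise": 3},
    {"logP": 1.0, "TPC": 12, "noise": 1},
    {"logP": 1.5, "TPC": 11, "noise": 4},
    {"logP": 3.0, "TPC": 13, "noise": 2},
    {"logP": 3.5, "TPC": 10, "noise": 5},
    {"logP": 4.0, "TPC": 12, "noise": 0},
]
y = [1.0, 1.2, 1.1, 3.0, 3.2, 3.1]

tree = grow(X, y)
print(tree.feature, tree.threshold, used_features(tree))
```

On this toy data the root split is on `logP` at 2.25 and the other two descriptors are never used, which is the sense in which a tree "selects" descriptors: out of many candidate variables, only those that best reduce the impurity of the response appear in the splits.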

Cite this paper

@article{Put2003ClassificationAR,
  title={Classification and regression tree analysis for molecular descriptor selection and retention prediction in chromatographic quantitative structure-retention relationship studies},
  author={R V D Put and Catherine Perrin and Frederik Questier and Danny Coomans and Desire L. Massart and Yvan Vander Heyden},
  journal={Journal of Chromatography A},
  year={2003},
  volume={988},
  number={2},
  pages={261--276}
}