Dereplication strategies in natural product research: How many tools and methodologies behind the same concept?
In the present work, support vector machines (SVMs) and multiple linear regression (MLR) were used for quantitative structure-property relationship (QSPR) studies of the retention time (t_R) of 67 mycotoxins (aflatoxins, trichothecenes, roquefortines and ochratoxins) in standardized liquid chromatography-UV-mass spectrometry, based on molecular descriptors calculated from the optimized 3D structures. The most relevant descriptors were selected by applying missing-value, zero-value and multicollinearity tests (cutoff value 0.95), followed by a genetic algorithm for variable selection, and were then used to build the MLR and SVM models. The robustness of the QSPR models was characterized by statistical validation and by their applicability domain (AD). The predictions of both the MLR and SVM models are in good agreement with the experimental values: r² and q² are 0.931 and 0.932, respectively, for SVM, and 0.923 and 0.915, respectively, for MLR. The applicability domain of the models was investigated using a Williams plot. The effects of the different descriptors on the retention times are described.
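The descriptor filtering, MLR fitting, validation and applicability-domain steps summarized above can be sketched as follows. This is a minimal, dependency-free Python sketch under several assumptions not stated in the abstract: the "zero" test is taken to mean dropping constant (zero-variance) descriptors, the 0.95 cutoff is applied to the absolute pairwise Pearson correlation, q² is computed by leave-one-out cross-validation, and the Williams plot uses the commonly applied warning leverage h* = 3(p + 1)/n together with a ±3 standardized-residual limit. The genetic-algorithm selection and the SVM model are not reproduced, and all function names are hypothetical.

```python
import math
from statistics import mean, pstdev


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def filter_descriptors(columns, cutoff=0.95):
    """Hypothetical pre-filter; columns maps descriptor name -> values (None = missing).
    Assumption: the 'zero' test drops constant columns; the cutoff applies to |r|."""
    kept = {}
    for name, col in columns.items():
        if any(v is None for v in col):
            continue  # missing-value test
        if pstdev(col) == 0:
            continue  # zero-variance test (assumed meaning of the "zero" test)
        if any(abs(pearson(col, other)) > cutoff for other in kept.values()):
            continue  # multicollinearity test: keep the first of a correlated pair
        kept[name] = col
    return kept


def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]


def gram(rows):
    """X^T X for the design matrix with a leading intercept column."""
    D = [[1.0] + list(r) for r in rows]
    k = len(D[0])
    return [[sum(d[i] * d[j] for d in D) for j in range(k)] for i in range(k)]


def fit_mlr(rows, y):
    """Ordinary least squares via the normal equations; returns [b0, b1, ...]."""
    D = [[1.0] + list(r) for r in rows]
    xty = [sum(d[i] * yi for d, yi in zip(D, y)) for i in range(len(D[0]))]
    return solve(gram(rows), xty)


def predict(coef, row):
    return coef[0] + sum(c * x for c, x in zip(coef[1:], row))


def r2(y, yhat):
    """Coefficient of determination of fitted values yhat."""
    my = mean(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, yhat))
    return 1 - ss_res / sum((a - my) ** 2 for a in y)


def q2_loo(rows, y):
    """Leave-one-out q^2 = 1 - PRESS / SS_tot (assumed validation scheme)."""
    press = 0.0
    for i in range(len(y)):
        coef = fit_mlr(rows[:i] + rows[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - predict(coef, rows[i])) ** 2
    my = mean(y)
    return 1 - press / sum((a - my) ** 2 for a in y)


def leverage(train_rows, row):
    """h = x^T (X^T X)^-1 x, computed by solving (X^T X) z = x."""
    x = [1.0] + list(row)
    z = solve(gram(train_rows), x)
    return sum(a * b for a, b in zip(x, z))


def williams_flags(train_rows, y, coef):
    """Flag compounds outside the AD: h > 3(p+1)/n or |standardized residual| > 3."""
    n, p = len(y), len(train_rows[0])
    h_star = 3 * (p + 1) / n
    res = [yi - predict(coef, r) for r, yi in zip(train_rows, y)]
    s = math.sqrt(sum(e * e for e in res) / (n - p - 1))
    return [leverage(train_rows, r) > h_star or abs(e / s) > 3
            for r, e in zip(train_rows, res)]
```

Plain Python is used here only to make each step explicit; in practice the MLR and leverage calculations would normally be done with NumPy or statsmodels, and an SVM regression model could be fitted with a support-vector regressor such as scikit-learn's `SVR`.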