Selecting a single model or combining multiple models for microarray-based classifier development? – A comparative analysis based on large and diverse datasets generated from the MAQC-II project
Rate constants for the tropospheric degradation of 460 heterogeneous organic compounds by the hydroxyl radical are predicted by QSAR modeling. The Multiple Linear Regression models are based on a variety of theoretical molecular descriptors selected by the Genetic Algorithms-Variable Subset Selection (GA-VSS) procedure. The models were validated for predictivity by both internal and external validation. For the external validation, two splitting approaches, D-optimal Experimental Design and Kohonen Artificial Neural Networks (K-ANN), were applied to the original data set so that the two methodologies could be compared. We emphasize that external validation is the only way to establish a reliable QSAR model for predictive purposes. Predictions obtained by consensus modeling, which combines the outputs of different models, are also proposed.
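
The workflow summarized above (fit MLR models on a training set, assess them on held-out compounds, then average their predictions) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the data are synthetic, a random split stands in for the D-optimal and K-ANN splitting schemes, two hand-picked descriptor subsets stand in for GA-VSS selection, and the helper names (`fit_mlr`, `predict_mlr`, `q2_ext`) are hypothetical. The external statistic shown is the common form Q2_ext = 1 - PRESS_ext / SS_ext, where SS_ext is computed around the training-set mean.

```python
import numpy as np

def fit_mlr(X, y):
    """Ordinary least squares with an intercept; returns [b0, b1, ..., bp]."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_mlr(coef, X):
    """Predict responses from fitted coefficients."""
    return coef[0] + X @ coef[1:]

def q2_ext(y_test, y_pred, y_train_mean):
    """External Q2: 1 - PRESS / SS, with SS taken around the training mean."""
    press = np.sum((y_test - y_pred) ** 2)
    ss = np.sum((y_test - y_train_mean) ** 2)
    return 1.0 - press / ss

# Synthetic stand-in for a descriptor matrix and a response (assumption:
# 460 "compounds", 6 "descriptors"; not the data used in the study).
rng = np.random.default_rng(0)
n_compounds, n_descriptors = 460, 6
X = rng.normal(size=(n_compounds, n_descriptors))
true_w = np.array([1.5, -2.0, 0.7, 0.0, 0.3, -1.1])
y = X @ true_w + rng.normal(scale=0.3, size=n_compounds)

# Random split (assumption: replaces D-optimal design / K-ANN splitting).
idx = rng.permutation(n_compounds)
train, test = idx[:360], idx[360:]
y_train_mean = y[train].mean()

# Two fixed descriptor subsets (assumption: replaces GA-VSS selection).
subsets = [[0, 1, 2, 5], [0, 1, 4, 5]]
preds = []
for cols in subsets:
    coef = fit_mlr(X[train][:, cols], y[train])
    preds.append(predict_mlr(coef, X[test][:, cols]))

# Consensus prediction: unweighted mean of the individual model predictions.
consensus = np.mean(preds, axis=0)

q2_models = [q2_ext(y[test], p_hat, y_train_mean) for p_hat in preds]
q2_consensus = q2_ext(y[test], consensus, y_train_mean)
print("Q2_ext per model:", q2_models)
print("Q2_ext consensus:", q2_consensus)
```

The unweighted mean is only one possible way to form a consensus; weighting models by their validation statistics is another option that this sketch does not attempt.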