Best Practices for QSAR Model Development, Validation, and Exploitation

  • Alexander Tropsha
  • Published 12 July 2010
  • Biology, Chemistry
  • Molecular Informatics
After nearly five decades “in the making”, QSAR modeling has established itself as one of the major computational molecular modeling methodologies. Like any mature research discipline, QSAR modeling can be characterized by a collection of well-defined protocols and procedures that enable the expert application of the method for exploring and exploiting ever-growing collections of biologically active chemical compounds. This review examines the most critical QSAR modeling routines that we regard as…

An automated framework for QSAR model building

An extendable and highly customizable, fully automated QSAR modeling framework that is capable of building reliable models even for challenging problems and that neither requires advanced parameterization nor depends on users' decisions or expertise in machine learning/programming.

QSARINS: A new software for the development, analysis, and validation of QSAR MLR models

The Insubria Persistent Bioaccumulative and Toxic (PBT) Index model for the prediction of the cumulative behavior of new chemicals as PBTs is implemented, and the user can also validate single models predeveloped with different software.

QSAR modeling: where have you been? Where are you going to?

Quantitative structure-activity relationship modeling is one of the major computational tools employed in medicinal chemistry. However, throughout its entire history it has drawn both praise and

QSPR Modeling For Critical Temperatures Of Organic Compounds Using Hybrid Optimal Descriptors

The simplified molecular input line entry system (SMILES) is particularly suitable for high-speed machine processing, based on the Monte Carlo method using CORAL software. Quantitative

Strategies for the generation, validation and application of in silico ADMET models in lead generation and optimization

The generation of ADMET models and their practical use in decision making are discussed, including the issues surrounding data collation, experimental errors, the model assessment and validation steps, as well as the different types of descriptors and statistical models that can be used.

QSAR without borders.

This Perspective summarizes recent technological advances in QSAR modeling, but it also highlights the applicability of algorithms, modeling methods, and validation practices developed in QSAR to a wide range of research areas outside of traditional QSAR boundaries, including synthesis planning, nanotechnology, materials science, biomaterials, and clinical informatics.

OPERA models for predicting physicochemical properties and environmental fate endpoints

This study aims to develop robust QSAR/QSPR models for chemical properties of environmental interest that can be used for regulatory purposes, using data from the publicly available PHYSPROP database for a set of 13 common physicochemical and environmental fate properties.

Principles of QSAR Modeling

At the end of her academic career, the author summarizes the main aspects of QSAR modeling, giving comments and suggestions according to her 23 years’ experience, mainly on Multiple Linear Regression using a Genetic Algorithm for variable selection from various theoretical molecular descriptors.



Predictive QSAR modeling workflow, model applicability domains, and virtual screening.

This critical review re-examines the strategy and the output of the modern QSAR modeling approaches and provides examples and arguments suggesting that current methodologies may afford robust and validated models capable of accurate prediction of compound properties for molecules not included in the training sets.

The Importance of Being Earnest: Validation is the Absolute Essential for Successful Application and Interpretation of QSPR Models

A set of simple guidelines for developing validated and predictive QSPR models is presented, highlighting the need to establish the domain of model applicability in the chemical space to flag molecules for which predictions may be unreliable, and some algorithms that can be used for this purpose.

Principles of QSAR models validation: internal and external

Evidence is presented that only models that have been validated externally, after their internal validation, can be considered reliable and applicable for both external prediction and regulatory purposes.

Predictive QSAR modeling based on diversity sampling of experimental datasets for the training and test set selection

It is demonstrated that QSAR models built and validated with this approach have statistically better predictive power than models generated with either random or activity-ranking-based selection of the training and test sets.

Application of predictive QSAR models to database mining: identification and experimental validation of novel anticonvulsant compounds.

A drug discovery strategy that employs variable selection quantitative structure-activity relationship (QSAR) models for chemical database mining for the discovery of anticonvulsant agents in the Maybridge and National Cancer Institute databases containing ca.

Validation of a QSAR model for acute toxicity

The aim is to demonstrate how statistical validation and domain definition are both required to establish model validity and to provide reliable predictions of toxicity to the fathead minnow that are useful for the regulatory assessment of chemicals.

Combinatorial QSAR Modeling of Chemical Toxicants Tested against Tetrahymena pyriformis

An international virtual collaboratory consisting of six independent groups with shared interests in computational chemical toxicology develops 15 different types of QSAR models of aquatic toxicity and finds that consensus models afford higher prediction accuracy for the external validation data sets with the highest space coverage as compared to individual constituent models.

Critical Assessment of QSAR Models of Environmental Toxicity against Tetrahymena pyriformis: Focusing on Applicability Domain and Overfitting by Variable Selection

It is shown that incorrect validation of a model may result in a wrong estimate of its performance, and a way to circumvent this problem is suggested; the distance-to-model metric could also be used to augment mechanistic QSAR models by estimating their prediction errors.

Application of validated QSAR models of D1 dopaminergic antagonists for database mining.

This study illustrates that the combined application of predictive QSAR modeling and database mining may provide an important avenue for rational computer-aided drug discovery.

Development and validation of k-nearest-neighbor QSPR models of metabolic stability of drug candidates.

The development, validation, and application of quantitative structure-property relationship (QSPR) models of metabolic turnover rate for compounds in human S9 homogenate provides a rapid computational screen for generating components of the ADME profile in a drug discovery process.