Best Practices for QSAR Model Development, Validation, and Exploitation

  • A. Tropsha
  • Published 12 July 2010
  • Biology, Chemistry
  • Molecular Informatics
After nearly five decades “in the making”, QSAR modeling has established itself as one of the major computational molecular modeling methodologies. Like any mature research discipline, QSAR modeling can be characterized by a collection of well-defined protocols and procedures that enable the expert application of the method for exploring and exploiting ever-growing collections of biologically active chemical compounds. This review examines the most critical QSAR modeling routines that we regard as…

An automated framework for QSAR model building

An extendable, highly customizable, fully automated QSAR modeling framework that can build reliable models even for challenging problems and that neither requires advanced parameterization nor depends on users' decisions or expertise in machine learning or programming.

Generalized Workflow for Generating Highly Predictive in Silico Off-Target Activity Models

The use of data from bioactivity databases for the generation of high quality in silico models for off-target mediated toxicity as a decision support in early drug discovery and crop-protection research is evaluated.

Recent trends in statistical QSAR modeling of environmental chemical toxicity.

  • A. Tropsha
  • Biology, Chemistry
  • Experientia supplementum
  • 2012
This chapter examines recent trends that emphasize the need for both careful curation of experimental data prior to model development and rigorous model validation, and reviews recent approaches to chemical toxicity prediction that employ both chemical descriptors and in vitro screening data to develop novel hybrid chemical/biological models.

QSARINS: A new software for the development, analysis, and validation of QSAR MLR models

The Insubria Persistent Bioaccumulative and Toxic (PBT) Index model for predicting the cumulative behavior of new chemicals as PBTs is implemented, and the user can also validate single models previously developed with other software.

QSAR modeling: where have you been? Where are you going to?

Quantitative structure-activity relationship modeling is one of the major computational tools employed in medicinal chemistry. However, throughout its entire history it has drawn both praise and

QSPR Modeling For Critical Temperatures Of Organic Compounds Using Hybrid Optimal Descriptors

The simplified molecular input line entry system (SMILES) is particularly suitable for high-speed machine processing, based on the Monte Carlo method using CORAL software. Quantitative

QPHAR: quantitative pharmacophore activity relationship: method and validation

Low requirements for dataset sizes render quantitative pharmacophores a viable go-to method for medicinal chemists, especially in the lead-optimisation stage of drug discovery projects.

Strategies for the generation, validation and application of in silico ADMET models in lead generation and optimization

The generation of ADMET models and their practical use in decision making are discussed, including the issues surrounding data collation, experimental errors, the model assessment and validation steps, as well as the different types of descriptors and statistical models that can be used.

QSAR without borders.

This Perspective summarizes recent technological advances in QSAR modeling, but it also highlights the applicability of algorithms, modeling methods, and validation practices developed in QSAR to a wide range of research areas outside traditional QSAR boundaries, including synthesis planning, nanotechnology, materials science, biomaterials, and clinical informatics.



Predictive QSAR modeling workflow, model applicability domains, and virtual screening.

This critical review re-examines the strategy and the output of the modern QSAR modeling approaches and provides examples and arguments suggesting that current methodologies may afford robust and validated models capable of accurate prediction of compound properties for molecules not included in the training sets.

Rational selection of training and test sets for the development of validated QSAR models

Additional evidence is presented that there is no correlation between the values of q² for the training set and the accuracy of prediction (R²) for the test set, and it is argued that this observation is a general property of any QSAR model developed with LOO cross-validation.
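
The difference between the two statistics named above can be made concrete with a small sketch. It assumes a one-descriptor linear model and the common definitions q² = 1 − PRESS/SS_total (leave-one-out) and R² = 1 − SS_res/SS_total (external test set); the function names and the toy data are illustrative assumptions and do not come from the paper, which may define R² differently.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def q2_loo(xs, ys):
    """Leave-one-out q2: refit without compound i, then predict compound i.

    xs and ys must be lists (they are sliced and concatenated).
    """
    press = 0.0
    for i in range(len(xs)):
        a, b = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - (a + b * xs[i])) ** 2
    my = mean(ys)
    ss_total = sum((y - my) ** 2 for y in ys)
    return 1.0 - press / ss_total

def r2_external(model, xs_test, ys_test):
    """R2 on compounds that were never used to fit the model."""
    a, b = model
    my = mean(ys_test)
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs_test, ys_test))
    ss_total = sum((y - my) ** 2 for y in ys_test)
    return 1.0 - ss_res / ss_total

# Hypothetical toy data: one descriptor value and one activity per compound.
train_x, train_y = [1.0, 2.0, 3.0, 4.0, 5.0], [1.1, 1.9, 3.2, 3.9, 5.1]
test_x, test_y = [1.5, 2.5, 4.5], [1.4, 2.7, 4.4]

model = fit_line(train_x, train_y)
print("q2 (LOO, training set):", q2_loo(train_x, train_y))
print("R2 (external test set):", r2_external(model, test_x, test_y))
```

The two numbers are computed from different compounds, which is why a high q² on the training set does not by itself say anything about R² on the test set.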

The Importance of Being Earnest: Validation is the Absolute Essential for Successful Application and Interpretation of QSPR Models

A set of simple guidelines for developing validated and predictive QSPR models is presented, highlighting the need to establish the domain of model applicability in the chemical space to flag molecules for which predictions may be unreliable, and some algorithms that can be used for this purpose.
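
As an illustration of what flagging molecules outside the applicability domain can look like, here is a minimal sketch of one distance-based check. It is an assumption for illustration, not necessarily one of the algorithms the paper describes: a query compound is treated as in-domain when its nearest training compound lies within a threshold derived from nearest-neighbor distances inside the training set. The cutoff parameter `z`, the function names, and the descriptor vectors are all hypothetical.

```python
import math
from statistics import mean, pstdev

def euclidean(u, v):
    """Euclidean distance between two descriptor vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def nearest_distance(point, others):
    """Distance from point to its closest neighbor in others."""
    return min(euclidean(point, o) for o in others)

def domain_threshold(train, z=0.5):
    """Mean nearest-neighbor distance within the training set plus z standard deviations.

    train must be a list of descriptor vectors; z is an assumed, user-chosen cutoff.
    """
    nn = [nearest_distance(p, train[:i] + train[i + 1:])
          for i, p in enumerate(train)]
    return mean(nn) + z * pstdev(nn)

def in_domain(query, train, threshold):
    """True if the query is no farther from the training set than the threshold."""
    return nearest_distance(query, train) <= threshold

# Hypothetical two-descriptor training set.
train = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
threshold = domain_threshold(train)
for query in [(0.5, 0.5), (5.0, 5.0)]:
    label = "in domain" if in_domain(query, train, threshold) else "flagged as unreliable"
    print(query, label)
```

In practice the descriptors would be scaled before distances are computed, since an unscaled descriptor with a large numeric range would dominate the Euclidean distance.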

Principles of QSAR models validation: internal and external

Evidence is presented that only models that have been validated externally, after their internal validation, can be considered reliable and applicable for both external prediction and regulatory purposes.

Predictive QSAR modeling based on diversity sampling of experimental datasets for the training and test set selection

It is demonstrated that QSAR models built and validated with the approach have statistically better predictive power than models generated with either random or activity ranking based selection of the training and test sets.
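
To show how a diversity-based split differs from a random one, the sketch below uses a greedy MaxMin selection. This is a substitute technique chosen for brevity, not the sampling procedure of the paper: starting from a seed compound, it repeatedly adds to the training set the compound that is farthest from everything already selected, and leaves the rest as the test set. The function names, the seed choice, and the data are hypothetical.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two descriptor vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def maxmin_split(points, n_train, seed_index=0):
    """Greedy MaxMin split; returns (train_indices, test_indices).

    At each step the compound whose distance to its closest already-chosen
    compound is largest is added to the training set. Ties go to the lowest index.
    """
    chosen = [seed_index]
    remaining = set(range(len(points))) - {seed_index}
    while len(chosen) < n_train and remaining:
        best = max(
            sorted(remaining),
            key=lambda i: min(euclidean(points[i], points[j]) for j in chosen),
        )
        chosen.append(best)
        remaining.remove(best)
    return chosen, sorted(remaining)

# Hypothetical one-descriptor dataset with two near-duplicate compounds.
points = [(0.0,), (0.1,), (5.0,), (10.0,)]
train_idx, test_idx = maxmin_split(points, n_train=3)
print("training set:", train_idx)
print("test set:", test_idx)
```

With this selection the training set spans the descriptor range, and the held-out compound sits close to a training compound rather than in an unexplored region of chemical space.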

Application of predictive QSAR models to database mining: identification and experimental validation of novel anticonvulsant compounds.

A drug discovery strategy that employs variable selection quantitative structure-activity relationship (QSAR) models for chemical database mining for the discovery of anticonvulsant agents in the Maybridge and National Cancer Institute databases containing ca.

Validation of a QSAR model for acute toxicity

The aim is to demonstrate that statistical validation and domain definition are both required to establish model validity and to provide reliable predictions of toxicity to the fathead minnow that are useful for the regulatory assessment of chemicals.

Critical Assessment of QSAR Models of Environmental Toxicity against Tetrahymena pyriformis: Focusing on Applicability Domain and Overfitting by Variable Selection

It is shown that incorrect validation of a model may result in a wrong estimate of its performance, and a way to circumvent this problem is suggested; the distance-to-model metric could also be used to augment mechanistic QSAR models by estimating their prediction errors.

Application of validated QSAR models of D1 dopaminergic antagonists for database mining.

This study illustrates that the combined application of predictive QSAR modeling and database mining may provide an important avenue for rational computer-aided drug discovery.

Development and validation of k-nearest-neighbor QSPR models of metabolic stability of drug candidates.

The development, validation, and application of quantitative structure-property relationship (QSPR) models of metabolic turnover rate for compounds in human S9 homogenate provide a rapid computational screen for generating components of the ADME profile in a drug discovery process.