Most Ligand-Based Benchmarks Measure Overfitting Rather than Accuracy

@article{Wallach2017MostLB,
  title={Most Ligand-Based Benchmarks Measure Overfitting Rather than Accuracy},
  author={Izhar Wallach and Abraham Heifets},
  journal={ArXiv},
  year={2017},
  volume={abs/1706.06619}
}
Undetected overfitting can occur when there are significant redundancies between training and validation data. [...] Therefore, it is likely that the previously reported performance of most ligand-based methods can be explained by overfitting to benchmarks rather than good prospective accuracy.
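One simple way to look for the training/validation redundancy described above is to find, for each validation molecule, its most similar training molecule. The sketch below is illustrative only: fingerprints are assumed to be Python sets of "on" bit indices, and the helper names and the 0.8 similarity threshold are hypothetical choices, not values taken from the paper.

```python
from typing import Iterable, List, Set


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto (Jaccard) similarity between two fingerprint bit sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def max_similarity_to_train(val_fp: Set[int], train_fps: Iterable[Set[int]]) -> float:
    """Similarity of one validation molecule to its nearest training neighbour."""
    return max(tanimoto(val_fp, t) for t in train_fps)


def redundant_fraction(val_fps: List[Set[int]], train_fps: List[Set[int]],
                       threshold: float = 0.8) -> float:
    """Fraction of validation molecules that have a near-duplicate in training.

    The default threshold of 0.8 is a hypothetical illustration value.
    """
    hits = sum(max_similarity_to_train(v, train_fps) >= threshold for v in val_fps)
    return hits / len(val_fps)


train = [{1, 2, 3, 4}, {10, 11, 12}]
val = [{1, 2, 3, 4, 5}, {20, 21}]
print(redundant_fraction(val, train))  # 0.5: the first validation molecule is a near-copy
```

A high fraction suggests that a validation score may partly reflect memorisation of close analogues rather than prospective accuracy.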
Citations

Comparing Fingerprints for Ligand-Based Virtual Screening: A Fast and Scalable Approach for Unbiased Evaluation
TLDR
This work uses a technique that quickly generates splits whose AVE bias is close to zero, using a combination of clustering and removal of the most biased data, to compare the performance of the Morgan and CATS fingerprints.
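Clustering-based splitting keeps near-analogues on the same side of the train/validation boundary. The following is a minimal sketch of that general idea, assuming set-based fingerprints and a simple greedy leader clustering; it is not the specific clustering-plus-removal procedure of the cited work, and the function names, the 0.6 similarity cutoff and the "smallest clusters to validation first" rule are hypothetical choices.

```python
from typing import List, Set, Tuple


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto similarity between two fingerprint bit sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def leader_cluster(fps: List[Set[int]], cutoff: float = 0.6) -> List[List[int]]:
    """Greedy leader clustering: a molecule joins the first cluster whose leader
    is at least `cutoff` similar to it, otherwise it starts a new cluster."""
    leaders: List[int] = []
    clusters: List[List[int]] = []
    for i, fp in enumerate(fps):
        for c, lead in enumerate(leaders):
            if tanimoto(fp, fps[lead]) >= cutoff:
                clusters[c].append(i)
                break
        else:  # no leader was similar enough
            leaders.append(i)
            clusters.append([i])
    return clusters


def cluster_split(fps: List[Set[int]], val_fraction: float = 0.25,
                  cutoff: float = 0.6) -> Tuple[List[int], List[int]]:
    """Assign whole clusters to validation until `val_fraction` is reached,
    so no cluster is shared between training and validation."""
    clusters = leader_cluster(fps, cutoff)
    target = val_fraction * len(fps)
    train: List[int] = []
    val: List[int] = []
    for cluster in sorted(clusters, key=len):  # smallest clusters go to validation first
        (val if len(val) < target else train).extend(cluster)
    return sorted(train), sorted(val)


fps = [{1, 2, 3}, {1, 2, 3, 4}, {7, 8, 9}, {7, 8, 9, 10}, {20, 21, 22}, {40, 41}]
print(cluster_split(fps))  # ([0, 1, 2, 3], [4, 5])
```

Because analogue pairs (0, 1) and (2, 3) are clustered together, neither pair is split across the two sets.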
Quantifying Overfitting Potential in Drug Binding Datasets
TLDR
A recently published metric called the Asymmetric Validation Embedding (AVE) bias is investigated, and the new metrics are found to quantify overfitting without overly limiting the training data and to produce models with greater predictive value.
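The AVE bias compares how close validation actives and inactives lie to training actives and inactives. The sketch below follows the form of the metric as we understand it, B = (AA − AI) + (II − IA), where each term averages, over a grid of distance thresholds, the fraction of validation molecules whose nearest training neighbour lies closer than the threshold. The 0.00 to 1.00 grid in steps of 0.01, the strict "<" comparison and the set-based fingerprints are assumptions of this illustration, so results may not match published AVE values exactly.

```python
from typing import List, Set


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto similarity between two fingerprint bit sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def nn_distance(fp: Set[int], ref: List[Set[int]]) -> float:
    """Tanimoto distance from fp to its nearest neighbour in ref."""
    return 1.0 - max(tanimoto(fp, r) for r in ref)


def h_term(val: List[Set[int]], ref: List[Set[int]], steps: int = 100) -> float:
    """Average over thresholds t = 0, 1/steps, ..., 1 of the fraction of `val`
    molecules whose nearest neighbour in `ref` is closer than t (assumed grid)."""
    dists = [nn_distance(v, ref) for v in val]
    thresholds = [i / steps for i in range(steps + 1)]
    return sum(sum(d < t for d in dists) / len(dists) for t in thresholds) / len(thresholds)


def ave_bias(train_act, train_inact, val_act, val_inact) -> float:
    """(AA - AI) + (II - IA): large values mean a nearest-neighbour
    memoriser could score well on this split."""
    aa = h_term(val_act, train_act)
    ai = h_term(val_act, train_inact)
    ii = h_term(val_inact, train_inact)
    ia = h_term(val_inact, train_act)
    return (aa - ai) + (ii - ia)


# Validation molecules are exact copies of training molecules of the same class.
biased = ave_bias([{1, 2, 3}], [{10, 11, 12}], [{1, 2, 3}], [{10, 11, 12}])
print(round(biased, 3))  # 1.98, close to the maximum of 2
```

A split with B near zero gives a memorising model no systematic advantage, which is the target of the splitting technique in the previous entry.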
Accuracy or novelty: what can we gain from target-specific machine-learning-based scoring functions in virtual screening?
TLDR
Target-specific machine-learning-based scoring functions (MLSFs) seem not to have the intrinsic attributes of a traditional scoring function (SF) and may not be a substitute for classical SFs; rather, MLSFs can be regarded as a new derivative of ligand-based QSAR models.
Comparison of structure- and ligand-based scoring functions for deep generative models: a GPCR case study
TLDR
This work demonstrates the advantage of using molecular docking to guide de novo molecule generation over ligand-based predictors with respect to predicted affinity, novelty, and the ability to identify key interactions between ligand and protein target.
Machine learning and ligand binding predictions: A review of data, methods, and obstacles.
TLDR
Current trends in the use of machine learning for drug binding predictions, data sources to develop machine learning algorithms, and potential problems that may lead to overfitting and ungeneralizable models are reviewed.
Machine learning classification can reduce false positives in structure-based virtual screening
TLDR
A new strategy is presented for building a training dataset (D-COID) that aims to generate highly compelling decoy complexes, each individually matched to an available active complex; the results support using D-COID to train classifiers for other computational biology tasks, and using vScreenML in virtual screening campaigns against other protein targets.
A semi-supervised learning framework for quantitative structure–activity regression modelling
TLDR
Three methods that solve three problems in QSAR modelling are provided: a method for comparing the information content of finite-dimensional representations of molecular structures (fingerprints) with respect to the target of interest, a method that quantifies how the accuracy of the model prediction degrades as a function of the distance between the testing and training data, and a method to adjust for the screening-dependent selection bias inherent in many training datasets.
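To make the second idea concrete, prediction errors can be grouped by each test molecule's nearest-neighbour distance to the training set. This is a generic binning sketch, not the method of the cited paper; set-based fingerprints, the helper names and the bin edges (0.0, 0.3, 0.6, 1.0) are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Sequence, Set, Tuple


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto similarity between two fingerprint bit sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def error_by_distance(test_fps: List[Set[int]], y_true: Sequence[float],
                      y_pred: Sequence[float], train_fps: List[Set[int]],
                      edges: Tuple[float, ...] = (0.0, 0.3, 0.6, 1.0),
                      ) -> Dict[Tuple[float, float], float]:
    """Mean absolute error grouped by nearest-neighbour Tanimoto distance
    from each test molecule to the training set (last bin is closed)."""
    bins: Dict[Tuple[float, float], List[float]] = {}
    for fp, yt, yp in zip(test_fps, y_true, y_pred):
        d = 1.0 - max(tanimoto(fp, t) for t in train_fps)
        for lo, hi in zip(edges, edges[1:]):
            if lo <= d < hi or (hi == edges[-1] and d == hi):
                bins.setdefault((lo, hi), []).append(abs(yt - yp))
                break
    return {b: mean(errs) for b, errs in sorted(bins.items())}


train = [{1, 2, 3, 4}]
test = [{1, 2, 3, 4}, {1, 2, 3, 4, 5, 6}, {9}]
print(error_by_distance(test, [5.0, 6.0, 7.0], [5.2, 5.5, 4.0], train))
```

In this toy example the error grows from about 0.2 for the exact training match to 3.0 for the molecule with no shared bits, the kind of degradation such a method is meant to quantify.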
LIT-PCBA: An Unbiased Data Set for Machine Learning and Virtual Screening
TLDR
A novel dataset specifically designed for virtual screening and machine learning, consisting of 15 targets, 7844 confirmed active and 407381 confirmed inactive compounds, which mimics experimental screening decks in terms of hit rate (ratio of active to inactive compounds) and potency distribution.
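Using the totals quoted above and the hit-rate definition given there (actives divided by inactives), the overall hit rate works out as below. This pools all 15 targets; per-target hit rates will differ.

```python
actives = 7844      # confirmed actives across all 15 targets
inactives = 407381  # confirmed inactives across all 15 targets

hit_rate = actives / inactives  # ratio of active to inactive compounds
print(f"{hit_rate:.4f} ({hit_rate:.2%})")             # 0.0193 (1.93%)
print(f"{inactives / actives:.1f} inactives per active")  # 51.9 inactives per active
```

A pooled rate of roughly 2% is far below the active-to-decoy ratios that many older benchmarks implicitly assume, which is why the dataset is described as mimicking experimental screening decks.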
In Need of Bias Control: Evaluating Chemical Data for Machine Learning in Structure-Based Virtual Screening
TLDR
There is a need for eligible validation experiments and benchmark data sets suited to ML for more bias-controlled validation in ML-based SBVS; guidelines for setting up validation experiments are provided, and a perspective on how new data sets could be generated is given.
Machine learning classification can reduce false positives in structure-based virtual screening
TLDR
A strategy for building a training dataset (D-COID) that aims to generate highly compelling decoy complexes that are individually matched to available active complexes that help provide chemical probes for new potential drug targets as they are discovered is reported.

References

REPROVIS-DB: A Benchmark System for Ligand-Based Virtual Screening Derived from Reproducible Prospective Applications
TLDR
A publicly available compound database of reproducible virtual screens (REPROVIS-DB) that organizes information from successful ligand-based VS applications including reference compounds, screening databases, compound selection criteria, and experimentally confirmed hits is designed.
An Unbiased Method To Build Benchmarking Sets for Ligand-Based Virtual Screening and its Application To GPCRs
TLDR
The method has greatly reduced the “artificial enrichment” and “analogue bias” of a published GPCRs benchmarking set, i.e., GPCR Ligand Library (GLL)/GPCR Decoy Database (GDD), and addressed an important issue about the ratio of decoys per ligand.
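Artificial enrichment is usually exposed through an enrichment factor (EF): the hit rate in the top-ranked fraction divided by the hit rate of the whole set. One illustrative sanity check, not taken from the GLL/GDD paper, is to rank compounds by a trivial score (for example a simple property) and see whether it already enriches actives. The function name and the 5% fraction below are hypothetical choices.

```python
from typing import Sequence


def enrichment_factor(scores: Sequence[float], is_active: Sequence[bool],
                      fraction: float = 0.01) -> float:
    """EF at `fraction`: (actives in top / size of top) / (all actives / all compounds).

    Higher scores are ranked first. EF is about 1 for a random ranking.
    """
    ranked = sorted(zip(scores, is_active), key=lambda p: p[0], reverse=True)
    n_top = max(1, round(fraction * len(ranked)))
    hits_top = sum(active for _, active in ranked[:n_top])
    total_actives = sum(is_active)
    return (hits_top / n_top) / (total_actives / len(ranked))


# Toy set: 100 compounds, the 5 actives happen to have the largest "trivial" score.
scores = list(range(100))
labels = [i >= 95 for i in range(100)]
print(enrichment_factor(scores, labels, fraction=0.05))  # 20.0, the maximum possible here
```

If a score that ignores the target still reaches an EF near its maximum, the benchmark's decoys are probably separable from the actives for reasons unrelated to binding.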
Effects of inductive bias on computational evaluations of ligand-based modeling and on drug discovery
TLDR
By analyzing the molecular similarities of known drugs, it is shown that the inductive bias of the historic drug discovery process has a very strong 2D bias.
Bias, reporting, and sharing: computational evaluations of docking methods
Ajay N. Jain. J. Comput. Aided Mol. Des., 2008.
TLDR
This paper presents detailed examples of pitfalls in each area of data sharing, data set design and preparation, and statistical reporting and makes recommendations as to best practices.
Profile-QSAR 2.0: Kinase Virtual Screening Accuracy Comparable to Four-Concentration IC50s for Realistically Novel Compounds.
TLDR
The improved "pQSAR 2.0" method replaces probabilities of activity from naïve Bayes categorical models at several thresholds with predicted IC50s from RFR models, allowing for predicted potency, ligand efficiency, lipophilic efficiency, and selectivity against antitargets, greatly facilitating hitlist triaging and enabling virtual screening panels such as toxicity panels and overall promiscuity predictions.
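Once a model outputs IC50 values, the derived quantities mentioned above follow from standard formulas: pIC50 = −log10(IC50 in mol/L), ligand efficiency LE ≈ 1.37 × pIC50 / heavy-atom count (kcal/mol per heavy atom, with 1.37 approximating 2.303RT near room temperature and IC50 used as a proxy for affinity), and lipophilic efficiency LipE = pIC50 − logP. The sketch below uses these common textbook definitions; it is not pQSAR 2.0 code, and the example compound is hypothetical.

```python
import math


def pic50_from_nM(ic50_nM: float) -> float:
    """pIC50 from an IC50 given in nanomolar: -log10(IC50 * 1e-9) = 9 - log10(IC50_nM)."""
    return 9.0 - math.log10(ic50_nM)


def ligand_efficiency(pic50: float, heavy_atoms: int) -> float:
    """Approximate LE in kcal/mol per heavy atom (1.37 ~ 2.303*R*T near 300 K)."""
    return 1.37 * pic50 / heavy_atoms


def lipophilic_efficiency(pic50: float, logp: float) -> float:
    """LipE (also called LLE): potency minus lipophilicity."""
    return pic50 - logp


# Hypothetical compound: predicted IC50 of 10 nM, 30 heavy atoms, logP of 3.0.
p = pic50_from_nM(10.0)
print(p, round(ligand_efficiency(p, 30), 3), lipophilic_efficiency(p, 3.0))  # 8.0 0.365 5.0
```

Working from predicted potencies rather than class probabilities is what makes these efficiency metrics available for triaging.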
Virtual Screening Using Protein—Ligand Docking: Avoiding Artificial Enrichment.
This study addresses a number of topical issues around the use of protein−ligand docking in virtual screening. We show that, for the validation of such methods, it is key to use focused libraries
Does your model weigh the same as a Duck?
TLDR
These fallacies of logic will be discussed in the context of off-target predictive modeling, QSAR, molecular similarity computations, and docking and examples will be shown that avoid these problems.
Comparability of Mixed IC50 Data – A Statistical Analysis
TLDR
The types of errors, the redundancy and the variability that can be found in the ChEMBL IC50 database, which is used to guide lead optimization and to build large-scale chemogenomics analyses as well as off-target activity and toxicity models based on public data, were analyzed.
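Redundant measurements, meaning the same compound tested against the same target in different assays, are what make such variability measurable. A minimal illustration, assuming records of the hypothetical form (compound_id, target_id, pIC50) and not reproducing the paper's statistics, collects all pairwise absolute differences within each compound-target pair.

```python
from collections import defaultdict
from itertools import combinations
from statistics import mean
from typing import Dict, Iterable, List, Tuple


def pairwise_abs_diffs(records: Iterable[Tuple[str, str, float]]) -> List[float]:
    """All |pIC50_a - pIC50_b| for repeated measurements of the same
    (compound, target) pair; pairs measured only once contribute nothing."""
    groups: Dict[Tuple[str, str], List[float]] = defaultdict(list)
    for compound, target, p in records:
        groups[(compound, target)].append(p)
    diffs: List[float] = []
    for values in groups.values():
        diffs.extend(abs(a - b) for a, b in combinations(values, 2))
    return diffs


records = [("c1", "t1", 6.0), ("c1", "t1", 6.5), ("c1", "t1", 7.0), ("c2", "t1", 5.0)]
diffs = pairwise_abs_diffs(records)
print(sorted(diffs), round(mean(diffs), 3))  # [0.5, 0.5, 1.0] 0.667
```

The spread of these differences gives a rough ceiling on how accurately any model trained on mixed public IC50 data can be expected to perform.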
Directory of Useful Decoys, Enhanced (DUD-E): Better Ligands and Decoys for Better Benchmarking
TLDR
An improved benchmarking set that includes more diverse targets such as GPCRs and ion channels, totaling 102 proteins with 22886 clustered ligands drawn from ChEMBL, each with 50 property-matched decoys drawn from ZINC, is described.
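Property-matched decoys are meant to resemble a ligand in bulk physicochemical properties while being topologically dissimilar to it. The sketch below shows that general idea only; it is not the DUD-E pipeline. The property tuple (molecular weight, logP), the per-property scales, the 0.3 similarity cap and all names are hypothetical, and fingerprints are assumed to be sets of bit indices.

```python
from typing import List, Sequence, Set, Tuple

# (decoy id, property values, fingerprint bits); the layout is an assumption
Candidate = Tuple[str, Tuple[float, ...], Set[int]]


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto similarity between two fingerprint bit sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def select_decoys(ligand_props: Sequence[float], ligand_fp: Set[int],
                  candidates: List[Candidate], scales: Sequence[float],
                  n_decoys: int = 50, max_sim: float = 0.3) -> List[str]:
    """Drop candidates too similar to the ligand (possible unlabeled actives),
    then keep the `n_decoys` whose scaled property distance is smallest."""
    def prop_distance(props: Sequence[float]) -> float:
        return sum(abs(a - b) / s for a, b, s in zip(ligand_props, props, scales))

    dissimilar = [c for c in candidates if tanimoto(ligand_fp, c[2]) <= max_sim]
    dissimilar.sort(key=lambda c: prop_distance(c[1]))
    return [c[0] for c in dissimilar[:n_decoys]]


candidates = [
    ("d1", (352.0, 2.4), {10, 11}),
    ("d2", (500.0, 5.0), {12, 13}),
    ("d3", (351.0, 2.5), {1, 2, 3, 4, 5}),  # close analogue of the ligand: excluded
    ("d4", (340.0, 2.0), {20}),
]
print(select_decoys((350.0, 2.5), {1, 2, 3, 4}, candidates, scales=(100.0, 1.0), n_decoys=2))
# ['d1', 'd4']
```

Matching properties keeps simple descriptors such as size or lipophilicity from separating actives and decoys, which is the artificial enrichment that such benchmarks try to avoid.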
Maximum Unbiased Validation (MUV) Data Sets for Virtual Screening Based on PubChem Bioactivity Data
TLDR
Refined nearest neighbor analysis is used to design benchmark data sets for virtual screening based on PubChem bioactivity data that provide a tool for Maximum Unbiased Validation (MUV) of virtual screening methods.