Machine‐learning scoring functions for structure‐based virtual screening

@article{Li2020MachinelearningSF,
  title={Machine‐learning scoring functions for structure‐based virtual screening},
  author={Hongjian Li and Kam-Heung Sze and Gang Lu and Pedro J. Ballester},
  journal={Wiley Interdisciplinary Reviews: Computational Molecular Science},
  year={2020},
  volume={11}
}
Molecular docking predicts whether and how small molecules bind to a macromolecular target using a suitable 3D structure. Scoring functions for structure‐based virtual screening primarily aim at discovering which molecules bind to the considered target when these form part of a library with a much higher proportion of non‐binders. Classical scoring functions are essentially models building a linear mapping between the features describing a protein–ligand complex and its binding label. Machine… 
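The abstract describes classical scoring functions as linear maps from complex features to a binding label, applied to libraries dominated by non-binders. The following is a minimal sketch of how such a linear score could rank a screening library and how the enrichment of true binders among the top-ranked molecules could be measured. The feature vectors, weights, the "higher score = more likely binder" convention and the `linear_score`/`enrichment_factor` helpers are all hypothetical illustrations, not taken from any published scoring function.

```python
from typing import List, Sequence, Tuple


def linear_score(features: Sequence[float], weights: Sequence[float], bias: float = 0.0) -> float:
    """Classical-style score: a linear combination of protein-ligand complex features.

    Convention assumed here: a higher score means the molecule is more likely to bind.
    """
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    return bias + sum(w * x for w, x in zip(weights, features))


def enrichment_factor(scores: Sequence[float], is_active: Sequence[bool], top_fraction: float) -> float:
    """Hit rate among the top-ranked fraction divided by the hit rate of the whole library."""
    n = len(scores)
    n_top = max(1, int(round(n * top_fraction)))
    # Rank library indices from highest to lowest score.
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
    actives_in_top = sum(1 for i in ranked[:n_top] if is_active[i])
    total_actives = sum(1 for a in is_active if a)
    if total_actives == 0:
        raise ValueError("the library contains no actives")
    return (actives_in_top / n_top) / (total_actives / n)


# Hypothetical weights (not from any published scoring function) and a toy library of
# (features, is_active) pairs: 2 actives and 8 non-binders.
weights = [1.0, -0.5]
library: List[Tuple[List[float], bool]] = [
    ([3.0, 1.0], True), ([2.5, 0.5], True), ([1.0, 1.0], False), ([0.5, 0.0], False),
    ([0.2, 1.0], False), ([1.5, 4.0], False), ([0.0, 0.0], False), ([0.8, 2.0], False),
    ([2.0, 3.0], False), ([0.1, 0.1], False),
]
scores = [linear_score(x, weights) for x, _ in library]
labels = [y for _, y in library]
ef = enrichment_factor(scores, labels, top_fraction=0.2)
print(f"EF at top 20%: {ef:.1f}")  # prints "EF at top 20%: 5.0"
```

Here both actives are ranked in the top two positions, so the enrichment factor at 20% is 5.0, the maximum possible for this library. A nonlinear machine-learning scoring function would replace `linear_score` with a trained model, while the screening evaluation would stay the same.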
Selecting machine-learning scoring functions for structure-based virtual screening.
  • P. Ballester
  • Computer Science, Medicine
    Drug discovery today. Technologies
  • 2019
TLDR
Two approaches for selecting an existing scoring function for the target are analyzed, along with a third approach that consists of generating a scoring function tailored to the target.
Computational representations of protein–ligand interfaces for structure-based virtual screening
TLDR
The authors review computational methods for representing protein–ligand interfaces, including traditional approaches that use deliberately designed fingerprints and descriptors and more recent methods that extract features automatically with deep learning.
Improving Structure-Based Virtual Screening with Ensemble Docking and Machine Learning
TLDR
The authors propose that applying machine learning (ML) methods to ensemble docking results can improve the predictive power of SBVS; their results indicate that the ML classifiers significantly outperform traditional consensus strategies and even the best case achieved with single-structure docking.
DOCKSTRING: easy molecular docking yields better benchmarks for ligand design
TLDR
DOCKSTRING is presented as a bundle for meaningful and robust comparison of ML models. It consists of an open-source Python package for straightforward computation of docking scores and an extensive dataset of docking scores, which is the first to include docking poses and the first of its size to be a full matrix, thus facilitating experiments in multiobjective optimization and transfer learning.
The impact of compound library size on the performance of scoring functions for structure-based virtual screening
TLDR
Screening a larger compound library is found to identify more potent actives in all six additional targets when a different docking tool is used along with its classical SF; the potency of the retrieved molecules can be improved further by ranking them with more accurate ML-based SFs.
Machine-learning scoring functions trained on complexes dissimilar to the test set already outperform classical counterparts on a blind benchmark
TLDR
It is shown for the first time that machine-learning scoring functions trained exclusively on a proportion as low as 8% of complexes dissimilar to the test set already outperform classical scoring functions; this percentage is far lower than what has recently been reported for all three CASF benchmarks.
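Training only on complexes "dissimilar to the test set" requires a concrete similarity criterion. The following is a minimal sketch assuming each complex is summarized by a set-based fingerprint and that dissimilarity means a Tanimoto similarity below a cutoff. The fingerprints, complex names, cutoff and helper names are hypothetical, and the similarity measures used in the cited study may differ.

```python
from typing import Dict, FrozenSet, List


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto (Jaccard) similarity between two sets of 'on' fingerprint bits."""
    union = a | b
    if not union:
        return 1.0  # two empty fingerprints are treated as identical
    return len(a & b) / len(union)


def dissimilar_training_subset(
    train: Dict[str, FrozenSet[int]],
    test: Dict[str, FrozenSet[int]],
    cutoff: float,
) -> List[str]:
    """Return training complexes whose highest similarity to any test complex is below `cutoff`."""
    if not test:
        raise ValueError("test set must not be empty")
    kept = []
    for name, fp in train.items():
        max_sim = max(tanimoto(fp, t) for t in test.values())
        if max_sim < cutoff:
            kept.append(name)
    return kept


# Hypothetical fingerprints (sets of on-bit indices) for one test and three training complexes.
test_set = {"t1": frozenset({1, 2, 3, 4})}
train_set = {
    "a": frozenset({1, 2, 3, 4}),   # identical to t1, similarity 1.0
    "b": frozenset({1, 2, 9, 10}),  # shares 2 of 6 bits with t1, similarity 1/3
    "c": frozenset({7, 8}),         # no shared bits, similarity 0.0
}
# Hypothetical cutoff; the cited study's similarity criteria may differ.
kept = dissimilar_training_subset(train_set, test_set, cutoff=0.5)
print(kept, f"{len(kept) / len(train_set):.0%} of the training set kept")  # ['b', 'c'] 67% ...
```

Lowering the cutoff shrinks the retained proportion of the training set. This is the kind of trade-off the benchmark above explores when it reports performance at proportions as low as 8%.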
Accuracy or novelty: what can we gain from target-specific machine-learning-based scoring functions in virtual screening?
TLDR
Target-specific MLSFs appear to lack the intrinsic attributes of a traditional SF and may not be a substitute for classical SFs; instead, MLSFs can be regarded as a new derivative of ligand-based QSAR models.
The impact of cross-docked poses on performance of machine learning classifier for protein–ligand binding pose prediction
TLDR
This study developed several XGBoost-trained classifiers to discriminate near-native binding poses from decoys and systematically assessed their performance with and without cross-docked poses in the training and test sets.
Prediction of Binding Free Energy of Protein–Ligand Complexes with a Hybrid Molecular Mechanics/Generalized Born Surface Area and Machine Learning Method
Accurate prediction of protein–ligand binding free energies is important in enzyme engineering and drug discovery. The molecular mechanics/generalized Born surface area (MM/GBSA) approach is widely

References

Showing 1-10 of 145 references
Machine‐learning scoring functions to improve structure‐based binding affinity prediction and virtual screening
TLDR
The emerging picture from these studies is that the classical approach of using linear regression with a small number of expert‐selected structural features can be strongly improved by a machine‐learning approach based on nonlinear regression allied with comprehensive data‐driven feature selection.
Machine‐learning scoring functions for structure‐based drug lead optimization
TLDR
The performance gap between classical and machine-learning SFs for drug lead optimization in the 2015–2019 period was large and has now broadened owing to methodological improvements and the availability of more training data.
Performance of machine-learning scoring functions in structure-based virtual screening
TLDR
A new ready-to-use scoring function (RF-Score-VS) is presented. It is trained on 15 426 active and 893 897 inactive molecules docked to a set of 102 targets and provides much better prediction of measured binding affinity than Vina.
Comparative assessment of machine-learning scoring functions on PDBbind 2013
TLDR
This paper proposes 12 scoring functions based on a wide range of ML techniques and analyzes the scoring power, ranking power, docking power, and screening power of each using the PDBbind 2013 database.
A machine learning approach to predicting protein-ligand binding affinity with applications to molecular docking
TLDR
A novel scoring function (RF-Score) is proposed that circumvents the need for problematic modelling assumptions by using non-parametric machine learning; Random Forest is used to implicitly capture binding effects that are hard to model explicitly.
Protein-Ligand Empirical Interaction Components for Virtual Screening.
TLDR
The trained PLEIC-SVM model is able to capture important interaction patterns between the ligand and protein residues for one specific target, which is helpful in discarding false positives in post-docking filtering.
Beware of Machine Learning-Based Scoring Functions - On the Danger of Developing Black Boxes
TLDR
The Surflex-Dock scoring function is shown to be logically sensitive to the quality of docking poses. Two additional benchmarking tests are proposed that should be performed systematically when developing novel scoring functions, to avoid producing novel but meaningless ones.
Improving scoring‐docking‐screening powers of protein–ligand scoring functions using random forest
TLDR
To improve the scoring, docking and screening powers of protein–ligand scoring functions simultaneously, a ΔvinaRF parameterization and feature selection framework based on random forest is introduced. Compared with classical scoring functions, it achieves superior performance in all power tests of both the CASF‐2013 and CASF‐2007 benchmarks.
An Overview of Scoring Functions Used for Protein–Ligand Interactions in Molecular Docking
TLDR
This study discusses the foundations, suitable application areas and shortcomings of the four types of scoring functions, as well as challenges and potential future research directions.
Task-Specific Scoring Functions for Predicting Ligand Binding Poses and Affinity and for Screening Enrichment
TLDR
The authors propose BT-Score, an ensemble machine-learning SF that uses boosted decision trees and thousands of predictive descriptors to estimate binding affinity (BA). They also propose BT-Dock, a boosted-tree ensemble model trained on a large number of native and computer-generated ligand conformations and optimized to predict binding poses explicitly.