Mathematical deep learning for pose and binding affinity prediction and ranking in D3R Grand Challenges

Duc Duy Nguyen, Zixuan Cang, Kedi Wu, Menglun Wang, Yin Cao, Guowei Wei
Journal of Computer-Aided Molecular Design
Advanced mathematics, such as multiscale weighted colored subgraph and element specific persistent homology, and machine learning including deep neural networks were integrated to construct mathematical deep learning models for pose and binding affinity prediction and ranking in the last two D3R Grand Challenges in computer-aided drug design and discovery. D3R Grand Challenge 2 focused on the pose prediction, binding affinity ranking and free energy prediction for Farnesoid X receptor ligands… 
Predicting binding poses and affinity ranking in D3R Grand Challenge using PL-PatchSurfer2.0
An overview of the method is given. It uses the three-dimensional Zernike descriptor, a mathematical moment-based shape descriptor, to quantify local shape complementarity between a ligand and a receptor, which properly incorporates molecular flexibility and provides stable affinity assessment for a bound ligand–receptor complex.
D3R grand challenge 4: blind prediction of protein–ligand poses, affinity rankings, and relative binding free energies
The results of Grand Challenge 4 are reported. The challenge focused on the proteins beta secretase 1 and Cathepsin S and was run in a manner analogous to prior challenges.
Combining Docking Pose Rank and Structure with Deep Learning Improves Protein-Ligand Binding Mode Prediction over a Baseline Docking Approach
A deep learning model for binding mode prediction that uses docking ranking as input in combination with docking structures is developed and outperforms a baseline docking program in a variety of tests, including on cross-docking datasets that mimic real-world docking use cases.
Exploring fragment-based target-specific ranking protocol with machine learning on cathepsin S
The importance of training data, docking approaches and fragmentation strategies in inhibitor-ranking protocol development with machine learning is demonstrated, and the best structure-based ranking protocol can achieve Kendall’s τ of 0.52 for all binders in GC4.
MathDL: mathematical deep learning for D3R Grand Challenge 4
The authors' MathDL models achieved the top place in pose prediction for BACE ligands in Stage 1a and obtained the highest Spearman correlation coefficient on the affinity ranking of 460 CatS compounds, and the smallest centered root mean square error on the free energy set of 39 CatS molecules.
Persistent spectral–based machine learning (PerSpect ML) for protein-ligand binding affinity prediction
PerSpect-based machine learning models can significantly improve prediction accuracy for protein-ligand binding affinity prediction and are better than all existing models, as far as the authors know.
Artificial intelligence in the prediction of protein–ligand interactions: recent advances and future directions
The computational approaches to predicting protein–ligand interactions in the context of drug discovery are reviewed, focusing on methods using artificial intelligence (AI) including both classical ML algorithms and recent deep learning methods.
GGL-Tox: Geometric Graph Learning for Toxicity Prediction
This work develops a geometric graph learning toxicity (GGL-Tox) model by integrating MWCG features and the gradient boosting decision tree (GBDT) algorithm, inspired by the success of multiscale weighted colored graph (MWCG) theory in protein-ligand binding affinity predictions.
Machine‐learning scoring functions for structure‐based drug lead optimization
The performance gap between classical and machine-learning SFs for drug lead optimization in the 2015–2019 period was large and has now broadened owing to methodological improvements and the availability of more training data.
Target-Specific Prediction of Ligand Affinity with Structure-Based Interaction Fingerprints
A large number of available inhibitor-bound HIV-1 protease structures are used to evaluate inhibitor diversity and machine learning models for predicting ligand affinity; a gradient boosting machine learning model with explicit feature attribution is found to predict binding affinity with high accuracy.


Optimal strategies for virtual screening of induced-fit and flexible target in the 2015 D3R Grand Challenge
These studies suggest that analysis of changes on the receptor structure upon ligand binding can help select an optimal virtual screening strategy.
TopP–S: Persistent homology‐based multi‐task deep neural networks for simultaneous predictions of partition coefficient and aqueous solubility
This work introduces an algebraic topology‐based method, called element‐specific persistent homology (ESPH), as a new representation of small molecules that is entirely different from conventional chemical and/or physical representations, and demonstrates that the proposed approaches achieve some of the most accurate predictions of aqueous solubility and partition coefficient.
Substituting random forest for multiple linear regression improves binding affinity prediction of scoring functions: Cyscore as a case study
This study investigates the conditions of applying RF under various contexts and finds that given sufficient training samples RF manages to comprehensively capture the non-linearity between structural features and measured binding affinities.
Are 2D fingerprints still valuable for drug discovery?
It is demonstrated that 2D-fingerprint-based models perform as well as the state-of-the-art 3D structure-based models for the predictions of toxicity, solubility, partition coefficient and protein-ligand binding affinity based on only ligand information.
Further development and validation of empirical scoring functions for structure-based binding affinity prediction
The results show that this consensus scoring function, X-CSCORE, improves the docking accuracy considerably when compared to the conventional force field computation used for molecular docking.
Quantitative Toxicity Prediction Using Topology Based Multitask Deep Neural Networks
This work introduces element specific persistent homology (ESPH), an algebraic topology approach, for quantitative toxicity prediction, and a topology based multitask strategy to take the advantage of the availability of large data sets while dealing with small data sets.
Integration of element specific persistent homology and machine learning for protein‐ligand binding affinity prediction
Zixuan Cang, G. Wei, Biology, International Journal for Numerical Methods in Biomedical Engineering, 2018
The present approach reveals that protein‐ligand hydrophobic interactions extend to 40 Å away from the binding site, which has significant ramifications for drug and protein design.
DrugScore(CSD)-knowledge-based scoring function derived from small molecule crystal data with superior recognition rate of near-native ligand poses and better affinity prediction.
A necessary prerequisite to successfully resolving the scoring problem with a more discriminative scoring function is the generation, in a docking run, of highly accurate ligand poses that approximate the native pose to below 1 Å RMSD.
A review of mathematical representations of biomolecular data.
This review focuses its performance analysis on protein-ligand binding predictions, although these methods have had tremendous success in many other applications, such as protein classification, virtual screening, and the predictions of solubility, solvation free energies, toxicity, partition coefficients, and protein folding stability changes upon mutation.
TopologyNet: Topology based deep convolutional and multi-task neural networks for biomolecular property predictions
A multi-task multichannel topological convolutional neural network (MM-TCNN) that outperforms the latest methods in the prediction of protein-ligand binding affinities, mutation induced globular protein folding free energy changes, and mutation induced membrane protein folding free energy changes.