• Corpus ID: 235421604

Protein-Ligand Docking Surrogate Models: A SARS-CoV-2 Benchmark for Deep Learning Accelerated Virtual Screening

@article{Clyde2021ProteinLigandDS,
  title={Protein-Ligand Docking Surrogate Models: A SARS-CoV-2 Benchmark for Deep Learning Accelerated Virtual Screening},
  author={Austin R. Clyde and Thomas S. Brettin and Alexander Partin and Hyun Seung Yoo and Y. Babuji and Ben Blaiszik and Andr{\'e} Merzky and Matteo Turilli and Shantenu Jha and Arvind Ramanathan and Rick L. Stevens},
  journal={ArXiv},
  year={2021},
  volume={abs/2106.07036}
}
We propose a benchmark to study surrogate model accuracy for protein-ligand docking. We share a dataset consisting of 200 million 3D complex structures and 2D structure scores across a consistent set of 13 million "in-stock" molecules over 15 receptors, or binding sites, across the SARS-CoV-2 proteome. Our work shows surrogate docking models have six orders of magnitude more throughput than standard docking protocols on the same supercomputer node types. We demonstrate the power of high-speed… 
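
As a rough consistency check on the figures quoted in the abstract, the sketch below multiplies the stated library size by the stated number of receptors and turns "six orders of magnitude" into a plain factor. The per-pair docking time is a hypothetical placeholder chosen only to make the arithmetic concrete; it is not a number from the paper.

```python
# Figures stated in the abstract.
n_molecules = 13_000_000   # "in-stock" molecules
n_receptors = 15           # receptors / binding sites
speedup = 10 ** 6          # "six orders of magnitude" more throughput

# Number of molecule-receptor pairs implied by those figures.
pairs = n_molecules * n_receptors
print(f"{pairs:,} molecule-receptor pairs")

# HYPOTHETICAL: assume one second of docking per pair on one node.
docking_seconds_per_pair = 1.0

docking_hours = pairs * docking_seconds_per_pair / 3600
surrogate_seconds = pairs * docking_seconds_per_pair / speedup

print(f"docking:   about {docking_hours:,.0f} node-hours")
print(f"surrogate: about {surrogate_seconds:,.0f} node-seconds")
```

The product, 195 million pairs, is in line with the roughly 200 million structures the abstract reports.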

AI-accelerated protein-ligand docking for SARS-CoV-2 is 100-fold faster with no significant change in detection

There is strong evidence that the community should begin focusing on improving the accuracy of surrogate models, so that massive compound libraries can be screened 100× or even 1000× faster than with current techniques while missing fewer top hits.

Deep Surrogate Docking: Accelerating Automated Drug Discovery with Graph Neural Networks

This work introduces Deep Surrogate Docking (DSD), a framework that applies deep learning-based surrogate modeling to accelerate the docking process substantially, and shows that graph neural networks (GNNs) can serve as fast and accurate estimators of classical docking algorithms.

IMPECCABLE: Integrated Modeling PipelinE for COVID Cure by Assessing Better LEads

Development and deployment of computational infrastructure at scale integrates multiple artificial intelligence and simulation-based approaches to overcome this fundamental limitation of the drug discovery process.

High-Throughput Virtual Screening Pipelines

This study proposes two optimization frameworks for optimally determining the screening thresholds of an HTS pipeline. The frameworks apply to most (if not all) screening campaigns involving experimental and/or computational evaluations, and the study validates them on both analytic and practical scenarios.

RAPTOR: Ravenous Throughput Computing

  • A. Merzky, M. Turilli, S. Jha
  • Computer Science, Biology
  • 2022 22nd IEEE International Symposium on Cluster, Cloud and Internet Computing (CCGrid)
  • 2022
RAPTOR represents important progress in computational drug discovery, both in the size of the libraries that can be screened and in the possibility of generating training data fast enough to serve the latest generation of docking surrogate models.

Optimal Decision Making in High-Throughput Virtual Screening Pipelines

The proposed optimal HTVS framework can accelerate screening with virtually no degradation in accuracy and enables an adaptive operational strategy for HTVS, where one can trade accuracy for efficiency.

References

Protein-Ligand Scoring with Convolutional Neural Networks

This work describes convolutional neural network scoring functions that take as input a comprehensive three-dimensional representation of a protein-ligand interaction and finds that the CNN scoring function outperforms the AutoDock Vina scoring function when ranking poses both for pose prediction and virtual screening.

COVID Moonshot: Open Science Discovery of SARS-CoV-2 Main Protease Inhibitors by Combining Crowdsourcing, High-Throughput Experiments, Computational Simulations, and Machine Learning

This manuscript describes the methodologies leading to both covalent and non-covalent inhibitors displaying protease IC50 values under 150 nM and viral inhibition under 5 µM in multiple different viral replication assays.

Comprehensive evaluation of ten docking programs on a diverse set of protein-ligand complexes: the prediction accuracy of sampling power and scoring power.

Overall, the ligand binding poses could be identified in most cases by the evaluated docking programs but the ranks of the binding affinities for the entire dataset could not be well predicted by most docking programs.

High Throughput Virtual Screening and Validation of a SARS-CoV-2 Main Protease Non-Covalent Inhibitor

A novel non-covalent small-molecule inhibitor that binds to and inhibits the SARS-CoV-2 main protease (Mpro) is discovered by employing a scalable high-throughput virtual screening (HTVS) framework and a targeted compound library of over 6.5 million molecules that could be readily ordered and purchased.

COVID-19 Docking Server: A meta server for docking small molecules, peptides and antibodies against potential targets of COVID-19

COVID-19 Docking Server is introduced, a web server that predicts the binding modes between COVID-19 targets and ligands including small molecules, peptides and antibodies. The meta platform provides a free and interactive tool for predicting COVID-19 target-ligand interactions and for subsequent drug discovery for COVID-19.

Targeting SARS-CoV-2 with AI- and HPC-enabled Lead Generation: A First Data Release

This data release encompasses structural information on the 4.2 B molecules enriched with pre-computed data to enable exploration and application of image-based deep learning methods, and 2D and 3D molecular descriptors to speed development of machine learning models.

Directory of Useful Decoys, Enhanced (DUD-E): Better Ligands and Decoys for Better Benchmarking

An improved benchmarking set that includes more diverse targets such as GPCRs and ion channels, totaling 102 proteins with 22886 clustered ligands drawn from ChEMBL, each with 50 property-matched decoys drawn from ZINC, is described.

Regression Enrichment Surfaces: a Simple Analysis Technique for Virtual Drug Screening Models

This work presents a new method for understanding the performance of a model in virtual drug screening tasks, regression enrichment surfaces (RES), based on the goal of virtual screening: to detect as many of the top-performing treatments as possible.

Anti-COVID-19 terpenoid from marine sources: A docking, admet and molecular dynamics study

ZINC 15 – Ligand Discovery for Everyone

A suite of ligand annotation, purchasability, target, and biology association tools, incorporated into ZINC and meant for investigators who are not computer specialists, offers new analyses that are easy for nonspecialists yet impose few limitations on experts.