Sample Efficiency Matters: A Benchmark for Practical Molecular Optimization

Wenhao Gao, Tianfan Fu, Jimeng Sun, Connor W. Coley

Molecular optimization is a fundamental goal in the chemical sciences and is of central interest to drug and material design. In recent years, significant progress has been made in solving challenging problems across various aspects of computational molecular optimizations, emphasizing high validity, diversity, and, most recently, synthesizability. Despite this progress, many papers report results on trivial or self-designed tasks, bringing additional challenges to directly assessing the… 

Tartarus: A Benchmarking Platform for Realistic And Practical Inverse Molecular Design

This work develops a set of practical benchmark tasks relying on physical simulation of molecular systems mimicking real-life molecular design problems for materials, drugs, and chemical reactions, and demonstrates the utility and ease of use of the new benchmark set.

Graph Neural Networks for Molecules

This review introduces GNNs and their various applications for small organic molecules and summarizes the recent development of self-supervised learning for molecules with GNNs.

Controllable Data Generation by Deep Learning: A Review

This article provides a systematic review of this promising research area, commonly known as controllable deep data generation: the problem is formally defined, a taxonomy of the various techniques is proposed, and the evaluation metrics in this specific domain are summarized.



ChemBO: Bayesian Optimization of Small Organic Molecules with Synthesizable Recommendations

ChemBO, a Bayesian optimization framework for generating and optimizing organic molecules for desired molecular properties, is described and a novel optimal-transport based distance and kernel that accounts for graphical information explicitly is proposed.

Accelerating high-throughput virtual screening through molecular pool-based active learning

Model-guided searches mitigate the increasing computational costs of screening increasingly large virtual libraries and can accelerate high-throughput virtual screening campaigns with applications beyond docking.
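
As a rough illustration of what a model-guided search over a fixed pool can look like, the sketch below runs a greedy loop: score a small random subset, fit a surrogate, then spend each new batch of evaluations on the candidates the surrogate ranks highest. This is not the cited paper's implementation. The function name, its parameters, the nearest-neighbor surrogate, and the convention that higher scores are better are all assumptions made for the example, and candidates are plain numbers rather than molecules.

```python
import random


def pool_based_search(pool, oracle, init_size=5, batch_size=5, rounds=3, seed=0):
    """Greedy model-guided search over a fixed candidate pool (toy sketch).

    pool   : list of unique numeric candidates (stand-ins for molecules)
    oracle : expensive scoring function; higher is assumed to be better
    Returns a dict mapping every evaluated candidate to its oracle score.
    """
    rng = random.Random(seed)
    # Start from a small random subset so the surrogate has data to learn from.
    scores = {x: oracle(x) for x in rng.sample(pool, init_size)}

    def predict(x):
        # Toy surrogate (assumption): reuse the score of the nearest
        # already-evaluated candidate.
        nearest = min(scores, key=lambda s: abs(s - x))
        return scores[nearest]

    for _ in range(rounds):
        unlabeled = [x for x in pool if x not in scores]
        if not unlabeled:
            break
        # Greedy acquisition: evaluate the candidates the surrogate ranks highest.
        batch = sorted(unlabeled, key=predict, reverse=True)[:batch_size]
        for x in batch:
            scores[x] = oracle(x)
    return scores
```

With a pool of 100 candidates and the defaults above, only 20 oracle calls are made (5 initial plus 3 rounds of 5), which is the sense in which the surrogate reduces screening cost.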

Differentiable Scaffolding Tree for Molecular Optimization

DST enables a gradient-based optimization on a chemical graph structure by back-propagating the derivatives from the target properties through a graph neural network (GNN) and can also provide an explanation that helps domain experts understand the model output.

The Synthesizability of Molecules Proposed by Generative Models

This analysis suggests that to improve the utility of state-of-the-art generative models in real discovery workflows, new algorithm development is warranted.

MARS: Markov Molecular Sampling for Multi-objective Drug Discovery

Experiments show that MARS achieves state-of-the-art performance in various multi-objective settings where molecular bio-activity, drug-likeness, and synthesizability are considered.

MIMOSA: Multi-constraint Molecule Sampling for Molecule Optimization

MIMOSA enables flexible encoding of multiple property and similarity constraints, can efficiently generate new molecules that satisfy various property constraints, and achieves up to 49.1% relative improvement over the best baseline in terms of success rate.

Molecular Design in Synthetically Accessible Chemical Space via Deep Reinforcement Learning

This work proposes a novel Reinforcement Learning framework for molecular design in which an agent learns to directly optimize through a space of synthetically accessible drug-like molecules, which outperforms existing state-of-the-art approaches in the optimization of pharmacologically relevant objectives.

Hit and Lead Discovery with Explorative RL and Fragment-based Molecule Generation

This work proposes a novel RL framework that generates pharmacochemically acceptable molecules with large docking scores. It produces molecules of higher quality than existing methods while achieving state-of-the-art performance on two of three targets in terms of the docking scores of the generated molecules.

Optimization of Molecules via Deep Reinforcement Learning

Inspired by problems faced during medicinal chemistry lead optimization, the MolDQN model is extended with multi-objective reinforcement learning, which maximizes drug-likeness while maintaining similarity to the original molecule.
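
One simple way to combine two objectives such as drug-likeness and similarity into a single reinforcement-learning reward is a weighted sum. The sketch below shows that idea only. The function name and the `weight` parameter are hypothetical, both inputs are assumed to be already computed and to lie in [0, 1], and nothing here is taken from the MolDQN code.

```python
def scalarized_reward(qed: float, similarity: float, weight: float = 0.5) -> float:
    """Weighted-sum scalarization of two objectives (illustrative sketch).

    qed        : drug-likeness score, assumed to lie in [0, 1]
    similarity : similarity to the starting molecule, assumed to lie in [0, 1]
    weight     : hypothetical trade-off; 1.0 rewards similarity only,
                 0.0 rewards drug-likeness only
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return weight * similarity + (1.0 - weight) * qed
```

Sweeping `weight` from 0 to 1 trades drug-likeness against closeness to the original molecule, which is how a single scalar reward can express a lead-optimization preference.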

Quantifying the chemical beauty of drugs.

The utility of QED is extended by applying it to the problem of molecular target druggability assessment, prioritizing a large set of published bioactive compounds. The measure may also capture the abstract notion of aesthetics in medicinal chemistry.