Molecular Sets (MOSES): A Benchmarking Platform for Molecular Generation Models

    Daniil Polykovskiy, Alexander Zhebrak, Benjamín Sánchez-Lengeling, Sergey Golovanov, Oktai Tatanov, Stanislav Belyaev, Rauf Kurbanov, Aleksey Anatolievich Artamonov, Vladimir Aladinskiy, Mark Veselov, Artur Kadurin, Sergey I. Nikolenko, Alán Aspuru-Guzik, Alex Zhavoronkov
    Frontiers in Pharmacology
Generative models are becoming a tool of choice for exploring the molecular space. These models learn from a large training dataset and produce novel molecular structures with similar properties. Generated structures can be used for virtual screening or for training semi-supervised predictive models in downstream tasks. While there are plenty of generative models, it is unclear how to compare and rank them. In this work, we introduce a benchmarking platform called Molecular Sets (MOSES) to…

Molecular Generators and Optimizers Failure Modes

    Mani Manavalan
    Computer Science
    Malaysian Journal of Medical and Biological Research
    2021
This work shows how current shortcomings in evaluating generative models for molecules can be avoided, and suggests that near-perfect distribution-learning scores can be attained on many existing criteria by even the most basic and completely useless models.

Comparative Study of Deep Generative Models on Chemical Space Coverage (v18)

This study shows that the performance of various generative models varies significantly using the benchmarking metrics introduced herein, such that generalization capability of the generative model can be clearly differentiated.

Comparison of structure- and ligand-based scoring functions for deep generative models: a GPCR case study

This work demonstrates the advantage of using molecular docking to guide de novo molecule generation over ligand-based predictors with respect to predicted affinity, novelty, and the ability to identify key interactions between ligand and protein target.

MegaSyn: Integrating Generative Molecular Design, Automated Analog Designer, and Synthetic Viability Prediction

It is shown that by deconstructing the targeted molecules and focusing on substructures, combined with an ensemble of generative models, MegaSyn generally performs well for the specific tasks of generating new scaffolds as well as targeted analogs, which are likely synthesizable and druglike.

Giving Attention to Generative VAE Models for De Novo Molecular Design

It is found that both RNNAttn and TransVAE models perform substantially better when tasked with accurately reconstructing input SMILES strings than the MosesVAE or RNN models, particularly for larger molecules up to ~700 Da.

Comparative Study of Deep Generative Models on Chemical Space Coverage

This work presents a novel, complementary metric for evaluating and comparing deep molecular generative models, based on their coverage of the chemical space of a reference dataset, GDB-13.

Score-Based Generative Models for Molecule Generation

This work lays the foundations by testing the efficacy of score-based models for molecule generation by training a Transformer-based score function on Self-Referencing Embedded Strings representations of 1.5 million samples from the ZINC dataset and using the Moses benchmarking framework to evaluate the generated samples on a suite of metrics.

MolGPT: Molecular Generation Using a Transformer-Decoder Model

The model, MolGPT, performs on par with previously proposed modern machine learning frameworks for molecular generation in terms of generating valid, unique, and novel molecules, and it is demonstrated that the model can be trained conditionally to control multiple properties of the generated molecules.

Lingo3DMol: Generation of a Pocket-based 3D Molecule using a Language Model

A pocket-based 3D molecule generation method that leverages the language model with the ability to generate 3D coordinates, achieving state-of-the-art performance in nearly all metrics, notably in terms of binding patterns, drug-like properties, rational conformations, and inference speed.

ChemistGA: A Chemical Synthesizable Accessible Molecular Generation Algorithm for Real-World Drug Discovery.

Calculations on the two benchmarks illustrate that ChemistGA achieves impressive performance among the state-of-the-art baselines, and it opens a new avenue for the application of generative models to real-world drug discovery scenarios.

GuacaMol: Benchmarking Models for De Novo Molecular Design

This work proposes an evaluation framework, GuacaMol, based on a suite of standardized benchmarks, to standardize the assessment of both classical and neural models for de novo molecular design, and describes a variety of single and multiobjective optimization tasks.

MoleculeNet: A Benchmark for Molecular Machine Learning

MoleculeNet benchmarks demonstrate that learnable representations are powerful tools for molecular machine learning and broadly offer the best performance; however, this result comes with caveats.

Randomized SMILES strings improve the quality of molecular generative models

An extensive benchmark on models trained with subsets of GDB-13 of different sizes, with different SMILES variants (canonical, randomized and DeepSMILES), with two different recurrent cell types (LSTM and GRU) and with different hyperparameter combinations shows that models using LSTM cells trained with 1 million randomized SMILES generalize to larger chemical spaces than the other approaches and represent the target chemical space more accurately.

Generating Focused Molecule Libraries for Drug Discovery with Recurrent Neural Networks

This work shows that recurrent neural networks can be trained as generative models for molecular structures, similar to statistical language models in natural language processing, and demonstrates that the properties of the generated molecules correlate very well with those of the molecules used to train the model.

Molecular de-novo design through deep reinforcement learning

This work introduces a method to tune a sequence-based generative model for molecular de novo design that, through augmented episodic likelihood, can learn to generate structures with certain specified desirable properties.

MolGAN: An implicit generative model for small molecular graphs

MolGAN is introduced, an implicit, likelihood-free generative model for small molecular graphs that circumvents the need for expensive graph matching procedures or node ordering heuristics of previous likelihood-based methods.

Application of Generative Autoencoder in De Novo Molecular Design

The results show that the autoencoder's latent space preserves the chemical similarity principle and can thus be used to generate analogue structures in de novo molecular design.

ChemTS: an efficient python library for de novo molecular generation

A novel Python library ChemTS that explores the chemical space by combining Monte Carlo tree search and an RNN is presented, which showed superior efficiency in finding high-scoring molecules in a benchmarking problem of optimizing the octanol-water partition coefficient and synthesizability.

druGAN: An Advanced Generative Adversarial Autoencoder Model for de Novo Generation of New Molecules with Desired Molecular Properties in Silico.

This work developed an advanced AAE model for molecular feature extraction problems and demonstrated its advantages over a VAE in terms of adjustability in generating molecular fingerprints, capacity to process very large molecular data sets, and efficiency in unsupervised pretraining for regression models.

Conditional molecular design with deep generative models

A conditional molecular design method that facilitates generating new molecules with desired properties is presented, built as a semisupervised variational autoencoder trained on a set of existing molecules with only a partial annotation.