Tartarus: A Benchmarking Platform for Realistic And Practical Inverse Molecular Design

AkshatKumar Nigam, Robert Pollice, G. Tom, Kjell Jorner, Luca Anthony Thiede, Anshul Kundaje, Alán Aspuru-Guzik
The efficient exploration of chemical space to design molecules with intended properties enables the accelerated discovery of drugs, materials, and catalysts, and is one of the most important outstanding challenges in chemistry. Encouraged by the recent surge in computer power and artificial intelligence development, many algorithms have been developed to tackle this problem. However, despite the emergence of many new approaches in recent years, comparatively little progress has been made in…

Assessing multi-objective optimization of molecules with genetic algorithms against relevant baselines

It is shown that both CHIMERA and Hypervolume achieve better formal optimality than the baselines and generate molecules closer to a user-specified Utopian point in property space, mimicking typical materials design objectives.

A Pareto-optimal compositional energy-based model for sampling and optimization of protein sequences

This work proposes a Pareto-compositional energy-based model (pcEBM), a framework that uses multiple gradient descent for sampling new designs that adhere to various constraints in optimizing distinct properties, and demonstrates its ability to learn non-convex Pareto fronts and generate sequences that simultaneously satisfy multiple desired properties across a series of real-world antibody design tasks.



Sample Efficiency Matters: A Benchmark for Practical Molecular Optimization

The results show that most “state-of-the-art” methods fail to outperform their predecessors under a limited oracle budget allowing 10K queries and that no existing algorithm can efficiently solve certain molecular optimization problems in this setting.

We should at least be able to Design Molecules that Dock Well

A benchmark based on docking, a popular computational method for assessing molecule binding to a protein, is proposed and it is observed that popular graph-based generative models fail to generate molecules with a high docking score when trained using a realistically sized training set.

GuacaMol: Benchmarking Models for De Novo Molecular Design

This work proposes an evaluation framework, GuacaMol, based on a suite of standardized benchmarks, to standardize the assessment of both classical and neural models for de novo molecular design, and describes a variety of single and multiobjective optimization tasks.

Rigorous Free Energy Simulations in Virtual Screening

It is asserted that alchemical binding free energy methods using all-atom molecular dynamics simulations have matured to the point where they can be applied in virtual screening campaigns as a final scoring stage to prioritize the top molecules for experimental testing.

DOCKSTRING: Easy Molecular Docking Yields Better Benchmarks for Ligand Design

Dockstring is presented, a bundle for meaningful and robust comparison of ML models using docking scores, and results indicate that docking scores are a more realistic evaluation objective than simple physicochemical properties, yielding benchmark tasks that are more challenging and more closely related to real problems in drug discovery.

Evolutionary chemical space exploration for functional materials: computational organic semiconductor discovery

An evolutionary method is described that explores a user-specified region of chemical space to identify promising molecules; subsequent evaluation of these molecules using crystal structure prediction reveals two promising structural motifs.

Generating Focused Molecule Libraries for Drug Discovery with Recurrent Neural Networks

This work shows that recurrent neural networks can be trained as generative models for molecular structures, similar to statistical language models in natural language processing, and demonstrates that the properties of the generated molecules correlate very well with those of the molecules used to train the model.

MolecularRNN: Generating realistic molecular graphs with optimized properties

MolecularRNN, a graph recurrent generative model for molecular structures, is presented; it generates diverse, realistic molecular graphs after likelihood pretraining on a large database of molecules.

Molecular Sets (MOSES): A Benchmarking Platform for Molecular Generation Models

A benchmarking platform called Molecular Sets (MOSES) is introduced to standardize training and comparison of molecular generative models, and the authors suggest using its results as reference points for further advancements in generative chemistry research.

Automated exploration of the low-energy chemical space with fast quantum chemical methods.

An efficient scheme for in silico sampling of parts of molecular chemical space, using semiempirical tight-binding methods combined with a meta-dynamics driven search algorithm, is proposed and discussed, opening many possible applications in modern computational chemistry and drug discovery.