• Corpus ID: 238744253

Amortized Tree Generation for Bottom-up Synthesis Planning and Synthesizable Molecular Design

Wenhao Gao, Rocío Mercado, Connor W. Coley
Molecular design and synthesis planning are two critical steps in the process of molecular discovery that we propose to formulate as a single shared task of conditional synthetic pathway generation. We report an amortized approach to generate synthetic pathways as a Markov decision process conditioned on a target molecular embedding. This approach allows us to conduct synthesis planning in a bottom-up manner and design synthesizable molecules by decoding from optimized conditional codes… 

Path-Aware and Structure-Preserving Generation of Synthetically Accessible Molecules

This work proposes a reaction-embedded and structure-conditioned variational autoencoder to design synthetically accessible molecules that preserve the main structural motifs of target molecules, and demonstrates that this model can design new molecules with even higher activity than the seed molecules.

MolGenSurvey: A Systematic Survey in Machine Learning Models for Molecule Design

This paper systematically reviews the most relevant work in machine learning models for molecule design and summarizes all the existing molecule design problems into several venues according to the problem setup, including input types, output types, and goals.

Retrieval-based Controllable Molecule Generation

This work proposes a new retrieval-based framework for controllable molecule generation using a small set of exemplar molecules to steer the pre-trained generative model towards synthesizing molecules that satisfy the given design criteria.

Curiosity in exploring chemical spaces: intrinsic rewards for molecular reinforcement learning

This study examines three forms of intrinsic motivation, adapted from approaches in the literature that were developed in other settings (predominantly video games), to aid efficient exploration in molecular design.

Reinforced Genetic Algorithm for Structure-based Drug Design

Empirical studies on optimizing binding affinity to various disease targets are conducted and it is shown that RGA outperforms the baselines in terms of docking scores and is more robust to random initializations.

Accelerating Bayesian Optimization for Biological Sequence Design with Denoising Autoencoders

In the authors' experiments LaMBO outperforms genetic optimizers and does not require a large pretraining corpus, demonstrating that BayesOpt is practical and effective for biological sequence design.

ChemAlgebra: Algebraic Reasoning on Chemical Reactions

This work proposes CHEMALGEBRA, a benchmark for measuring the reasoning capabilities of deep learning models through the prediction of stoichiometrically balanced chemical reactions; the authors believe that it can serve as a useful test bed for the next generation of machine reasoning models and as a promoter of their development.

Sample Efficiency Matters: A Benchmark for Practical Molecular Optimization

The results show that most “state-of-the-art” methods fail to outperform their predecessors under a limited oracle budget allowing 10K queries and that no existing algorithm can efficiently solve certain molecular optimization problems in this setting.


This paper argues that most of the existing de novo drug design benchmark functions are either highly unrealistic or depend upon a surrogate model whose performance is not well characterized, and recommends that poor benchmarks (especially logP and QED) be deprecated in favour of better benchmarks.

Defining Levels of Automated Chemical Design

The Automated Chemical Design (ACD) Levels framework provides a common language for describing automated small molecule design systems and enables medicinal chemists to better understand and evaluate such systems.



Machine Learning in Computer-Aided Synthesis Planning.

This work focuses on two critical aspects of CASP and on recent machine learning approaches to both: retrosynthetic planning, and anticipating the products of chemical reactions, which can be used to validate proposed reactions in a computer-generated synthesis plan.

Planning chemical syntheses with deep neural networks and symbolic AI

This work combines Monte Carlo tree search with an expansion policy network that guides the search and a filter network that pre-selects the most promising retrosynthetic steps; the resulting system solves almost twice as many molecules, thirty times faster than the traditional computer-aided search method.

The Synthesizability of Molecules Proposed by Generative Models

This analysis suggests that to improve the utility of state-of-the-art generative models in real discovery workflows, new algorithm development is warranted.

A Model to Search for Synthesizable Molecules

This work proposes a new molecule generation model that can generate diverse, valid and unique molecules due to the useful inductive biases of modeling reactions, and allows chemists to interrogate not only the properties of the generated molecules but also the feasibility of the synthesis routes.

A robotic platform for flow synthesis of organic compounds informed by AI planning

An approach toward automated, scalable synthesis that combines techniques in artificial intelligence (AI) for planning and robotics for execution is described, representing a milestone on the path toward fully autonomous chemical synthesis.

Barking up the right tree: an approach to search over molecule synthesis DAGs

This work proposes a deep generative model that better represents the real-world process by directly outputting molecule synthesis DAGs, and argues that this provides interpretability as well as sensible inductive biases, ensuring that the model searches over the same chemical space that chemists would also have access to.

Learning To Navigate The Synthetically Accessible Chemical Space Using Reinforcement Learning

This work presents Policy Gradient for Forward Synthesis (PGFS), a novel forward-synthesis framework powered by reinforcement learning (RL) for de novo drug design that embeds the concept of synthetic accessibility directly into the de novo drug design system.

Molecular Design in Synthetically Accessible Chemical Space via Deep Reinforcement Learning

This work proposes a novel Reinforcement Learning framework for molecular design in which an agent learns to directly optimize through a space of synthetically accessible drug-like molecules, which outperforms existing state-of-the-art approaches in the optimization of pharmacologically relevant objectives.

Computational planning of the synthesis of complex natural products

A synthetic route-planning algorithm, augmented with causal relationships that allow it to strategize over multiple steps, can design complex natural-product syntheses that are indistinguishable from those designed by human experts. The results indicate that expert-level automated synthetic planning is feasible, pending continued improvements to the reaction knowledge base and further code optimization.

Molecular Transformer: A Model for Uncertainty-Calibrated Chemical Reaction Prediction

This work shows that a multihead attention Molecular Transformer model outperforms all algorithms in the literature, achieving a top-1 accuracy above 90% on a common benchmark data set and is able to handle inputs without a reactant–reagent split and including stereochemistry, which makes the method universally applicable.