Retrosynthesis with attention-based NMT model and chemical analysis of “wrong” predictions

@article{Duan2020RetrosynthesisWA,
  title={Retrosynthesis with attention-based NMT model and chemical analysis of “wrong” predictions},
  author={Hongliang Duan and Ling Wang and Chengyun Zhang and Jianjun Li},
  journal={RSC Advances},
  year={2020},
  volume={10},
  pages={1371-1378}
}
We consider retrosynthesis to be a machine translation problem. Accordingly, we apply an attention-based and completely data-driven model named Tensor2Tensor to a data set comprising approximately 50 000 diverse reactions extracted from the United States patent literature. The model significantly outperforms the seq2seq model (37.4%), with top-1 accuracy reaching 54.1%. We also offer a novel insight into the causes of grammatically invalid SMILES, and conduct a test in which experienced… 
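
Casting retrosynthesis as machine translation presupposes a way to split SMILES strings into tokens. As a minimal sketch of this pre-processing step (the paper's exact pipeline may differ), the regex-style tokenizer popularized by the Molecular Transformer line of work keeps multi-character tokens intact:

```python
import re

# Regex-based SMILES tokenizer: multi-character tokens such as Br, Cl,
# bracket atoms ([NH4+]) and two-digit ring closures (%12) stay whole.
SMILES_TOKEN_PATTERN = re.compile(
    r"(\[[^\]]+]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\."
    r"|=|#|-|\+|\\|/|:|~|@|\?|>|\*|\$|%[0-9]{2}|[0-9])"
)

def tokenize_smiles(smiles: str) -> list[str]:
    """Split a SMILES string into model-ready tokens."""
    tokens = SMILES_TOKEN_PATTERN.findall(smiles)
    # Sanity check: tokenization must be lossless.
    assert "".join(tokens) == smiles, f"untokenizable SMILES: {smiles}"
    return tokens

# A product SMILES becomes a space-separated "sentence" that the
# translation model maps to the reactant-side "sentence".
print(" ".join(tokenize_smiles("CC(=O)Oc1ccccc1C(=O)O")))
# C C ( = O ) O c 1 c c c c c 1 C ( = O ) O
```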

Leveraging Reaction-aware Substructures for Retrosynthesis and Reaction Prediction

TLDR
A substructure-level decoding model, where the substructures are reaction-aware and can be automatically extracted with a fully data-driven approach, is proposed, achieving improvements on both tasks over previously reported models.

Retrosynthesis prediction using grammar-based neural machine translation: An information-theoretic approach

TLDR
This work proposes an approach that performs single-step retrosynthesis prediction using SMILES grammar-based representations in a neural machine translation framework and demonstrates improved accuracy and reduced grammatically invalid predictions.

RetroTRAE: retrosynthetic translation of atomic environments with Transformer

We present a new single-step retrosynthesis prediction method, viz. RetroTRAE, using fragment-based tokenization and the Transformer architecture. RetroTRAE mimics chemical reasoning, and predicts…
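
As a rough illustration of the atomic-environment idea (not RetroTRAE's actual fragmentation scheme), circular environments around each atom can be enumerated with RDKit:

```python
from rdkit import Chem

def atom_environments(smiles: str, radius: int = 1) -> list[str]:
    """Return the SMILES of the circular environment around each atom,
    an ECFP-like fragment vocabulary in the spirit of RetroTRAE."""
    mol = Chem.MolFromSmiles(smiles)
    envs = []
    for atom in mol.GetAtoms():
        # Bond indices within `radius` bonds of this atom.
        bond_ids = Chem.FindAtomEnvironmentOfRadiusN(mol, radius, atom.GetIdx())
        if not bond_ids:                  # isolated atom: no bonds in range
            envs.append(atom.GetSymbol())
            continue
        submol = Chem.PathToSubmol(mol, bond_ids)
        envs.append(Chem.MolToSmiles(submol))
    return envs

print(atom_environments("CCO"))           # e.g. ['CC', 'CCO', 'CO']
```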

Substructure-based neural machine translation for retrosynthetic prediction

TLDR
This work recasts the retrosynthetic planning problem as a language translation problem using a template-free sequence-to-sequence model that predicts highly similar reactant molecules with an accuracy of 57.7% and yields more robust predictions than existing methods.

Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments

Designing efficient synthetic routes for a target molecule remains a major challenge in organic synthesis. Atom environments are ideal, stand-alone, chemically meaningful building blocks providing a…

Predicting retrosynthetic pathways using a combined linguistic model and hyper-graph exploration strategy

TLDR
This work introduces new metrics (coverage, class diversity, round-trip accuracy, and Jensen-Shannon divergence) for evaluating single-step retrosynthetic models, using a forward-prediction model and a reaction classification model, both based on the transformer architecture.

Evaluation Metrics for Single-Step Retrosynthetic Models

TLDR
It is shown that it is possible to train a transformer-based retrosynthetic model, reaching a round-trip accuracy of 82.4%, while covering 96.4% of the reactions.
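
A minimal sketch of how round-trip accuracy and coverage can be computed, assuming two hypothetical callables `retro_top_n` and `forward_predict` that stand in for trained single-step retro and forward models (the papers' exact definitions may differ in detail):

```python
from rdkit import Chem

def canonical(smiles):
    """Canonical SMILES, or None if RDKit cannot parse the string."""
    mol = Chem.MolFromSmiles(smiles)
    return Chem.MolToSmiles(mol) if mol is not None else None

def round_trip_and_coverage(products, retro_top_n, forward_predict, n=10):
    """Round-trip accuracy: a predicted reactant set counts as correct when
    the forward model maps it back to the query product. Coverage: fraction
    of products with at least one such round-trip hit in the top-n."""
    hits, attempts, covered = 0, 0, 0
    for product in products:
        target = canonical(product)
        ok = 0
        for reactants in retro_top_n(product, n):   # reactant-set SMILES
            attempts += 1
            if canonical(forward_predict(reactants)) == target:
                ok += 1
        hits += ok
        covered += int(ok > 0)
    round_trip = hits / attempts if attempts else 0.0
    coverage = covered / len(products) if products else 0.0
    return round_trip, coverage
```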

Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction

TLDR
A novel Graph2SMILES model is described that combines the power of Transformer models for text generation with the permutation invariance of molecular graph encoders, which mitigates the need for input data augmentation.

Forward Reaction Prediction as Reverse Verification: A Novel Approach to Retrosynthesis

TLDR
This work presents a "combined" model approach for retrosynthetic reaction prediction, where the first model is applied to retrosynthesis and the second model is applied to forward reaction prediction to verify the top-n reactants predicted by the retrosynthetic model.

References

SHOWING 1-10 OF 71 REFERENCES

Molecular Transformer for Chemical Reaction Prediction and Uncertainty Estimation

TLDR
This work treats reaction prediction as a machine translation problem between SMILES strings of reactants-reagents and the products, and shows that a multi-head attention Molecular Transformer model outperforms all algorithms in the literature, achieving a top-1 accuracy above 90% on a common benchmark dataset.
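
Top-n accuracies such as those quoted above are only meaningful after canonicalization, since one molecule admits many valid SMILES spellings. A minimal sketch with RDKit:

```python
from rdkit import Chem

def smiles_match(predicted, truth):
    """Molecule-level comparison: canonicalize both SMILES so different
    but equivalent spellings of the same molecule still match."""
    try:
        return Chem.CanonSmiles(predicted) == Chem.CanonSmiles(truth)
    except Exception:   # unparsable, i.e. grammatically invalid, SMILES
        return False

def top_k_accuracy(predictions, truths, k=1):
    """predictions[i] is a ranked candidate list for ground truth truths[i]."""
    correct = sum(
        any(smiles_match(p, t) for p in preds[:k])
        for preds, t in zip(predictions, truths)
    )
    return correct / len(truths)
```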

Retrosynthetic Reaction Prediction Using Neural Sequence-to-Sequence Models

TLDR
A fully data-driven model that learns to perform a retrosynthetic reaction prediction task, which is treated as a sequence-to-sequence mapping problem, and that overcomes certain limitations associated with rule-based expert systems and with any machine learning approach that contains a rule-based expert system component.

Molecular Transformer: A Model for Uncertainty-Calibrated Chemical Reaction Prediction

TLDR
This work shows that a multihead attention Molecular Transformer model outperforms all algorithms in the literature, achieving a top-1 accuracy above 90% on a common benchmark data set and is able to handle inputs without a reactant–reagent split and including stereochemistry, which makes the method universally applicable.

A Transformer Model for Retrosynthesis

TLDR
A Transformer model for the retrosynthetic reaction prediction task is described, and it is found that snapshot learning with weight averaging at learning-rate minima works best.

Linking the Neural Machine Translation and the Prediction of Organic Chemistry Reactions

TLDR
A gated recurrent unit based sequence-to-sequence model and a parser that generates input tokens for the model from reaction SMILES strings were built to translate 'reactants and reagents' to 'products'.

Massive Exploration of Neural Machine Translation Architectures

TLDR
This work presents a large-scale analysis of the sensitivity of NMT architectures to common hyperparameters, and reports empirical results and variance numbers for several hundred experimental runs corresponding to over 250,000 GPU hours on a WMT English to German translation task.

Prediction of Organic Reaction Outcomes Using Machine Learning

TLDR
A model framework for anticipating reaction outcomes that combines the traditional use of reaction templates with the flexibility in pattern recognition afforded by neural networks is reported.
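
A reaction template in this context is typically a SMARTS transformation. As a minimal illustration of applying one with RDKit (a toy template, not from the paper's actual template set):

```python
from rdkit import Chem
from rdkit.Chem import AllChem

# Toy forward template: Fischer esterification of a carboxylic acid
# with an alcohol (illustrative only).
esterification = AllChem.ReactionFromSmarts(
    "[C:1](=[O:2])[OH].[OH][C:3]>>[C:1](=[O:2])O[C:3]"
)

acid = Chem.MolFromSmiles("CC(=O)O")      # acetic acid
alcohol = Chem.MolFromSmiles("OCC")       # ethanol

for (product,) in esterification.RunReactants((acid, alcohol)):
    Chem.SanitizeMol(product)
    print(Chem.MolToSmiles(product))      # CCOC(C)=O (ethyl acetate)
```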

Automatic retrosynthetic route planning using template-free models

TLDR
This study constructed an automatic data-driven end-to-end retrosynthetic route planning system (AutoSynRoute) using Monte Carlo tree search with a heuristic scoring function and successfully reproduced published synthesis routes for the four case products.
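
As a simplified stand-in for the Monte Carlo tree search used by such systems, the sketch below chains a single-step model into multi-step routes with a best-first search; `single_step_retro` and `is_building_block` are illustrative assumptions, not part of AutoSynRoute:

```python
import heapq

def plan_route(target, single_step_retro, is_building_block, max_depth=6):
    """Best-first multi-step retrosynthesis: repeatedly expand the
    highest-scoring open molecule with a single-step model until every
    leaf is a purchasable building block. `single_step_retro(smiles)`
    yields (score, precursor_list) pairs; both callables are hypothetical."""
    # Priority queue entries: (negated cumulative score, open molecules, steps).
    frontier = [(-1.0, [target], [])]
    while frontier:
        neg_score, open_mols, steps = heapq.heappop(frontier)
        if not open_mols:
            return steps                      # every leaf is purchasable
        if len(steps) >= max_depth:
            continue                          # depth limit: abandon branch
        mol, rest = open_mols[0], open_mols[1:]
        for step_score, precursors in single_step_retro(mol):
            new_open = rest + [p for p in precursors
                               if not is_building_block(p)]
            heapq.heappush(frontier, (neg_score * step_score, new_open,
                                      steps + [(mol, precursors)]))
    return None                               # no route within the depth limit
```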

“Found in Translation”: predicting outcomes of complex organic chemistry reactions using neural sequence-to-sequence models

TLDR
Using a text-based representation of molecules, chemical reactions are predicted with a neural machine translation model borrowed from language processing.

Training Tips for the Transformer Model

TLDR
The experiments in neural machine translation using the recent Tensor2Tensor framework and the Transformer sequence-to-sequence model are described, confirming the general mantra “more data and larger models”.
...