Data augmentation and transfer learning strategies for reaction prediction in low chemical data regimes

  • Yun Zhang, Ling Wang, Xinqiao Wang, Chengyun Zhang, Jiamin Ge, Jing Tang, A. Su, H. Duan
  • Published 2021
  • Chemistry
  • Organic Chemistry Frontiers
Effective and rapid deep learning methods for predicting chemical reactions contribute to research and development in organic chemistry and drug discovery. Despite the outstanding capability of deep learning in retrosynthesis and forward synthesis, predictions based on small chemical datasets generally have low accuracy because too few reaction examples are available. Here, we introduce a new state-of-the-art method, which integrates transfer learning with the transformer model to predict…
1 Citation
Target Prediction Model for Natural Products Using Transfer Learning
The target prediction model using transfer learning can be applied in the field of natural product-based drug discovery and has the potential to find more lead compounds or to assist researchers in drug repurposing.


Transfer Learning: Making Retrosynthetic Predictions Based on a Small Chemical Reaction Dataset Scale to a New Level
This study proves that transfer learning between models working with different chemical datasets is feasible and significantly improves prediction accuracy, especially for reaction prediction and retrosynthetic analysis based on small datasets.
Heck reaction prediction using a transformer model based on a transfer learning strategy.
A proof-of-concept methodology for handling small amounts of chemical data is presented, in which transfer learning combined with the transformer model is applied to small-dataset Heck reaction prediction.
Transfer Learning for Drug Discovery.
  • C. Cai, Shiwei Wang, +5 authors Jianfeng Pei
  • Chemistry, Medicine
  • Journal of medicinal chemistry
  • 2020
This perspective aims to provide an overview of transfer learning and related applications in drug discovery and to give an outlook on the future development and application of transfer learning for drug discovery.
Molecular Transformer: A Model for Uncertainty-Calibrated Chemical Reaction Prediction
This work shows that a multihead attention Molecular Transformer model outperforms all algorithms in the literature, achieving a top-1 accuracy above 90% on a common benchmark data set and is able to handle inputs without a reactant–reagent split and including stereochemistry, which makes the method universally applicable.
Neural-Symbolic Machine Learning for Retrosynthesis and Reaction Prediction.
It is reported that deep neural networks can learn to resolve reactivity conflicts and to prioritize the most suitable transformation rules.
Retrosynthesis with Attention-Based NMT Model and Chemical Analysis of the "Wrong" Predictions
This work casts retrosynthesis as a machine translation problem by introducing a special Tensor2Tensor, an entirely attention-based and fully data-driven model that significantly outperforms the seq2seq model in top-1 accuracy.
Generative molecular design in low data regimes
A deep learning framework for customized compound library generation is presented that aims to enrich and expand the pharmacologically relevant chemical space with drug-like molecular entities on demand to generate molecules that incorporate features of both bioactive synthetic compounds and natural products.
Data Augmentation and Pretraining for Template-Based Retrosynthetic Prediction in Computer-Aided Synthesis Planning
Even on a small dataset of well curated reactions, the data augmentation and pretraining methods resulted in an increase in top-1 accuracy, especially for rare templates, indicating these strategies can be very useful for small datasets.
Retrosynthetic Reaction Prediction Using Neural Sequence-to-Sequence Models
A fully data-driven model is presented that learns to perform a retrosynthetic reaction prediction task, which is treated as a sequence-to-sequence mapping problem, and that overcomes certain limitations associated with rule-based expert systems and with any machine learning approach that contains a rule-based expert system component.
Prediction of Organic Reaction Outcomes Using Machine Learning
A model framework for anticipating reaction outcomes that combines the traditional use of reaction templates with the flexibility in pattern recognition afforded by neural networks is reported.