SMILES Enumeration as Data Augmentation for Neural Network Modeling of Molecules

Esben Jannik Bjerrum
Simplified Molecular Input Line Entry System (SMILES) is a single-line text representation of a unique molecule. One molecule can, however, have multiple SMILES strings, which is why canonical SMILES have been defined to ensure a one-to-one correspondence between SMILES string and molecule. Here, the fact that multiple SMILES strings represent the same molecule is explored as a technique for data augmentation of a molecular QSAR dataset modeled by a long short-term memory (LSTM) cell based…
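The idea that one molecule has several valid SMILES strings can be illustrated with a small sketch. This is a toy writer, not the paper's implementation: it assumes an acyclic molecule with single bonds only, atoms from the organic subset (so hydrogens stay implicit), and it uses hypothetical helper names (`write_smiles`, `enumerate_smiles`). A real workflow would rely on a cheminformatics toolkit to handle rings, bond orders, charges, and stereochemistry.

```python
import random


def write_smiles(atoms, bonds, start, rng):
    """Write one SMILES string for an acyclic, single-bonded molecule.

    atoms: list of element symbols, e.g. ["C", "C", "O"]
    bonds: list of (i, j) atom index pairs; the graph must be a tree
    start: index of the atom the string begins with
    rng:   random.Random instance used to shuffle the branch order
    """
    neighbors = {i: [] for i in range(len(atoms))}
    for i, j in bonds:
        neighbors[i].append(j)
        neighbors[j].append(i)

    def visit(atom, parent):
        children = [n for n in neighbors[atom] if n != parent]
        rng.shuffle(children)  # a different branch order gives a different string
        out = atoms[atom]
        # Every child except the last is written as a parenthesised branch.
        for child in children[:-1]:
            out += "(" + visit(child, atom) + ")"
        if children:
            out += visit(children[-1], atom)
        return out

    return visit(start, None)


def enumerate_smiles(atoms, bonds, n_samples=50, seed=0):
    """Sample random start atoms and branch orders; return the distinct strings."""
    rng = random.Random(seed)
    found = set()
    for _ in range(n_samples):
        start = rng.randrange(len(atoms))
        found.add(write_smiles(atoms, bonds, start, rng))
    return sorted(found)


# Ethanol as a toy example: atoms C-C-O connected in a chain.
ethanol_atoms = ["C", "C", "O"]
ethanol_bonds = [(0, 1), (1, 2)]
variants = enumerate_smiles(ethanol_atoms, ethanol_bonds)
print(variants)
```

Every string the sketch produces describes the same ethanol graph, so in an augmentation setting each one would be paired with the same target value as the original molecule.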
