SELFIES and the future of molecular string representations

Mario Krenn, Qianxiang Ai, Senja Barthel, Nessa Carson, Angelo Frei, Nathan C. Frey, Pascal Friederich, Théophile Gaudin, Alberto Gayle, Kevin Maik Jablonka, R. Lameiro, Dominik Lemm, Alston Lo, Seyed Mohamad Moosavi, José Manuel Nápoles-Duarte, AkshatKumar Nigam, Robert Pollice, Kohulan Rajan, Ulrich Schatzschneider, Philippe Schwaller, Marta Skreta, Berend Smit, Felix Strieth-Kalthoff, Chong Sun, G. Tom, Guido Falk von Rudorff, Andrew Wang, Andrew D. White, Adamo Young, Rose Yu, Alán Aspuru-Guzik

Graph neural networks for materials science and chemistry

An overview of the basic principles of GNNs, widely used datasets, and state-of-the-art architectures is provided, followed by a discussion of a wide range of recent applications; a possible road map for further development is also outlined.
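
As a minimal illustration of the message-passing principle shared by many GNNs (a generic sketch, not a specific architecture from the review), one update step sums neighbour features over the bond graph and mixes them with each atom's own features. The one-hot atom features, weight shapes, and the function name `message_passing_step` are hypothetical choices for illustration.

```python
import numpy as np

def message_passing_step(h, adj, w_self, w_nbr):
    """One generic message-passing update (hypothetical helper):
    h_i' = ReLU(h_i @ w_self + (sum of neighbour features of i) @ w_nbr).
    h: (n_atoms, d_in) node features; adj: (n_atoms, n_atoms) 0/1 adjacency."""
    messages = adj @ h  # row i = sum of features of the atoms bonded to atom i
    return np.maximum(0.0, h @ w_self + messages @ w_nbr)

# Ethanol heavy atoms C-C-O as a path graph (hydrogens omitted)
adj = np.array([[0, 1, 0],
                [1, 0, 1],
                [0, 1, 0]], dtype=float)
# Hypothetical one-hot atom features: [is_carbon, is_oxygen]
h = np.array([[1.0, 0.0],
              [1.0, 0.0],
              [0.0, 1.0]])
rng = np.random.default_rng(0)
w_self = rng.normal(size=(2, 4))  # untrained weights, for shape illustration only
w_nbr = rng.normal(size=(2, 4))
h_new = message_passing_step(h, adj, w_self, w_nbr)
print(h_new.shape)  # (3, 4)
```

Because aggregation is a sum over neighbours, relabelling the atoms only relabels the output rows, which is why layers of this kind suit molecular graphs that have no canonical atom order.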

Tartarus: A Benchmarking Platform for Realistic And Practical Inverse Molecular Design

This work develops a set of practical benchmark tasks relying on physical simulation of molecular systems mimicking real-life molecular design problems for materials, drugs, and chemical reactions, and demonstrates the utility and ease of use of the new benchmark set.

xtal2png: A Python package for representing crystal structure as PNG files

Because these images can be fed directly into image-based pipelines, materials informatics practitioners can apply new state-of-the-art image-based machine learning models to crystal structures in a streamlined way.
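
The core idea of storing crystal data in an image can be sketched as a lossy linear mapping of each numeric quantity onto an 8-bit pixel value and back. This is a hedged illustration only: the value ranges and the names `to_pixel` and `from_pixel` are assumptions, not xtal2png's actual API or image layout.

```python
def to_pixel(value, lo, hi):
    """Map a float in the assumed range [lo, hi] linearly onto an 8-bit value 0..255."""
    if not lo <= value <= hi:
        raise ValueError(f"{value} outside assumed range [{lo}, {hi}]")
    return round((value - lo) / (hi - lo) * 255)

def from_pixel(pixel, lo, hi):
    """Invert to_pixel; exact only up to the 8-bit quantization step."""
    return lo + pixel / 255 * (hi - lo)

# Hypothetical example: fractional coordinates lie in [0, 1]
frac_coords = [0.0, 0.25, 0.5, 0.8731]
pixels = [to_pixel(x, 0.0, 1.0) for x in frac_coords]
recovered = [from_pixel(p, 0.0, 1.0) for p in pixels]
print(pixels)  # [0, 64, 128, 223]
```

Decoding recovers each value only to within half a quantization step, (hi - lo)/510, so the chosen ranges determine how lossy the round trip is.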

ParticleGrid: Enabling Deep Learning using 3D Representation of Materials

ParticleGrid, a SIMD-optimized library for 3D structures, is proposed; it is designed for deep learning applications and integrates seamlessly with deep learning frameworks. The efficacy of 3D grids generated via ParticleGrid is demonstrated by accurately predicting molecular energy properties with a 3D convolutional neural network.
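
To make the idea of a 3D grid representation concrete, here is a plain NumPy sketch that places an isotropic Gaussian density on a voxel grid for each atom. It is an assumption-laden stand-in: `gaussian_density_grid` is a hypothetical name, it does not use ParticleGrid's API, and it omits SIMD optimization and per-element channels.

```python
import numpy as np

def gaussian_density_grid(positions, box_length, n_voxels, sigma):
    """Smear each atom onto a cubic voxel grid as an isotropic Gaussian.
    positions: (n_atoms, 3) Cartesian coordinates inside [0, box_length)^3."""
    # Coordinates of the voxel centres along one axis
    centres = (np.arange(n_voxels) + 0.5) * box_length / n_voxels
    gx, gy, gz = np.meshgrid(centres, centres, centres, indexing="ij")
    grid = np.zeros((n_voxels,) * 3)
    for x, y, z in positions:
        d2 = (gx - x) ** 2 + (gy - y) ** 2 + (gz - z) ** 2  # squared distance to each voxel centre
        grid += np.exp(-d2 / (2.0 * sigma ** 2))
    return grid

# Hypothetical two-atom example in a 10 Å box on a 16^3 grid
atoms = np.array([[5.0, 5.0, 5.0], [6.2, 5.0, 5.0]])
grid = gaussian_density_grid(atoms, box_length=10.0, n_voxels=16, sigma=0.5)
print(grid.shape)  # (16, 16, 16)
```

The resulting array could be fed to a 3D convolutional network as a single input channel; separate channels per element would be a natural extension.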



A review of molecular representation in the age of machine learning

The review covers string, connection-table, feature-based, and computer-learned representations, and closes with questions for future work that the authors believe will make chemical VAEs even more accessible.

Self-referencing embedded strings (SELFIES): A 100% robust molecular string representation

This work introduces SELFIES (SELF-referencIng Embedded Strings), a string-based representation of molecules that is 100% robust, meaning every SELFIES string corresponds to a valid molecule, and that allows the internal workings of generative models to be explained and interpreted.
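
The robustness property comes from decoding with a derivation that tracks how much valence each atom has left. The toy decoder below is a heavily simplified illustration of that idea for linear chains only; its token format, four-atom alphabet, stopping rule, and the name `decode_chain` are assumptions and do not reproduce the actual SELFIES grammar, which also handles branches, rings, and many more atom types.

```python
import re

MAX_VALENCE = {"C": 4, "N": 3, "O": 2, "F": 1}  # toy alphabet (assumption)
BOND_ORDER = {"": 1, "=": 2, "#": 3}

def decode_chain(tokens):
    """Toy valence-constrained decoder inspired by the SELFIES robustness idea.
    Each token '[<bond><atom>]' requests appending <atom> to a linear chain.
    The requested bond order is capped by the free valence on both sides, so
    any token sequence decodes to a valence-valid chain (SMILES-like output)."""
    out = []
    free = None  # free valence of the last atom; None before the first atom
    for tok in tokens:
        m = re.fullmatch(r"\[([=#]?)([A-Z][a-z]?)\]", tok)
        if m is None or m.group(2) not in MAX_VALENCE:
            continue  # toy choice: silently skip tokens outside the alphabet
        bond, atom = m.groups()
        if free is None:  # first atom: any bond prefix is ignored
            out.append(atom)
            free = MAX_VALENCE[atom]
            continue
        if free == 0:
            break  # last atom is saturated, so the derivation stops
        order = min(BOND_ORDER[bond], free, MAX_VALENCE[atom])
        out.append({1: "", 2: "=", 3: "#"}[order] + atom)
        free = MAX_VALENCE[atom] - order
    return "".join(out)

# Requested triple bond is capped at 2 by oxygen; O is then saturated, so the last C is dropped
print(decode_chain(["[C]", "[#O]", "[C]"]))  # C=O
```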

Deep Molecular Dreaming: Inverse machine learning for de-novo molecular design and interpretability with surjective representations

PASITHEA is proposed: a direct gradient-based molecule optimization method that applies inceptionism techniques from computer vision. It forms an inverse regression model capable of generating molecular variants optimized for a given property.
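
The inceptionism idea, freezing a trained property predictor and running gradient ascent on its input instead of its weights, can be shown with a toy. In this sketch the "predictor" is a hypothetical linear model over softmax-relaxed one-hot tokens, chosen so the gradient can be written by hand; PASITHEA itself uses a trained neural network, which is not reproduced here, and `dream` is a hypothetical name.

```python
import numpy as np

def softmax(z):
    """Row-wise softmax: turns logits into relaxed one-hot token distributions."""
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def dream(logits, weights, lr=0.5, steps=100):
    """Gradient ascent on the input of a frozen toy predictor.

    Toy predictor (hypothetical): f(z) = sum_l weights[l] . softmax(z)[l],
    i.e. a fixed linear score over relaxed one-hot tokens at each position l.
    The weights are never updated; only the input logits change."""
    z = logits.copy()
    for _ in range(steps):
        p = softmax(z)
        per_pos = (p * weights).sum(axis=-1, keepdims=True)  # score contributed by each position
        z += lr * p * (weights - per_pos)  # exact gradient df/dz for this linear-softmax model
    return z

# Hypothetical frozen predictor over 5 positions and a 4-token alphabet
rng = np.random.default_rng(0)
weights = rng.normal(size=(5, 4))
start = rng.normal(size=(5, 4))
optimized = dream(start, weights)
tokens = softmax(optimized).argmax(axis=-1)  # discretize back to one token per position
```

The final argmax step mirrors the need to map a continuous, optimized input back onto a discrete molecular string.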

Development of Multimodal Machine Learning Potentials: Toward a Physics-Aware Artificial Intelligence

This Account focuses on out-of-the-box approaches to developing transferable machine learning interatomic potentials (MLIPs) for diverse chemical tasks. It introduces the "Accurate Neural Network engine for Molecular Energies" method, ANAKIN-ME (ANI for short), as well as an approach that combines ML with the extended Hückel method.

STOUT: SMILES to IUPAC names using neural machine translation

This work presents STOUT, a deep-learning neural machine translation approach that generates the IUPAC name of a molecule from its SMILES string and also performs the reverse translation, i.e., predicting the SMILES string from the IUPAC name.

Machine Learning Force Fields

An overview of applications of machine learning force fields (ML-FFs) and the chemical insights that can be obtained from them is given, together with a step-by-step guide to constructing and testing ML-FFs from scratch.
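
One common test when building a force field is to check that the predicted forces equal the negative gradient of the predicted energy, F = -∇E. The sketch below runs such a check with central finite differences. A Lennard-Jones pair potential stands in for a trained model (an assumption for illustration), and all function names are hypothetical.

```python
import numpy as np

def lj_energy(pos, eps=1.0, sigma=1.0):
    """Stand-in for a learned energy model: Lennard-Jones pair energy."""
    e = 0.0
    for i in range(len(pos)):
        for j in range(i + 1, len(pos)):
            r = np.linalg.norm(pos[i] - pos[j])
            sr6 = (sigma / r) ** 6
            e += 4.0 * eps * (sr6 ** 2 - sr6)
    return e

def lj_forces(pos, eps=1.0, sigma=1.0):
    """Analytic forces F = -dE/dx for the same pair potential."""
    f = np.zeros_like(pos)
    for i in range(len(pos)):
        for j in range(i + 1, len(pos)):
            rij = pos[i] - pos[j]
            r = np.linalg.norm(rij)
            sr6 = (sigma / r) ** 6
            dedr = -24.0 * eps * (2.0 * sr6 ** 2 - sr6) / r  # dE/dr for one pair
            fij = -dedr * rij / r  # force on atom i due to atom j
            f[i] += fij
            f[j] -= fij  # Newton's third law
    return f

def finite_difference_forces(energy, pos, h=1e-5):
    """Central differences of -E with respect to every coordinate."""
    f = np.zeros_like(pos)
    for idx in np.ndindex(pos.shape):
        plus, minus = pos.copy(), pos.copy()
        plus[idx] += h
        minus[idx] -= h
        f[idx] = -(energy(plus) - energy(minus)) / (2 * h)
    return f

# Hypothetical three-atom configuration (reduced units)
pos = np.array([[0.0, 0.0, 0.0], [1.1, 0.0, 0.0], [0.0, 1.2, 0.3]])
print(np.allclose(lj_forces(pos), finite_difference_forces(lj_energy, pos), atol=1e-6))
```

For a learned model the same comparison applies, with the analytic forces typically obtained by automatic differentiation of the predicted energy.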

Applications of Deep Learning in Molecule Generation and Molecular Property Prediction

This Account will focus on two key areas where deep learning has impacted molecular design: the prediction of molecular properties and the de novo generation of suggestions for new molecules.

Randomized SMILES strings improve the quality of molecular generative models

An extensive benchmark on models trained with GDB-13 subsets of different sizes, with different SMILES variants (canonical, randomized, and DeepSMILES), with two recurrent cell types (LSTM and GRU), and with different hyperparameter combinations shows that LSTM models trained on 1 million randomized SMILES generalize to larger chemical spaces than the other approaches and represent the target chemical space more accurately.
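
A randomized SMILES writes the same molecule starting from a different atom and visiting neighbours in a different order. A cheminformatics toolkit would normally generate these; the pure-Python sketch below shows the idea under strong assumptions: acyclic molecules only, single bonds only, hydrogens omitted, and `random_smiles` is a hypothetical helper.

```python
import random

def random_smiles(atoms, bonds, rng=random):
    """Write one randomized SMILES for an acyclic, single-bonded heavy-atom graph.
    atoms: list of element symbols; bonds: list of (i, j) index pairs.
    Randomness comes from the start atom and the neighbour order in a depth-first walk."""
    if len(bonds) != len(atoms) - 1:
        raise ValueError("sketch handles connected acyclic (tree-shaped) molecules only")
    nbrs = {i: [] for i in range(len(atoms))}
    for i, j in bonds:
        nbrs[i].append(j)
        nbrs[j].append(i)

    def write(atom, parent):
        children = [n for n in nbrs[atom] if n != parent]
        rng.shuffle(children)
        s = atoms[atom]
        # every child except the last is written as a parenthesised branch
        for child in children[:-1]:
            s += "(" + write(child, atom) + ")"
        if children:
            s += write(children[-1], atom)
        return s

    return write(rng.randrange(len(atoms)), None)

# Isobutanol heavy atoms: C0-C1, C1-C2, C1-C3, C3-O4
atoms = ["C", "C", "C", "C", "O"]
bonds = [(0, 1), (1, 2), (1, 3), (3, 4)]
rng = random.Random(42)
variants = {random_smiles(atoms, bonds, rng) for _ in range(50)}
print(sorted(variants))
```

Each string in `variants` encodes the same molecule, which is what lets randomized SMILES act as a form of data augmentation during training.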

Discovery of novel chemical reactions by deep generative recurrent neural network

It is shown that “creative” AI can be successfully taught to enumerate novel, stoichiometrically coherent chemical reactions, and that, when coupled with reaction space cartography, de novo reaction design can be focused on a desired reaction class.

DeepSMILES: An Adaptation of SMILES for Use in Machine-Learning of Chemical Structures

A SMILES-like syntax called DeepSMILES is described that addresses two of the main causes of invalid syntax when a probabilistic model generates SMILES strings: unbalanced parentheses and unpaired ring-closure symbols. DeepSMILES can be interconverted to and from SMILES by string processing alone, without any loss of information.
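
The branch half of the scheme can be sketched as a simple string rewrite: opening parentheses are dropped, and each branch is closed by one ")" per atom on its main path, so a run of n closing parentheses means "go back n atoms". This sketch handles only organic-subset and bracket atoms; the full scheme also rewrites ring closures, which it does not attempt, and `branches_to_deepsmiles` is a hypothetical name.

```python
import re

# Bracket atoms, two-letter organic-subset atoms, then any single character
TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|.")

def branches_to_deepsmiles(smiles):
    """Rewrite SMILES branches in DeepSMILES style: drop '(' and emit one ')'
    per atom on the branch's main path. Ring closures are rejected."""
    out = []
    stack = []  # number of main-path atoms in each currently open branch
    for tok in TOKEN.findall(smiles):
        if tok == "(":
            stack.append(0)
        elif tok == ")":
            out.append(")" * stack.pop())
        elif tok[0].isdigit() or tok == "%":
            raise ValueError("ring closures are outside this sketch")
        else:
            out.append(tok)  # atoms and bond symbols are copied through
            if (tok[0].isalpha() or tok[0] == "[") and stack:
                stack[-1] += 1  # only atoms count towards the innermost open branch
    return "".join(out)

print(branches_to_deepsmiles("CC(=O)O"))     # CC=O)O
print(branches_to_deepsmiles("C(Br)(OC)I"))  # CBr)OC))I
```

Because no opening parenthesis has to be matched later, a generative model emitting this syntax cannot produce an unbalanced branch.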