ChemTS: an efficient python library for de novo molecular generation

@article{Yang2017ChemTSAE,
  title={ChemTS: an efficient python library for de novo molecular generation},
  author={Xiufeng Yang and Jinzhe Zhang and Kazuki Yoshizoe and Kei Terayama and Koji Tsuda},
  journal={Science and Technology of Advanced Materials},
  year={2017},
  volume={18},
  pages={972--976}
}

Abstract

Automatic design of organic materials requires black-box optimization in a vast chemical space. In conventional molecular design algorithms, a molecule is built as a combination of predetermined fragments. Recently, deep neural network models such as variational autoencoders and recurrent neural networks (RNNs) have been shown to be effective in de novo design of molecules without any predetermined fragments. This paper presents a novel Python library ChemTS that explores the chemical space…

ChemistGA: A Chemical Synthesizable Accessible Molecular Generation Algorithm for Real-World Drug Discovery.

Calculations on the two benchmarks illustrate that ChemistGA achieves impressive performance among the state-of-the-art baselines, and it opens a new avenue for the application of generative models to real-world drug discovery scenarios.

Structure-Based de Novo Molecular Generator Combined with Artificial Intelligence and Docking Simulations

SBMolGen is a new deep learning-based molecular generator that integrates a recurrent neural network, a Monte Carlo tree search, and docking simulations. It not only generates novel binding-active molecules but also presents 3D docking poses with target proteins, which will be useful in subsequent drug design.

De novo generation of optically active small organic molecules using Monte Carlo tree search combined with recurrent neural network

Optically active small organic molecules are computationally designed using the ChemTS python library developed by Tsuda and collaborators, which utilizes a combined Monte Carlo tree search (MCTS)

Guiding Deep Molecular Optimization with Genetic Exploration

This paper proposes genetic expert-guided learning (GEGL), a simple yet novel framework for training a deep neural network (DNN) to generate highly rewarding molecules; GEGL achieves the highest score for 19 tasks in comparison with state-of-the-art methods.

Constrained Bayesian Optimization for Automatic Chemical Design

It is posited that constrained Bayesian optimization is a good approach for solving this class of training set mismatch in many generative tasks involving Bayesian optimization over the latent space of a variational autoencoder.

Inverse molecular design using machine learning: Generative models for matter engineering

Methods for achieving inverse design, which aims to discover tailored materials from the starting point of a particular desired functionality, are reviewed.

De Novo Design of Molecules with Low Hole Reorganization Energy Based on a Quarter-Million Molecule DFT Screen: Part 2.

This work evaluates four additional machine-learning-based de novo methods for generating molecules with high predicted hole mobility for use in semiconductor applications and discovered better performing molecules with the GraphGA method compared to the other approaches.

Deep Reinforcement Learning and Docking Simulations for autonomous molecule generation in de novo Drug Design

A new deep reinforcement learning-based molecular generation method is presented that integrates a transformer network, balanced binary tree search, and docking simulation based on super-large-scale supercomputing; the results show that more than 96 of the generated molecules are chemically valid.

Molecular generation by Fast Assembly of (Deep)SMILES fragments

A simple method is described that, given a molecular training set, generates only valid molecules at high frequency using a single CPU core; it produces diverse SMILES (or DeepSMILES) encoded molecules while also showing some propensity for matching the training set distribution.

In silico generation of novel, drug-like chemical matter using the LSTM neural network

A method for generating molecules with a long short-term memory (LSTM) neural network is presented; an analysis of the results, including a virtual screening test, confirms that the potential of these novel molecules to show bioactivity is comparable to that of the ChEMBL set from which they were derived.

References

Generating Focused Molecule Libraries for Drug Discovery with Recurrent Neural Networks

This work shows that recurrent neural networks can be trained as generative models for molecular structures, similar to statistical language models in natural language processing, and demonstrates that the properties of the generated molecules correlate very well with those of the molecules used to train the model.

Automatic Chemical Design Using a Data-Driven Continuous Representation of Molecules

We report a method to convert discrete representations of molecules to and from a multidimensional continuous representation. This model allows us to generate new molecules for efficient exploration

Bayesian molecular design with a chemical language model

This work addresses the issue of accelerating material discovery with state-of-the-art machine learning techniques, using a chemical language model that acquires commonly occurring patterns of chemical fragments through natural language processing of ASCII strings of existing compounds, which follow the SMILES chemical language notation.

Creating the New from the Old: Combinatorial Libraries Generation with Machine-Learning-Based Compound Structure Optimization

In this study, the starting point for combinatorial library generation was the fingerprint referring to the optimal substructural composition in terms of the activity toward a considered target, which was obtained using a machine learning-based optimization procedure.

MDTS: automatic complex materials design using Monte Carlo tree search

A novel Python library called MDTS (Materials Design using Tree Search) is presented; it employs a Monte Carlo tree search approach, which has shown exceptional performance in computer Go.

Estimation of synthetic accessibility score of drug-like molecules based on molecular complexity and fragment contributions

This method uses historical synthetic knowledge obtained by analyzing information from millions of already synthesized chemicals and also considers molecular complexity; it is sufficiently fast and provides results consistent with estimates of ease of synthesis by experienced medicinal chemists.

ZINC: A Free Tool to Discover Chemistry for Biology

The database contains over twenty million commercially available molecules in biologically relevant representations that may be downloaded in popular ready-to-dock formats and subsets and is freely available at zinc.docking.org.

Grammar Variational Autoencoder

Surprisingly, it is shown that not only does the model more often generate valid outputs, it also learns a more coherent latent space in which nearby points decode to similar discrete outputs.