Corpus ID: 235458110

Do Large Scale Molecular Language Representations Capture Important Structural Information?

Jerret Ross, Brian M. Belgodere, Vijil Chenthamarakshan, Inkit Padhi, Youssef Mroueh, Payel Das
Predicting chemical properties from the structure of a molecule is of great importance in many applications, including drug discovery and material design. Machine-learning-based molecular property prediction holds the promise of enabling accurate predictions at much lower complexity than, for example, Density Functional Theory (DFT) calculations. Features extracted from molecular graphs, using graph neural nets in a supervised manner, have emerged as strong baselines for such tasks…

The Message Passing Neural Networks for Chemical Property Prediction on SMILES.
This is the first attempt to learn chemical strings using an MP-based algorithm, and the results are comparable to or better than those of previous state-of-the-art and baseline models.
CheMixNet: Mixed DNN Architectures for Predicting Chemical Properties using Multiple Molecular Representations
The proposed CheMixNet models outperform not only candidate neural architectures, such as contemporary fully connected networks that use molecular fingerprints and 1-D CNN and RNN models trained on SMILES sequences, but also other state-of-the-art architectures such as Chemception and Molecular Graph Convolutions.
Convolutional Embedding of Attributed Molecular Graphs for Physical Property Prediction.
A convolutional neural network is employed for the embedding task of learning an expressive molecular representation by treating molecules as undirected graphs with attributed nodes and edges; it preserves molecule-level spatial information that significantly enhances model performance.
N-Gram Graph: Simple Unsupervised Representation for Graphs, with Applications to Molecules
The N-gram graph is introduced: a simple unsupervised representation for molecules that is equivalent to a simple graph neural network that needs no training. It is complemented by theoretical analysis showing its strong representation and prediction power.
SMILES2Vec: An Interpretable General-Purpose Deep Neural Network for Predicting Chemical Properties
SMILES2vec is developed, a deep RNN that automatically learns features from SMILES to predict chemical properties without the need for additional explicit feature engineering. It demonstrates that neural networks can learn technically accurate chemical concepts and provide state-of-the-art accuracy, making interpretable deep neural networks a useful tool of relevance to the chemical industry.
Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences
This work uses unsupervised learning to train a deep contextual language model on 86 billion amino acids across 250 million protein sequences spanning evolutionary diversity, enabling state-of-the-art supervised prediction of mutational effect and secondary structure, and improving state-of-the-art features for long-range contact prediction.
Pushing the boundaries of molecular representation for drug discovery with graph attention mechanism.
Attentive FP, a new graph neural network architecture for molecular representation, uses a graph attention mechanism to learn from relevant drug discovery datasets. It achieves state-of-the-art predictive performance on a variety of datasets, and what it learns is interpretable.
ChemBERTa: Large-Scale Self-Supervised Pretraining for Molecular Property Prediction
This work makes one of the first attempts to systematically evaluate transformers on molecular property prediction tasks via the ChemBERTa model, and suggests that transformers offer a promising avenue of future work for molecular representation learning and property prediction.
ProtTrans: Towards Cracking the Language of Life’s Code Through Self-Supervised Learning
For the per-residue predictions, the transfer of the most informative embeddings (ProtT5) for the first time outperformed the state of the art without using evolutionary information, thereby bypassing expensive database searches.
MoleculeNet: A Benchmark for Molecular Machine Learning
MoleculeNet benchmarks demonstrate that learnable representations are powerful tools for molecular machine learning and broadly offer the best performance; however, this result comes with caveats.