BARTSmiles: Generative Masked Language Models for Molecular Representations

Authors: Gayane Chilingaryan, Hovhannes Tamoyan, Ani Tevosyan, Nelly Babayan, Lusine Khondkaryan, Karen Hambardzumyan, Z. Navoyan, Hrant Khachatrian, Armen Aghajanyan

We discover a robust self-supervised strategy tailored towards molecular representations for generative masked language models through a series of tailored, in-depth ablations. Using this pre-training strategy, we train BARTSmiles, a BART-like model with an order of magnitude more compute than previous self-supervised molecular representations. In-depth evaluations show that BARTSmiles consistently outperforms other self-supervised representations across classification, regression, and… 

Molecular Language Model as Multi-task Generator

This work proposes MolGen, a pre-trained molecular language model that effectively learns and shares knowledge across multiple generation tasks and domains, and introduces multi-task molecular prefix tuning, with a self-feedback mechanism, across several molecular generation tasks and different molecular domains.

Modeling Scattering Coefficients using Self-Attentive Complex Polynomials with Image-based Representation

This work proposes a sample-efficient and accurate surrogate model, named CZP (Constant Zeros Poles), to directly estimate the scattering coefficients in the frequency domain of a given 2D planar antenna design, without using a simulator.

Scaling Laws for Generative Mixed-Modal Language Models

New mixed-modal scaling laws that unify the contributions of individual modalities and the interactions between them are reported, and the optimal synergy and competition due to data and model size are explicitly modeled as an additive term to previous uni-modal scaling laws.



Do Large Scale Molecular Language Representations Capture Important Structural Information?

Experiments show that the learned molecular representation, MOLFORMER, performs competitively with existing graph-based and fingerprint-based supervised learning baselines on the challenging tasks of predicting properties of QM8 and QM9 molecules.

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

BERT is a new language representation model designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers; it can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.

ChemBERTa: Large-Scale Self-Supervised Pretraining for Molecular Property Prediction

This work makes one of the first attempts to systematically evaluate transformers on molecular property prediction tasks via the ChemBERTa model, and suggests that transformers offer a promising avenue of future work for molecular representation learning and property prediction.

Intrinsic Dimensionality Explains the Effectiveness of Language Model Fine-Tuning

This paper empirically shows that common pre-trained models have a very low intrinsic dimension, and connects intrinsic dimensionality with low-dimensional task representations and compression-based generalization bounds to provide intrinsic-dimension-based generalization bounds that are independent of the full parameter count.

BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension

BART is presented, a denoising autoencoder for pretraining sequence-to-sequence models, which matches the performance of RoBERTa on GLUE and SQuAD, and achieves new state-of-the-art results on a range of abstractive dialogue, question answering, and summarization tasks.

Self-Supervised Graph Transformer on Large-Scale Molecular Data

This work presents a novel framework, GROVER (Graph Representation frOm self-supervised mEssage passing tRansformer), which can be trained efficiently on large-scale molecular datasets without requiring any supervision, and is thus immune to the two issues the paper identifies: insufficient labeled molecules for supervised training and poor generalization to newly synthesized molecules.

Mol2vec: Unsupervised Machine Learning Approach with Chemical Intuition

Mol2vec can be easily combined with ProtVec, which employs the same Word2vec concept on protein sequences, resulting in a proteochemometric approach that is alignment-independent and thus can also be easily used for proteins with low sequence similarities.

Analyzing Learned Molecular Representations for Property Prediction

A graph convolutional model is introduced that consistently matches or outperforms models using fixed molecular descriptors as well as previous graph neural architectures on both public and proprietary data sets.

Pre-training via Paraphrasing

It is shown that fine-tuning gives strong performance on a range of discriminative and generative tasks in many languages, making MARGE the most generally applicable pre-training method to date.

What Language Model Architecture and Pretraining Objective Work Best for Zero-Shot Generalization?

A large-scale evaluation of modeling choices and their impact on zero-shot generalization of large pretrained Transformer language models focuses on text-to-text models and shows that causal decoder-only models trained on an autoregressive language modeling objective exhibit the strongest zero-shot generalization after purely self-supervised pretraining.