Multilingual Molecular Representation Learning via Contrastive Pre-training
@inproceedings{Guo2021MultilingualMR,
  title     = {Multilingual Molecular Representation Learning via Contrastive Pre-training},
  author    = {Zhihui Guo and Pramod Kumar Sharma and Andy Martinez and Liang Du and Robin Abraham},
  booktitle = {Annual Meeting of the Association for Computational Linguistics},
  year      = {2021}
}
Molecular representation learning plays an essential role in cheminformatics. Recently, language model-based approaches have gained popularity as an alternative to traditional expert-designed features to encode molecules. However, these approaches only utilize a single molecular language for representation learning. Motivated by the fact that a given molecule can be described using different languages such as the Simplified Molecular Input Line Entry System (SMILES), The International Union of Pure and…
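The abstract describes contrastive pre-training that aligns different textual "languages" describing the same molecule. As a rough illustration only, the sketch below pairs SMILES and IUPAC embeddings of the same molecules under a symmetric InfoNCE-style loss; the function name, random placeholder embeddings, and temperature are assumptions, not the paper's released code.

```python
# Minimal sketch of a cross-lingual contrastive objective: embeddings of the
# SAME molecule written in two "languages" (e.g. SMILES and IUPAC) are pulled
# together, while embeddings of different molecules in the batch are pushed apart.
import torch
import torch.nn.functional as F

def cross_lingual_info_nce(z_smiles: torch.Tensor,
                           z_iupac: torch.Tensor,
                           temperature: float = 0.1) -> torch.Tensor:
    """Symmetric InfoNCE loss over a batch of paired molecule embeddings."""
    z_s = F.normalize(z_smiles, dim=-1)              # [batch, dim]
    z_i = F.normalize(z_iupac, dim=-1)               # [batch, dim]
    logits = z_s @ z_i.t() / temperature             # scaled cosine similarities
    targets = torch.arange(z_s.size(0))              # positives sit on the diagonal
    loss_s2i = F.cross_entropy(logits, targets)      # SMILES -> IUPAC direction
    loss_i2s = F.cross_entropy(logits.t(), targets)  # IUPAC -> SMILES direction
    return 0.5 * (loss_s2i + loss_i2s)

# Toy usage: random embeddings stand in for two language-specific encoders.
batch, dim = 8, 64
loss = cross_lingual_info_nce(torch.randn(batch, dim), torch.randn(batch, dim))
print(loss.item())
```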
4 Citations
A Systematic Survey of Molecular Pre-trained Models
- Computer Science, ArXiv
- 2022
A systematic survey of pre-trained models for molecular representations is provided, covering several key perspectives: molecular descriptors, encoder architectures, pre-training strategies, and applications.
MORN: Molecular Property Prediction Based on Textual-Topological-Spatial Multi-View Learning
- Chemistry, Proceedings of the 31st ACM International Conference on Information & Knowledge Management (CIKM)
- 2022
Predicting molecular properties has significant implications for the discovery and generation of drugs and further research in the domain of medicinal chemistry. Learning representations of molecules…
Multilingual and Multimodal Topic Modelling with Pretrained Embeddings
- Computer Science, COLING
- 2022
This paper presents M3L-Contrast, a novel multimodal multilingual (M3L) neural topic model for comparable data that maps texts from multiple languages and images into a shared topic space, and demonstrates that the model is competitive with a zero-shot topic model in predicting topic distributions for comparable multilingual data.
References
SHOWING 1-10 OF 58 REFERENCES
Dual-view Molecule Pre-training
- Computer Science, ArXiv
- 2021
This work proposes to leverage both representations and designs a new pre-training algorithm, dual-view molecule pre-training (DMP), that can effectively combine the strengths of both types of molecule representations.
MolCLR: Molecular Contrastive Learning of Representations via Graph Neural Networks
- Computer Science, ArXiv
- 2021
This work presents MolCLR: Molecular Contrastive Learning of Representations via Graph Neural Networks (GNNs), a self-supervised learning framework for large unlabeled molecule datasets, and proposes three novel molecule graph augmentations: atom masking, bond deletion, and subgraph removal.
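As a rough illustration of the three augmentations named above (not MolCLR's actual RDKit/PyTorch Geometric pipeline), the toy sketch below applies atom masking, bond deletion, and random-walk subgraph removal to a molecule stored as a plain atom list plus bond tuples; all function names and ratios here are assumptions.

```python
# Toy versions of MolCLR-style graph augmentations on a molecule represented
# as a list of atom symbols and a list of (i, j) bond tuples.
import random

def atom_masking(atom_features, mask_ratio=0.25, mask_token="[MASK]"):
    """Replace a random fraction of atom entries with a mask token."""
    k = max(1, int(mask_ratio * len(atom_features)))
    masked = set(random.sample(range(len(atom_features)), k=k))
    return [mask_token if i in masked else a for i, a in enumerate(atom_features)]

def bond_deletion(bonds, delete_ratio=0.25):
    """Keep a random subset of bonds, dropping roughly delete_ratio of them."""
    keep = max(1, int((1 - delete_ratio) * len(bonds)))
    return random.sample(bonds, k=keep)

def subgraph_removal(atom_features, bonds, remove_ratio=0.25):
    """Remove a small connected subgraph found by a short random walk."""
    neighbors = {i: [] for i in range(len(atom_features))}
    for i, j in bonds:
        neighbors[i].append(j)
        neighbors[j].append(i)
    n_remove = max(1, int(remove_ratio * len(atom_features)))
    current = random.randrange(len(atom_features))
    removed = {current}
    for _ in range(10 * len(atom_features)):   # cap steps so the walk always ends
        if len(removed) >= n_remove or not neighbors[current]:
            break
        current = random.choice(neighbors[current])
        removed.add(current)
    kept_atoms = [a for i, a in enumerate(atom_features) if i not in removed]
    kept_bonds = [(i, j) for i, j in bonds if i not in removed and j not in removed]
    return kept_atoms, kept_bonds

# Toy usage on a 4-atom chain C-C-O-N.
atoms = ["C", "C", "O", "N"]
bonds = [(0, 1), (1, 2), (2, 3)]
print(atom_masking(atoms), bond_deletion(bonds), subgraph_removal(atoms, bonds))
```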
MolGPT: Molecular Generation Using a Transformer-Decoder Model
- Computer Science, J. Chem. Inf. Model.
- 2022
The model, MolGPT, performs on par with previously proposed machine learning frameworks for molecular generation in terms of generating valid, unique, and novel molecules, and it is demonstrated that the model can be trained conditionally to control multiple properties of the generated molecules.
Self-Supervised Graph Transformer on Large-Scale Molecular Data
- Computer Science, NeurIPS
- 2020
A novel framework, GROVER (Graph Representation frOm self-supervised mEssage passing tRansformer), that can be trained efficiently on large-scale molecular datasets without requiring any supervision, thus avoiding the two issues mentioned above.
ChemBERTa: Large-Scale Self-Supervised Pretraining for Molecular Property Prediction
- Computer Science, ArXiv
- 2020
This work makes one of the first attempts to systematically evaluate transformers on molecular property prediction tasks via the ChemBERTa model, and suggests that transformers offer a promising avenue of future work for molecular representation learning and property prediction.
FragNet, a Contrastive Learning-Based Transformer Model for Clustering, Interpreting, Visualizing, and Navigating Chemical Space
- Computer Science, Molecules
- 2021
Transformers, contrastive learning, and an embedded autoencoder are brought together to create a successful and disentangled representation of molecular latent spaces that uses the entire training set in its construction while allowing “similar” molecules to cluster together in an effective and interpretable way.
MoleculeNet: A Benchmark for Molecular Machine Learning
- Computer Science
- 2017
MoleculeNet benchmarks demonstrate that learnable representations are powerful tools for molecular machine learning and broadly offer the best performance; however, this result comes with caveats.
VAE-Sim: A Novel Molecular Similarity Measure Based on a Variational Autoencoder
- Computer Science, bioRxiv
- 2020
The VAE vector distances provide a novel metric for molecular similarity that is easily and rapidly calculated.
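As a tiny illustration of the idea (not VAE-Sim's code), the snippet below scores similarity as the Euclidean distance between two latent vectors, with random tensors standing in for a trained VAE encoder's outputs.

```python
# Similarity as distance in a VAE latent space: smaller distance = more similar.
import torch

def latent_distance(z_a: torch.Tensor, z_b: torch.Tensor) -> float:
    """Euclidean distance between two latent vectors."""
    return torch.dist(z_a, z_b).item()

# Placeholder latents standing in for a trained VAE encoder's outputs.
z_mol_a, z_mol_b = torch.randn(64), torch.randn(64)
print(latent_distance(z_mol_a, z_mol_b))
```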
SMILES Transformer: Pre-trained Molecular Fingerprint for Low Data Drug Discovery
- Computer Science, ArXiv
- 2019
Inspired by Transformer and pre-trained language models from natural language processing, SMILES Transformer learns molecular fingerprints through unsupervised pre-training of a sequence-to-sequence language model on a huge corpus of SMILES, a text representation system for molecules.
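To make that pre-training setup concrete, here is a heavily simplified sketch, assuming a character-level tokenizer and a small torch.nn.Transformer trained to reconstruct SMILES strings, with a pooled encoder output used as a fingerprint; the vocabulary, model sizes, and pooling are illustrative guesses, not the SMILES Transformer configuration.

```python
# Sequence-to-sequence reconstruction of SMILES, then pooling the encoder
# output into a learned molecular fingerprint.
import torch
import torch.nn as nn

smiles = ["CCO", "c1ccccc1", "CC(=O)O"]                      # toy corpus
vocab = {ch: i + 1 for i, ch in enumerate(sorted({c for s in smiles for c in s}))}
vocab["<pad>"] = 0
max_len = max(len(s) for s in smiles)

def encode(s):                                               # character-level ids, padded
    return torch.tensor([vocab[c] for c in s] + [0] * (max_len - len(s)))

tokens = torch.stack([encode(s) for s in smiles])            # [batch, seq]
emb = nn.Embedding(len(vocab), 32, padding_idx=0)
model = nn.Transformer(d_model=32, nhead=4, num_encoder_layers=2,
                       num_decoder_layers=2, batch_first=True)
head = nn.Linear(32, len(vocab))

src = emb(tokens)                                            # encoder input
bos = torch.zeros(len(smiles), 1, dtype=torch.long)          # pad id reused as a toy <bos>
dec_in = emb(torch.cat([bos, tokens[:, :-1]], dim=1))        # shifted decoder input
tgt_mask = model.generate_square_subsequent_mask(max_len)    # causal mask for teacher forcing
out = model(src=src, tgt=dec_in, tgt_mask=tgt_mask)          # [batch, seq, d_model]
loss = nn.functional.cross_entropy(head(out).reshape(-1, len(vocab)),
                                   tokens.reshape(-1))

# After pre-training, a pooled encoder output can serve as the fingerprint.
fingerprint = model.encoder(src).mean(dim=1)                 # [batch, d_model]
print(loss.item(), fingerprint.shape)
```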
Translating the Molecules: Adapting Neural Machine Translation to Predict IUPAC Names from a Chemical Identifier
- Computer Science
- 2021
The model uses two stacks of transformers in an encoder-decoder architecture, a setup similar to the neural networks used in state-of-the-art machine translation, and performed particularly well on organics, with the exception of macrocycles.