Natural language processing models that automate programming will transform chemistry research and teaching

Glen M. Hocky and A. White
Digital Discovery, pp. 79-83
Natural language processing models have emerged that can generate usable software and automate a number of programming tasks with high fidelity. These tools have not yet had an impact on the chemistry community. However, our initial testing demonstrates that this form of artificial intelligence is poised to transform chemistry and chemical engineering research. Here, we review developments that brought us to this point, examine applications in chemistry, and give our perspective on how this may…

Combining Machine Learning and Computational Chemistry for Predictive Insights Into Chemical Systems

Noteworthy applications that demonstrate how computational chemistry and machine learning can be used together to provide insightful predictions in molecular and materials modeling, retrosyntheses, catalysis, and drug design are critically reviewed.

Voice-controlled quantum chemistry

ChemVox is an interactive Amazon Alexa skill that uses speech recognition to perform quantum chemistry calculations. It interfaces Alexa with cloud computing and returns the results through a capable device.

Prompt Programming for Large Language Models: Beyond the Few-Shot Paradigm

It is suggested that the function of few-shot examples in these cases is better described as locating an already learned task rather than meta-learning, which motivates rethinking the role of prompts in controlling and evaluating powerful language models.

Language Models are Unsupervised Multitask Learners

It is demonstrated that language models begin to learn these tasks without any explicit supervision when trained on a new dataset of millions of webpages called WebText, suggesting a promising path towards building language processing systems which learn to perform tasks from their naturally occurring demonstrations.

Best practices in machine learning for chemistry

The elements necessary to train reliable, repeatable and reproducible models are discussed, and a set of guidelines for machine learning reports are recommended.

CHEMDNER: The drugs and chemical names extraction challenge

This task allowed a comparative assessment of the performance of various methodologies, using as Gold Standard data a carefully prepared collection of text manually labeled by specially trained chemists. It is expected that the tools and resources resulting from this effort will have an impact on future developments of chemical text mining applications.

Generating Text with Recurrent Neural Networks

The power of RNNs trained with the new Hessian-Free optimizer is demonstrated by applying them to character-level language modeling tasks, and a new RNN variant is introduced that uses multiplicative connections, which allow the current input character to determine the transition matrix from one hidden state vector to the next.

On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? 🦜

Recommendations are provided, including weighing the environmental and financial costs first, investing resources into curating and carefully documenting datasets rather than ingesting everything on the web, and carrying out pre-development exercises that evaluate how the planned approach fits into research and development goals and supports stakeholder values.

Will robots kill chemistry?

I’m just barely a midcareer professional, and tumult is one word that I can use to describe the economic shifts within chemistry that I’ve witnessed. In my short time in the industry, I’ve seen