• Corpus ID: 31322594

Synthesis Insights from Scientific Literature via Text Extraction and Machine Learning

  title={Synthesis Insights from Scientific Literature via Text Extraction and Machine Learning},
  author={Edward Kim and Kevin Huang and Adam Saunders and Andrew McCallum and Gerbrand Ceder and Elsa A. Olivetti},
In the past several years, Materials Genome Initiative (MGI) efforts have produced myriad examples of computationally designed materials in the fields of energy storage, catalysis, thermoelectrics, and hydrogen storage as well as large data resources that are used to screen for potentially transformative compounds. The bottleneck in high-throughput materials design has thus shifted to materials synthesis, which motivates our development of a methodology to automatically compile materials… 

Machine-learned and codified synthesis parameters of oxide materials
A collection of aggregated synthesis parameters computed using the text contained within over 640,000 journal articles using state-of-the-art natural language processing and machine learning techniques is presented.
Machine-learning-assisted materials discovery using failed experiments
This work demonstrates an alternative approach that uses machine-learning algorithms trained on reaction data to predict reaction outcomes for the crystallization of templated vanadium selenites, and successfully predicted conditions for new organically templated inorganic product formation.
ChemDataExtractor: A Toolkit for Automated Extraction of Chemical Information from the Scientific Literature
This system provides an extensible, chemistry-aware, natural language processing pipeline for tokenization, part-of-speech tagging, named entity recognition, and phrase parsing, and the novel use of multiple rule-based grammars that are tailored for interpreting specific document domains such as textual paragraphs, captions, and tables.
Machine Learning and Statistical Analysis for Materials Science: Stability and Transferability of Fingerprint Descriptors and Chemical Insights
The Bootstrapped Projected Gradient Descent algorithm has significant advantages over commonly used machine learning and statistical analysis tools such as the regression coefficient shrinkage-based method (LASSO) or artificial neural networks: (a) it selects descriptors with greater stability and transferability, with a goal to understand the chemical mechanism.
The Materials Super Highway: Integrating High-Throughput Experimentation into Mapping the Catalysis Materials Genome
The materials genome initiative (MGI) aims to accelerate the process of materials discovery and reduce the time to commercialization of advanced materials. Thus far, the MGI has resulted in
ChemicalTagger: A tool for semantic text-mining in chemistry
The ChemicalTagger parser is developed as a medium-depth, phrase-based semantic NLP tool for the language of chemical experiments, and it is possible to parse chemical experimental text using rule-based techniques in conjunction with a formal grammar parser.
ChemSpot: a hybrid system for chemical named entity recognition
ChemSpot, a named entity recognition (NER) tool for identifying mentions of chemicals in natural language texts, including trivial names, drugs, abbreviations, molecular formulas, and International Union of Pure and Applied Chemistry entities, is presented.
Combinatorial screening for new materials in unconstrained composition space with machine learning
A machine learning model is constructed from a database of thousands of density functional theory calculations that can predict the thermodynamic stability of arbitrary compositions without any other input and with six orders of magnitude less computer time than DFT.
Machine Learning Strategy for Accelerated Design of Polymer Dielectrics
This work addresses the issue of accelerating polymer dielectrics design by extracting learning models from data generated by accurate state-of-the-art first principles computations for polymers occupying an important part of the chemical subspace.