Benchmarking materials property prediction methods: the Matbench test set and Automatminer reference algorithm

Alex Dunn, Qi Wang, Alex M. Ganose, Daniel Dopp, Anubhav Jain
npj Computational Materials
We present a benchmark test suite and an automated machine learning procedure for evaluating supervised machine learning (ML) models for predicting properties of inorganic bulk materials. The test suite, Matbench, is a set of 13 ML tasks that range in size from 312 to 132k samples and contain data from 10 density functional theory-derived and experimental sources. Tasks include predicting optical, thermal, electronic, thermodynamic, tensile, and elastic properties given a material’s composition… 
A critical examination of compound stability predictions from machine-learned formation energies
It is demonstrated that accurate predictions of formation energy do not imply accurate predictions of stability, emphasizing the importance of assessing model performance on stability predictions, for which this work provides a set of publicly available tests.
Atomistic graph networks for experimental materials property prediction
This work shows how material descriptors can be learned from the structures present in large-scale datasets of material simulations, and how they can be used to improve the prediction of an experimental property, the energy of formation of a solid.
Discovery of materials with extreme work functions by high-throughput density functional theory and machine learning
This work develops a physics-based approach to featurizing surfaces and a supervised machine learning approach to predicting the work function of 29,270 surfaces created from 2,492 bulk materials, including up to ternary compounds.
AtomSets as a hierarchical transfer learning framework for small and large materials datasets
The AtomSets framework is developed, which utilizes universal compositional and structural descriptors extracted from pre-trained graph network deep learning models with standard multi-layer perceptrons to achieve consistently high model accuracy for both small compositional data (<400) and large structural data (>130,000).
Modeling the dielectric constants of crystals using machine learning.
Analysis of Shapley additive explanations of the ML models reveals that they recover correlations described by textbook Clausius-Mossotti and Penn models, which gives confidence in their ability to describe physical behavior, while providing superior predictive power.
High-throughput search for magnetic and topological order in transition metal oxides
This work performs a high-throughput band topology analysis of centrosymmetric magnetic materials, calculates topological invariants, and identifies 18 new candidate ferromagnetic topological semimetals, axion insulators, and antiferromagnetic topological insulators.
The Role of Machine Learning in the Understanding and Design of Materials
Some of the chief advancements of these methods and their applications in rational materials design are reviewed, followed by a discussion on some of the main challenges and opportunities the authors currently face together with a perspective on the future of rational materials design and discovery.
An Introduction to Electrocatalyst Design using Machine Learning for Renewable Energy Storage
An introduction is provided to the challenges in finding suitable electrocatalysts, how machine learning may be applied to the problem, and the use of the Open Catalyst Project OC20 dataset for model training.
Gapped metals as thermoelectric materials revealed by high-throughput screening
The typical strategy to design high performance thermoelectric materials is to dope a semiconducting material until optimal properties are obtained. However, some known thermoelectric materials such
Distributed representations of atoms and materials for machine learning
An approach for learning distributed representations of atoms, named SkipAtom, which makes use of the growing information in materials structure databases, and is found to be competitive with existing benchmarks that make use of structure.


Benchmark AFLOW Data Sets for Machine Learning
This data descriptor article presents a collection of data sets of different material properties taken from the AFLOW database, describing them, the procedures that generated them, and their use as potential benchmarks.
Machine learning with force-field inspired descriptors for materials: fast screening and mapping energy landscape.
It is demonstrated that the combination of pairwise radial, nearest neighbor, bond-angle, dihedral-angle and core-charge distributions plays an important role in predicting formation energies, bandgaps, static refractive indices, magnetic properties, and modulus of elasticity for three-dimensional materials as well as exfoliation energies of two-dimensional layered materials.
Machine Learning Directed Search for Ultraincompressible, Superhard Materials.
The results show the effectiveness of materials development through state-of-the-art machine-learning techniques by identifying functional inorganic materials.
Crystal Graph Convolutional Neural Networks for an Accurate and Interpretable Prediction of Material Properties.
A crystal graph convolutional neural networks framework to directly learn material properties from the connection of atoms in the crystal, providing a universal and interpretable representation of crystalline materials.
A General-Purpose Machine Learning Framework for Predicting Properties of Inorganic Materials
This manuscript has created a framework capable of being applied to a broad range of materials data, and demonstrates how this new method can be used to predict diverse properties of crystalline and amorphous materials, such as band gap energy and glass-forming ability.
Predicting materials properties without crystal structure: deep representation learning from stoichiometry
A machine learning approach that takes only the stoichiometry as input and automatically learns appropriate and systematically improvable descriptors from data achieves lower errors with less data.
Matminer: An open source toolkit for materials data mining
MoleculeNet: A Benchmark for Molecular Machine Learning
MoleculeNet benchmarks demonstrate that learnable representations are powerful tools for molecular machine learning and broadly offer the best performance; however, this result comes with caveats.
Graph Networks as a Universal Machine Learning Framework for Molecules and Crystals
This work develops, for the first time, universal MatErials Graph Network (MEGNet) models for accurate property prediction in both molecules and crystals and demonstrates the transfer learning of elemental embeddings from a property model trained on a larger data set to accelerate the training of property models with smaller amounts of data.