Robust model benchmarking and bias-imbalance in data-driven materials science: a case study on MODNet

Pierre-Paul De Breuck, Matthew L. Evans and Gian-Marco Rignanese. Journal of Physics: Condensed Matter.
As the number of novel data-driven approaches to materials science continues to grow, it is crucial to perform consistent quality, reliability and applicability assessments of model performance. In this paper, we benchmark the Materials Optimal Descriptor Network (MODNet) method and architecture against the recently released MatBench v0.1, a curated test suite of materials datasets. MODNet is shown to outperform current leaders on 6 of the 13 tasks, while closely matching the current leaders on…
Predicting Solid State Material Platforms for Quantum Technologies
A framework is developed for the automated discovery of semiconductor host platforms for quantum technologies (QT) using materials informatics and machine learning methods, resulting in a dataset of over 25,000 materials and nearly 5000 physics-informed features.
A Universal Graph Deep Learning Interatomic Potential for the Periodic Table
Interatomic potentials (IAPs), which describe the potential energy surface of a collection of atoms, are a fundamental input for atomistic simulations. However, existing IAPs are either fitted to
Density of states prediction for materials discovery via contrastive learning from probabilistic embeddings
Machine learning for materials discovery has largely focused on predicting an individual scalar rather than multiple related properties, where spectral properties are an important example.
Reflections on the future of machine learning for materials research
Naohiro Fujinuma, Brian DeCost, Jason Hattrick-Simpers, and Samuel E. Lofland. Department of Chemical Engineering, Rowan University, Glassboro, NJ, USA; Sekisui Chemical Co., Ltd, 2-4-4 Nishitemma,


A review of feature selection methods with applications
  • A. Jović, K. Brkic, N. Bogunovic
  • Computer Science
    2015 38th International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO)
  • 2015
This review considers most of the commonly used FS techniques, including standard filter, wrapper, and embedded methods, and provides insight into FS for recent hybrid approaches and other advanced topics.
Physical Review Materials 3 044602 Robust model benchmarking and bias-imbalance in data-driven materials science
  • 2019
Learning properties of ordered and disordered materials from multi-fidelity data
Predicting the properties of a material from the arrangement of its atoms is a fundamental goal in materials science. While machine learning has emerged in recent years as a new paradigm to provide
Benchmarking materials property prediction methods: the Matbench test set and Automatminer reference algorithm
It is shown that crystal graph methods appear to outperform traditional machine learning methods given ~10^4 or more data points, and researchers are encouraged to evaluate materials ML algorithms on the Matbench benchmark and compare them against the latest version of Automatminer.
MODNet -- accurate and interpretable property predictions for limited materials datasets by feature selection and joint-learning
An all-round framework is presented which relies on a feedforward neural network, the selection of physically-meaningful features and, when applicable, joint-learning, and is shown to outperform current graph-network models on small datasets.
Evaluating Scalable Uncertainty Estimation Methods for Deep Learning-Based Molecular Property Prediction
This paper introduces a set of quantitative criteria to capture different uncertainty aspects, and uses these criteria to compare MC-Dropout, Deep Ensembles, and bootstrapping, both theoretically in a unified framework that separates aleatoric/epistemic uncertainty and experimentally on public datasets.
Predicting materials properties without crystal structure: deep representation learning from stoichiometry
A machine learning approach that takes only the stoichiometry as input and automatically learns appropriate and systematically improvable descriptors from data achieves lower errors with less data.