• Corpus ID: 119280903

NOMAD 2018 Kaggle Competition: Solving Materials Science Challenges Through Crowd Sourcing

Christopher Sutton, Luca M. Ghiringhelli, Takenori Yamamoto, Yury Lysogorskiy, Lars Blumenthal, Thomas Hammerschmidt, Jacek R. Golebiowski, Xiangyue Liu, Angelo Ziletti, Matthias Scheffler

arXiv: Materials Science

Machine learning (ML) is increasingly used in the field of materials science, where statistical estimates of computed properties are employed to rapidly examine the chemical space for new compounds. However, a systematic comparison of several ML models for this domain has been hindered by the scarcity of appropriate datasets of materials properties, as well as the lack of thorough benchmarking studies. To address this, a public data-analytics competition was organized by the Novel Materials… 

Identifying domains of applicability of machine learning models for materials science

A diagnostic tool to detect regions of low expected model error as demonstrated for the case of transparent conducting oxides is introduced and it is found that, despite having a mutually indistinguishable and unsatisfactory average error, the models have DAs with distinctive features and notably improved performance.

Progress in Computational and Machine‐Learning Methods for Heterogeneous Small‐Molecule Activation

The theory and methodologies for heterogeneous catalysis and their applications for small-molecule activation are reviewed and promising directions of the computational catalysis field for further outlooks are discussed, focusing on the challenges and opportunities for new methods.

The role of feature space in atomistic learning

This work introduces a framework to compare different sets of descriptors, and different ways of transforming them by means of metrics and kernels, in terms of the structure of the feature space that they induce, and defines diagnostic tools to determine whether alternative feature spaces contain equivalent amounts of information.

Big Data-Driven Materials Science and Its FAIR Data Infrastructure

This chapter addresses the fourth paradigm of materials research: big-data-driven materials science. Its concepts and state of the art are described, and its challenges and chances are discussed.

Optimizing Ensemble Weights and Hyperparameters of Machine Learning Models for Regression Problems

The Structural Information Filtered Features Potential for Machine Learning calculations of energies and forces of atomic systems.

The Structural Information Filtered Features Potential for Machine Learning calculations of energies and forces of atomic systems is introduced, a feature engineering method based on maximizing the transfer of information from the physical structure to the feature space, able to describe complex systems, as well as molecules, and crystals.

Parametrically constrained geometry relaxations for high-throughput materials science

A flexible and generalizable parametric relaxation scheme is introduced and implemented in the all-electron code FHI-aims and it is shown how these constraints can reduce the number of steps needed to relax local lattice distortions by an order of magnitude.

Study of Different Deep Learning Approach with Explainable AI for Screening Patients with COVID-19 Symptoms: Using CT Scan and Chest X-ray Image Dataset

A deep learning-based model is developed that can detect COVID-19 patients with better accuracy on both CT scan and chest X-ray image datasets, and test results demonstrate that it is conceivable to interpret top features that should have worked to build a trust AI framework to distinguish between patients with COVID-19 symptoms and other patients.

Licensing in Artificial Intelligence Competitions and Consortium Project Collaborations

The results indicate that each form of collaboration has its own set of rules that address comparable concerns but have different content, and practitioners can utilise the results to implement licensing for IP exchange that fits the desired type of collaboration.

The Plant Pathology Challenge 2020 data set to classify foliar disease of apples

This data set will contribute toward development and deployment of machine learning–based automated plant disease classification algorithms to ultimately realize fast and accurate disease detection.



Insightful classification of crystal structures using deep learning

This study uses machine learning to automatically classify more than 100,000 simulated perfect and defective crystal structures, paving the way for crystal structure recognition of—possibly noisy and incomplete—three-dimensional structural data in big-data materials science.

High-Throughput Machine-Learning-Driven Synthesis of Full-Heusler Compounds

A machine-learning model has been trained to discover Heusler compounds, which are intermetallics exhibiting diverse physical properties attractive for applications in thermoelectric and spintronic

New tolerance factor to predict the stability of perovskite oxides and halides

An accurate, physically interpretable, and one-dimensional tolerance factor, τ, is developed that correctly predicts 92% of compounds as perovskite or nonperovskite for an experimental dataset of 576 ABX3 materials using a novel data analytics approach based on SISSO.

Cluster expansion made easy with Bayesian compressive sensing

The use of BCS is demonstrated to build cluster expansion models for several binary alloy systems, showing that the speed of the method and the accuracy of the results are far superior to state-of-the-art evolutionary methods for all alloy systems shown.

Combinatorial screening for new materials in unconstrained composition space with machine learning

A machine learning model is constructed from a database of thousands of density functional theory calculations that can predict the thermodynamic stability of arbitrary compositions without any other input and with six orders of magnitude less computer time than DFT.

LightGBM: A Highly Efficient Gradient Boosting Decision Tree

It is proved that, since the data instances with larger gradients play a more important role in the computation of information gain, GOSS can obtain quite accurate estimation of the information gain with a much smaller data size; the new GBDT implementation with GOSS and EFB is called LightGBM.

Predicting the Thermodynamic Stability of Solids Combining Density Functional Theory and Machine Learning

We perform a large scale benchmark of machine learning methods for the prediction of the thermodynamic stability of solids. We start by constructing a data set that comprises density functional

Machine Learning Predictions of Molecular Properties: Accurate Many-Body Potentials and Nonlocality in Chemical Space

A systematic hierarchy of efficient empirical methods to estimate atomization and total energies of molecules is proposed, with accuracy achieved by a vectorized representation of molecules (the so-called Bag of Bonds model) that exhibits strong nonlocality in chemical space.

Assessment and Validation of Machine Learning Methods for Predicting Molecular Atomization Energies.

A number of established machine learning techniques are outlined and the influence of the molecular representation on the methods' performance is investigated, finding that the best methods achieve prediction errors of 3 kcal/mol for the atomization energies of a wide variety of molecules.

SISSO: A compressed-sensing method for identifying the best low-dimensional descriptor in an immensity of offered candidates

The sure independence screening and sparsifying operator (SISSO) tackles immense and correlated feature spaces, and converges to the optimal solution from a combination of features relevant to the materials' property of interest.