NOMAD 2018 Kaggle Competition: Solving Materials Science Challenges Through Crowd Sourcing
@article{Sutton2018NOMAD2K, title={NOMAD 2018 Kaggle Competition: Solving Materials Science Challenges Through Crowd Sourcing}, author={Christopher Sutton and Luca M. Ghiringhelli and Takenori Yamamoto and Yury Lysogorskiy and Lars Blumenthal and Thomas Hammerschmidt and Jacek R. Golebiowski and Xiangyue Liu and Angelo Ziletti and Matthias Scheffler}, journal={arXiv: Materials Science}, year={2018} }
Machine learning (ML) is increasingly used in the field of materials science, where statistical estimates of computed properties are employed to rapidly examine the chemical space for new compounds. However, a systematic comparison of several ML models for this domain has been hindered by the scarcity of appropriate datasets of materials properties, as well as the lack of thorough benchmarking studies. To address this, a public data-analytics competition was organized by the Novel Materials…
14 Citations
Identifying domains of applicability of machine learning models for materials science
- Computer Science
- Nature Communications
- 2020
A diagnostic tool is introduced to detect regions of low expected model error, demonstrated for the case of transparent conducting oxides. Despite having mutually indistinguishable and unsatisfactory average errors, the models have domains of applicability (DAs) with distinctive features and notably improved performance.
Progress in Computational and Machine‐Learning Methods for Heterogeneous Small‐Molecule Activation
- Chemistry
- Advanced Materials
- 2020
The theory and methodologies for heterogeneous catalysis and their applications to small-molecule activation are reviewed, and promising directions for the computational catalysis field are discussed, focusing on the challenges and opportunities for new methods.
The role of feature space in atomistic learning
- Computer Science
- Mach. Learn. Sci. Technol.
- 2021
This work introduces a framework to compare different sets of descriptors, and different ways of transforming them by means of metrics and kernels, in terms of the structure of the feature space that they induce, and defines diagnostic tools to determine whether alternative feature spaces contain equivalent amounts of information.
Big Data-Driven Materials Science and Its FAIR Data Infrastructure
- Computer Science
- Handbook of Materials Modeling
- 2019
This chapter addresses the fourth paradigm of materials research: big-data-driven materials science. Its concepts and state of the art are described, and its challenges and chances are discussed.…
Optimizing Ensemble Weights and Hyperparameters of Machine Learning Models for Regression Problems
- Computer Science
- Machine Learning with Applications
- 2022
The Structural Information Filtered Features Potential for Machine Learning calculations of energies and forces of atomic systems.
- Computer Science
- 2019
The Structural Information Filtered Features Potential for machine-learning calculations of energies and forces of atomic systems is introduced: a feature engineering method based on maximizing the transfer of information from the physical structure to the feature space, able to describe complex systems as well as molecules and crystals.
Parametrically constrained geometry relaxations for high-throughput materials science
- Computer Science
- npj Computational Materials
- 2019
A flexible and generalizable parametric relaxation scheme is introduced and implemented in the all-electron code FHI-aims and it is shown how these constraints can reduce the number of steps needed to relax local lattice distortions by an order of magnitude.
Study of Different Deep Learning Approach with Explainable AI for Screening Patients with COVID-19 Symptoms: Using CT Scan and Chest X-ray Image Dataset
- Computer Science
- arXiv
- 2020
A deep learning-based model is developed that can detect COVID-19 patients with better accuracy on both CT scan and chest X-ray image datasets, and test results demonstrate that it is conceivable to interpret the top features that contribute to building a trusted AI framework for distinguishing patients with COVID-19 symptoms from other patients.
Licensing in Artificial Intelligence Competitions and Consortium Project Collaborations
- Computer Science
- 2020 46th Euromicro Conference on Software Engineering and Advanced Applications (SEAA)
- 2020
The results indicate that each form of collaboration has its own set of rules that address comparable concerns but have different content, and practitioners can utilise the results to implement licensing for IP exchange that fits the desired type of collaboration.
The Plant Pathology Challenge 2020 data set to classify foliar disease of apples
- Computer Science
- Applications in Plant Sciences
- 2020
This data set will contribute toward development and deployment of machine learning–based automated plant disease classification algorithms to ultimately realize fast and accurate disease detection.
References
Insightful classification of crystal structures using deep learning
- Computer Science, Materials Science
- Nature Communications
- 2018
This study uses machine learning to automatically classify more than 100,000 simulated perfect and defective crystal structures, paving the way for crystal structure recognition of—possibly noisy and incomplete—three-dimensional structural data in big-data materials science.
High-Throughput Machine-Learning-Driven Synthesis of Full-Heusler Compounds
- Materials Science
- 2016
A machine-learning model has been trained to discover Heusler compounds, which are intermetallics exhibiting diverse physical properties attractive for applications in thermoelectric and spintronic…
New tolerance factor to predict the stability of perovskite oxides and halides
- Materials Science
- Science Advances
- 2019
An accurate, physically interpretable, and one-dimensional tolerance factor, τ, is developed that correctly classifies 92% of compounds as perovskites or nonperovskites for an experimental dataset of 576 ABX3 materials, using a novel data analytics approach based on SISSO.
Cluster expansion made easy with Bayesian compressive sensing
- Computer Science
- 2013
The use of BCS is demonstrated to build cluster-expansion models for several binary alloy systems, showing that the speed of the method and the accuracy of the results are far superior to state-of-the-art evolutionary methods for all alloy systems shown.
Combinatorial screening for new materials in unconstrained composition space with machine learning
- Computer Science
- 2014
A machine learning model is constructed from a database of thousands of density functional theory calculations that can predict the thermodynamic stability of arbitrary compositions without any other input and with six orders of magnitude less computer time than DFT.
LightGBM: A Highly Efficient Gradient Boosting Decision Tree
- Computer Science
- NIPS
- 2017
It is proved that, since the data instances with larger gradients play a more important role in the computation of information gain, Gradient-based One-Side Sampling (GOSS) can obtain a quite accurate estimate of the information gain with a much smaller data size; the resulting GBDT implementation is called LightGBM.
Predicting the Thermodynamic Stability of Solids Combining Density Functional Theory and Machine Learning
- Materials Science
- 2017
We perform a large scale benchmark of machine learning methods for the prediction of the thermodynamic stability of solids. We start by constructing a data set that comprises density functional…
Machine Learning Predictions of Molecular Properties: Accurate Many-Body Potentials and Nonlocality in Chemical Space
- Chemistry
- The Journal of Physical Chemistry Letters
- 2015
A systematic hierarchy of efficient empirical methods to estimate atomization and total energies of molecules is presented; high accuracy is achieved by a vectorized representation of molecules (the so-called Bag of Bonds model) that exhibits strong nonlocality in chemical space.
Assessment and Validation of Machine Learning Methods for Predicting Molecular Atomization Energies.
- Chemistry
- Journal of Chemical Theory and Computation
- 2013
A number of established machine learning techniques are outlined and the influence of the molecular representation on the methods' performance is investigated; the best methods achieve prediction errors of 3 kcal/mol for the atomization energies of a wide variety of molecules.
SISSO: A compressed-sensing method for identifying the best low-dimensional descriptor in an immensity of offered candidates
- Computer Science
- Physical Review Materials
- 2018
The sure independence screening and sparsifying operator (SISSO) tackles immense and correlated feature spaces, and converges to the optimal solution from a combination of features relevant to the materials' property of interest.