Dataset Bias in the Natural Sciences: A Case Study in Chemical Reaction Prediction and Synthesis Design

@article{Griffiths2021DatasetBI,
  title={Dataset Bias in the Natural Sciences: A Case Study in Chemical Reaction Prediction and Synthesis Design},
  author={Ryan-Rhys Griffiths and P. Schwaller},
  journal={ArXiv},
  year={2021},
  volume={abs/2105.02637}
}
Datasets in the Natural Sciences are often curated with the goal of aiding scientific understanding and hence may not always be in a form that facilitates the application of machine learning. In this paper, we identify three trends within the fields of chemical reaction prediction and synthesis design that require a change in direction. First, the manner in which reaction datasets are split into reactants and reagents encourages testing models in an unrealistically generous manner. Second, we…

Gaussian Process Molecule Property Prediction with FlowMO
Heteroscedastic Bayesian Optimisation in Scientific Discovery
High-Dimensional Bayesian Optimisation with Variational Autoencoders and Deep Metric Learning

References

A Machine Learning Approach to Predict Chemical Reactions
Predicting Organic Reaction Outcomes with Weisfeiler-Lehman Network
What's What: The (Nearly) Definitive Guide to Reaction Role Assignment
Predicting Electron Paths
Retrosynthetic Reaction Prediction Using Neural Sequence-to-Sequence Models
Big Data from Pharmaceutical Patents: A Computational Analysis of Medicinal Chemists' Bread and Butter.
Prototype-Based Compound Discovery Using Deep Generative Models.
Optimization of Molecules via Deep Reinforcement Learning