Corpus ID: 232404470

Symbolic regression outperforms other models for small data sets

@article{Wilstrup2021SymbolicRO,
  title={Symbolic regression outperforms other models for small data sets},
  author={Casper Wilstrup and Jaan Kasak},
  journal={ArXiv},
  year={2021},
  volume={abs/2103.15147}
}
Machine learning is often applied to obtain predictions and new understanding of complex phenomena and relationships, but availability of sufficient data for model training is a widespread problem. Traditional machine learning techniques such as random forests and gradient boosting tend to overfit when working with data sets of a few hundred samples. This study demonstrates that for small training sets of 250 observations, symbolic regression is a superior alternative to these machine learning… 
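The paper's claim can be illustrated with a small numpy sketch (not the authors' code; the data-generating function, noise level, and baseline are invented for illustration). A model constrained to the true functional form stands in for an expression a symbolic regressor might discover, and is compared on 250 training samples against a flexible non-parametric baseline (1-nearest-neighbour, which, like a deep tree ensemble, tends to overfit small samples):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical ground truth: y = 3*sin(x1) + 0.5*x2**2 + noise
def true_fn(X):
    return 3 * np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2

n = 250  # small training set, as in the paper's setting
X = rng.uniform(-3, 3, size=(n, 2))
y = true_fn(X) + rng.normal(0, 0.5, n)
X_test = rng.uniform(-3, 3, size=(1000, 2))
y_test = true_fn(X_test) + rng.normal(0, 0.5, 1000)

# "Symbolic" model: least squares over the correct basis functions,
# standing in for an expression a symbolic regressor might find.
B = np.column_stack([np.sin(X[:, 0]), X[:, 1] ** 2])
coef, *_ = np.linalg.lstsq(B, y, rcond=None)
B_test = np.column_stack([np.sin(X_test[:, 0]), X_test[:, 1] ** 2])
mse_sym = np.mean((B_test @ coef - y_test) ** 2)

# Flexible non-parametric baseline: 1-nearest-neighbour regression,
# prone to overfitting when n is small.
d = np.linalg.norm(X_test[:, None, :] - X[None, :, :], axis=2)
mse_nn = np.mean((y[d.argmin(axis=1)] - y_test) ** 2)

print(f"symbolic-form MSE: {mse_sym:.3f}, 1-NN MSE: {mse_nn:.3f}")
```

With the compact closed-form model, test error approaches the irreducible noise variance, while the flexible baseline pays an extra overfitting penalty on so few samples.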

Figures and Tables from this paper

What can we Learn by Predicting Accuracy?

TLDR
A symbolic regression method is used to automatically find a mathematical expression highly correlated with a linear classifier’s accuracy, and this formula is highly explainable and confirms insights from various previous papers on loss design.

Symbolic regression analysis of interactions between first trimester maternal serum adipokines in pregnancies which develop pre-eclampsia

TLDR
Symbolic regression identified non-linear interactions between Lp, sLR and Re concentrations in first trimester pregnancy serum of pregnancies which later developed PE, which suggest new pathophysiological pathways and may help in designing more efficient screening protocols for PE.

SymFormer: End-to-end symbolic regression using transformer-based architecture

TLDR
This work proposes a transformer-based approach called SymFormer, which predicts the formula by outputting the individual symbols and the corresponding constants simultaneously, leading to better performance for a given amount of data.

A Symbolic Regression Approach to Hepatocellular Carcinoma Diagnosis Using Hypermethylated CpG Islands in Circulating Cell-Free DNA

TLDR
A genetic programming-based symbolic regression approach was applied to gain the benefits of machine learning while avoiding the opacity drawbacks of "black box" models and developed an equation utilizing the methylation levels of three biomarkers, with an accuracy of 91.3%, a sensitivity of 100%, and a specificity of 87.5%.

Identifying interactions in omics data for clinical biomarker discovery using symbolic regression

TLDR
A novel symbolic-regression-based algorithm, the QLattice, is presented, which generates parsimonious high-performing models that can both predict disease outcomes and reveal putative disease mechanisms.

An Explainable Lattice based Fertility Treatment Outcome Prediction Model for TeleFertility

TLDR
This work deploys Machine Learning and Explainable Artificial Intelligence to predict the outcomes of fertility treatment using interpretable Machine Learning Lattice Models for predictive, preventive and precision reproductive medicine.

References

Showing 1-10 of 32 references

Integration of Neural Network-Based Symbolic Regression in Deep Learning for Scientific Discovery

TLDR
This article uses a neural network-based architecture for symbolic regression called the equation learner (EQL) network and integrates it with other deep learning architectures such that the whole system can be trained end-to-end through backpropagation.

Order of Nonlinearity as a Complexity Measure for Models Generated by Symbolic Regression via Pareto Genetic Programming

TLDR
Results of the experiments suggest that alternating the order of nonlinearity of GP individuals with their structural complexity produces solutions that are both compact and have smoother response surfaces, and, hence, contributes to better interpretability and understanding.

A Unified Approach to Interpreting Model Predictions

TLDR
A unified framework for interpreting predictions, SHAP (SHapley Additive exPlanations), which unifies six existing methods and presents new methods that show improved computational performance and/or better consistency with human intuition than previous approaches.
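For small feature counts, the Shapley values underlying SHAP can be computed exactly by enumerating feature coalitions. A minimal numpy sketch (not the SHAP library; the model, background data, and mean-imputation value function are illustrative assumptions) — for a linear model the result reduces to the closed form w_i * (x_i - E[x_i]):

```python
import numpy as np
from itertools import combinations
from math import factorial

# Exact Shapley values by enumerating coalitions; a feature absent
# from a coalition is replaced by its background mean.
def shapley_values(f, x, background):
    n = len(x)
    phi = np.zeros(n)
    base = background.mean(axis=0)

    def v(S):  # value of coalition S: model output with only S present
        z = base.copy()
        z[list(S)] = x[list(S)]
        return f(z[None, :])[0]

    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            for S in combinations(others, k):
                w = factorial(k) * factorial(n - k - 1) / factorial(n)
                phi[i] += w * (v(S + (i,)) - v(S))
    return phi

# Hypothetical linear model f(x) = w.x + b, where the Shapley value of
# feature i is exactly w_i * (x_i - mean_i).
w = np.array([2.0, -1.0, 0.5])
f = lambda X: X @ w + 3.0
bg = np.random.default_rng(1).normal(size=(100, 3))
x = np.array([1.0, 2.0, -1.0])
phi = shapley_values(f, x, bg)
print(phi)  # equals w * (x - bg.mean(axis=0))
```

The enumeration costs O(2^n) model evaluations, which is why the SHAP paper develops efficient approximations for realistic feature counts.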

Random Forests

TLDR
Internal estimates monitor error, strength, and correlation and these are used to show the response to increasing the number of features used in the forest, and are also applicable to regression.

Discovering Symbolic Models from Deep Learning with Inductive Biases

TLDR
The correct known equations, including force laws and Hamiltonians, can be extracted from the neural network and a new analytic formula is discovered which can predict the concentration of dark matter from the mass distribution of nearby cosmic structures.

AI Feynman: A physics-inspired method for symbolic regression

TLDR
This work develops a recursive multidimensional symbolic regression algorithm that combines neural network fitting with a suite of physics-inspired techniques and improves the state-of-the-art success rate.

Empirical modeling using symbolic regression via postfix Genetic Programming

  • V. Dabhi, S. Vij
  • Computer Science
  • 2011 International Conference on Image Information Processing, 2011
TLDR
The suitability of Neural Network and symbolic regression via Genetic Programming (GP) to solve empirical modeling problems is explored and it is concluded that symbolic regression through GP can deal efficiently with these problems.

Toward an artificial intelligence physicist for unsupervised learning.

TLDR
This work proposes a paradigm centered around the learning and manipulation of theories, which parsimoniously predict both aspects of the future and the domain in which these predictions are accurate, and proposes a generalized mean loss to encourage each theory to specialize in its comparatively advantageous domain.

"Why Should I Trust You?": Explaining the Predictions of Any Classifier

TLDR
LIME is proposed, a novel explanation technique that explains the predictions of any classifier in an interpretable and faithful manner, by learning an interpretable model locally around the prediction.
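LIME's core idea — sample perturbations around the instance, weight them by proximity, and fit an interpretable (here linear) surrogate to the black-box outputs — can be sketched in a few lines of numpy (a toy illustration, not the LIME library; the black-box model, perturbation scale, and kernel width are invented):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical black-box classifier to explain: probability from a
# smooth nonlinear score.
def black_box(X):
    return 1 / (1 + np.exp(-(np.sin(X[:, 0]) + X[:, 1] ** 2 - 0.5)))

x0 = np.array([0.3, 0.4])  # instance whose prediction we explain

# Sample perturbations near x0 and weight them by proximity.
Z = x0 + rng.normal(0, 0.3, size=(500, 2))
p = black_box(Z)
w = np.exp(-np.sum((Z - x0) ** 2, axis=1) / 0.25)  # proximity kernel

# Weighted least squares: fit a local linear surrogate in (Z - x0).
A = np.column_stack([np.ones(len(Z)), Z - x0])
coef = np.linalg.lstsq(A * np.sqrt(w)[:, None],
                       p * np.sqrt(w), rcond=None)[0]
print("local linear coefficients:", coef[1:])
```

The fitted coefficients approximate the local sensitivity of the black-box probability to each feature around x0, which is the kind of faithful local explanation the paper formalizes.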

Learning Symbolic Physics with Graph Networks

TLDR
An approach for imposing physically motivated inductive biases on graph networks to learn interpretable representations and improved zero-shot generalization is introduced and offers a valuable technique for interpreting and inferring explicit causal theories about the world from implicit knowledge captured by deep learning.