# Symbolic regression outperforms other models for small data sets

@article{Wilstrup2021SymbolicRO, title={Symbolic regression outperforms other models for small data sets}, author={Casper Wilstrup and Jaan Kasak}, journal={ArXiv}, year={2021}, volume={abs/2103.15147} }

Machine learning is often applied to obtain predictions and new understanding of complex phenomena and relationships, but availability of sufficient data for model training is a widespread problem. Traditional machine learning techniques such as random forests and gradient boosting tend to overfit when working with data sets of a few hundred samples. This study demonstrates that for small training sets of 250 observations, symbolic regression is a superior alternative to these machine learning…

## 6 Citations

### What can we Learn by Predicting Accuracy?

- Computer Science
- ArXiv
- 2022

A symbolic regression method is used to automatically find a mathematical expression highly correlated with a linear classifier’s accuracy, and this formula is highly explainable and confirms insights from various previous papers on loss design.

### Symbolic regression analysis of interactions between first trimester maternal serum adipokines in pregnancies which develop pre-eclampsia

- Medicine
- medRxiv
- 2022

Symbolic regression identified non-linear interactions between Lp, sLR and Re concentrations in first trimester pregnancy serum of pregnancies which later developed PE, which suggest new pathophysiological pathways and may help in designing more efficient screening protocols for PE.

### SymFormer: End-to-end symbolic regression using transformer-based architecture

- Computer Science
- ArXiv
- 2022

This work proposes a transformer-based approach called SymFormer, which predicts the formula by outputting the individual symbols and the corresponding constants simultaneously, which leads to better performance in terms of available data.

### A Symbolic Regression Approach to Hepatocellular Carcinoma Diagnosis Using Hypermethylated CpG Islands in Circulating Cell-Free DNA

- Biology, Medicine
- medRxiv
- 2022

A genetic programming-based symbolic regression approach was applied to gain the benefits of machine learning while avoiding the opacity of "black box" models; it produced an equation using the methylation levels of three biomarkers, with an accuracy of 91.3%, a sensitivity of 100%, and a specificity of 87.5%.

### Identifying interactions in omics data for clinical biomarker discovery using symbolic regression

- Biology
- bioRxiv
- 2022

A novel symbolic-regression-based algorithm, the QLattice, is presented, which generates parsimonious high-performing models that can both predict disease outcomes and reveal putative disease mechanisms.

### An Explainable Lattice based Fertility Treatment Outcome Prediction Model for TeleFertility

- Computer Science
- 2021 IEEE International Conference on Biomedical Engineering, Computer and Information Technology for Health (BECITHCON)
- 2021

This work deploys Machine Learning and Explainable Artificial Intelligence to predict the outcomes of fertility treatment using interpretable Machine Learning Lattice Models for predictive, preventive and precision reproductive medicine.

## References

### Integration of Neural Network-Based Symbolic Regression in Deep Learning for Scientific Discovery

- Computer Science
- IEEE Transactions on Neural Networks and Learning Systems
- 2021

This article uses a neural network-based architecture for symbolic regression called the equation learner (EQL) network and integrates it with other deep learning architectures such that the whole system can be trained end-to-end through backpropagation.

### Order of Nonlinearity as a Complexity Measure for Models Generated by Symbolic Regression via Pareto Genetic Programming

- Computer Science
- IEEE Transactions on Evolutionary Computation
- 2009

Results of the experiments suggest that alternating the order of nonlinearity of GP individuals with their structural complexity produces solutions that are both compact and have smoother response surfaces, and, hence, contributes to better interpretability and understanding.

### A Unified Approach to Interpreting Model Predictions

- Computer Science
- NIPS
- 2017

This work presents SHAP (SHapley Additive exPlanations), a unified framework for interpreting predictions that unifies six existing methods, along with new methods that show improved computational performance and/or better consistency with human intuition than previous approaches.

### Random Forests

- Computer Science
- Machine Learning
- 2004

Internal estimates monitor error, strength, and correlation and these are used to show the response to increasing the number of features used in the forest, and are also applicable to regression.

### Discovering Symbolic Models from Deep Learning with Inductive Biases

- Computer Science
- NeurIPS
- 2020

The correct known equations, including force laws and Hamiltonians, can be extracted from the neural network and a new analytic formula is discovered which can predict the concentration of dark matter from the mass distribution of nearby cosmic structures.

### AI Feynman: A physics-inspired method for symbolic regression

- Physics, Computer Science
- Science Advances
- 2020

This work develops a recursive multidimensional symbolic regression algorithm that combines neural network fitting with a suite of physics-inspired techniques and improves the state-of-the-art success rate.

### Empirical modeling using symbolic regression via postfix Genetic Programming

- Computer Science
- 2011 International Conference on Image Information Processing
- 2011

The suitability of Neural Network and symbolic regression via Genetic Programming (GP) to solve empirical modeling problems is explored and it is concluded that symbolic regression through GP can deal efficiently with these problems.

### Toward an artificial intelligence physicist for unsupervised learning.

- Computer Science
- Physical Review E
- 2019

This work proposes a paradigm centered around the learning and manipulation of theories, which parsimoniously predict both aspects of the future and the domain in which these predictions are accurate, and proposes a generalized mean loss to encourage each theory to specialize in its comparatively advantageous domain.

### "Why Should I Trust You?": Explaining the Predictions of Any Classifier

- Computer Science
- HLT-NAACL Demos
- 2016

LIME is proposed, a novel explanation technique that explains the predictions of any classifier in an interpretable and faithful manner, by learning an interpretable model locally around the prediction.

### Learning Symbolic Physics with Graph Networks

- Computer Science
- ArXiv
- 2019

An approach for imposing physically motivated inductive biases on graph networks to learn interpretable representations and improved zero-shot generalization is introduced and offers a valuable technique for interpreting and inferring explicit causal theories about the world from implicit knowledge captured by deep learning.