• Corpus ID: 236635250

Contemporary Symbolic Regression Methods and their Relative Performance

Authors: W. L. Cava, Patryk Orzechowski, Bogdan Burlacu, Fabrício Olivetti de França, M. Virgolin, Ying Jin, Michael Kommenda, Jason H. Moore
Many promising approaches to symbolic regression have been presented in recent years, yet progress in the field continues to suffer from a lack of uniform, robust, and transparent benchmarking standards. We address this shortcoming by introducing an open-source, reproducible benchmarking platform for symbolic regression. We assess 14 symbolic regression methods and 7 machine learning methods on a set of 252 diverse regression problems. Our assessment includes both real-world datasets with no… 

GSR: A Generalized Symbolic Regression Approach

This paper presents GSR, a Generalized Symbolic Regression approach, by modifying the conventional SR optimization problem formulation, while keeping the main SR objective intact, and proposes a genetic programming approach with a matrix-based encoding scheme.

Exhaustive Symbolic Regression

A new method, Exhaustive Symbolic Regression (ESR), is introduced that systematically and efficiently considers all possible equations and is therefore guaranteed to find not only the true optimum but also a complete function ranking.

Rethinking Symbolic Regression Datasets and Benchmarks for Scientific Discovery

This paper revisits datasets and evaluation criteria for Symbolic Regression, a task of recovering mathematical expressions from given data, and proposes to use normalized edit distances between a predicted equation and the ground-truth equation trees as an evaluation metric.

Symbolic Regression is NP-hard

By showing that SR is NP-hard, this work provides evidence that the answer to the question "Is there an exact polynomial-time algorithm to compute SR models?" is probably negative.

Taylor genetic programming for symbolic regression

This work proposes a new method for SR, called Taylor genetic programming (TaylorGP), which leverages a Taylor polynomial to approximate the symbolic equation that fits the dataset and uses the Taylor polynomial to extract the features of the symbolic equation.

End-to-end symbolic regression with transformers

This paper challenges the usual two-step procedure (predicting an expression skeleton, then fitting its constants) and tasks a Transformer with directly predicting the full mathematical expression, constants included. It presents ablations showing that this end-to-end approach yields better results, sometimes even without the refinement step.

Symbolic Expression Transformer: A Computer Vision Approach for Symbolic Regression

This work proposes Symbolic Expression Transformer (SET), a sample-agnostic model that approaches SR from the perspective of computer vision, and demonstrates its effectiveness, suggesting that image-based models are a promising direction for solving the challenging SR problem.

Interpretability in symbolic regression: a benchmark of explanatory methods using the Feynman data set

This work proposes a benchmark scheme for evaluating explanatory methods for regression models, mainly symbolic regression models, and observes that Partial Effects and SHAP were the most robust explanation models, with Integrated Gradients being unstable only with tree-based models.

SciMED: A Computational Framework For Physics-Informed Symbolic Regression with Scientist-In-The-Loop

A novel, open-source computational framework called Scientist-Machine Equation Detector (SciMED), which integrates scientific discipline wisdom in a scientist-in-the-loop approach with state-of-the-art symbolic regression (SR) methods.

Transformation-interaction-rational representation for symbolic regression

F. O. de França, Computer Science, Proceedings of the Genetic and Evolutionary Computation Conference, 2022
An extension to the Interaction-Transformation representation, called Transformation-Interaction-Rational representation, is proposed. It defines a new function form as the ratio of two Interaction-Transformation functions, and the target variable can also be transformed with a univariate function.



Benchmarking state-of-the-art symbolic regression algorithms

This paper conceptually and experimentally compares several representatives of multiple linear regression algorithms, including GPTIPS, FFX, and EFS, which are applied as off-the-shelf, ready-to-use techniques in the field of SR.

Where are we now?: a large benchmark study of recent symbolic regression methods

The results suggest that symbolic regression performs strongly compared to state-of-the-art gradient boosting algorithms, although in terms of running time it is among the slowest of the available methodologies.

FFX: Fast, Scalable, Deterministic Symbolic Regression Technology

A new non-evolutionary technique for symbolic regression that is orders of magnitude faster than competent GP approaches on real-world problems, returns simpler models, has comparable or better prediction on unseen data, and converges reliably and deterministically.

Improving Model-Based Genetic Programming for Symbolic Regression of Small Expressions

This article shows that the non-uniformity in the distribution of the genotype in GP populations negatively biases linkage learning (LL), proposes a method to correct this, and finds that GOMEA is a promising new approach to SR.

PMLB: a large benchmark suite for machine learning evaluation and comparison

It is found that existing benchmarks lack the diversity to properly benchmark machine learning algorithms, and there are several gaps in benchmarking problems that still need to be considered.

Deep symbolic regression: Recovering mathematical expressions from data via risk-seeking policy gradients

The proposed framework uses a recurrent neural network to emit a distribution over tractable mathematical expressions, and employs reinforcement learning to train the network to generate better-fitting expressions, which significantly outperforms standard genetic programming-based symbolic regression in its ability to exactly recover symbolic expressions.

Pareto-Front Exploitation in Symbolic Regression

This work prefers parsimonious (simple) expressions with the expectation that they are more robust with respect to changes over time in the underlying system or extrapolation outside the range of the data used as the reference in evolving the symbolic regression.
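
The preference for parsimonious expressions described above is usually framed as a trade-off between error and complexity. The following is a minimal sketch of selecting the non-dominated (Pareto-front) candidates, not the method of the cited work. The candidate expressions, error values, complexity counts, and the `pareto_front` helper are all hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

# (expression, error, complexity); all values below are made up for illustration.
Model = Tuple[str, float, int]


def pareto_front(models: List[Model]) -> List[Model]:
    """Keep models that no other model beats on both error and complexity.

    Lower is better for both objectives. A model is dominated when another
    model is at least as good on both objectives and strictly better on one.
    """
    front = []
    for name, err, size in models:
        dominated = any(
            (e <= err and s <= size) and (e < err or s < size)
            for _, e, s in models
        )
        if not dominated:
            front.append((name, err, size))
    # Order from simplest to most complex for readability.
    return sorted(front, key=lambda m: m[2])


candidates = [
    ("x", 0.90, 1),
    ("x + 1", 0.50, 3),
    ("x * x", 0.60, 3),               # same size as "x + 1" but worse error
    ("x * x + x", 0.10, 5),
    ("sin(x) + x * x + x", 0.10, 8),  # same error as "x * x + x" but larger
]

front = pareto_front(candidates)
for name, err, size in front:
    print(f"{name!r}: error={err}, complexity={size}")
```

In this sketch the two candidates that are matched or beaten on both objectives are dropped, leaving one best expression per complexity level.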

Feature standardisation and coefficient optimisation for effective symbolic regression

It is demonstrated that standardisation allows a simpler function set to be used without increasing bias and can significantly improve the performance of coefficient optimisation through gradient descent to produce accurate models.
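
As a rough illustration of the feature standardisation mentioned above, the sketch below rescales a feature column to zero mean and unit variance. It assumes plain z-score standardisation; the summary does not spell out the exact procedure used in the cited work, and the `standardise` helper and sample values are hypothetical.

```python
from statistics import mean, pstdev


def standardise(column):
    """Rescale a list of numbers to zero mean and unit (population) variance."""
    mu = mean(column)
    sigma = pstdev(column)
    if sigma == 0:
        # A constant feature carries no information; map it to zeros.
        return [0.0 for _ in column]
    return [(value - mu) / sigma for value in column]


feature = [2.0, 4.0, 6.0, 8.0]  # made-up sample values
z = standardise(feature)
print(z)
```

After this rescaling, all features share a common scale, which is the condition under which a single learning rate for gradient-based coefficient optimisation tends to behave consistently across features.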

AI Feynman 2.0: Pareto-optimal symbolic regression exploiting graph modularity

We present an improved method for symbolic regression that seeks to fit data to formulas that are Pareto-optimal, in the sense of having the best accuracy for a given complexity. It improves on the

AI Feynman: A physics-inspired method for symbolic regression

This work develops a recursive multidimensional symbolic regression algorithm that combines neural network fitting with a suite of physics-inspired techniques and improves the state-of-the-art success rate.