On genetic programming representations and fitness functions for interpretable dimensionality reduction

@inproceedings{uriot_gp_dr,
  title={On genetic programming representations and fitness functions for interpretable dimensionality reduction},
  author={Thomas Uriot and M. Virgolin and Tanja Alderliesten and Peter A. N. Bosman},
  booktitle={Proceedings of the Genetic and Evolutionary Computation Conference},
}
Dimensionality reduction (DR) is an important technique for data exploration and knowledge discovery. However, most of the main DR methods are either linear (e.g., PCA), do not provide an explicit mapping between the original data and its lower-dimensional representation (e.g., MDS, t-SNE, Isomap), or produce mappings that cannot be easily interpreted (e.g., kernel PCA, neural autoencoders). Recently, genetic programming (GP) has been used to evolve interpretable DR mappings in the form of… 
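To make the contrast concrete, the kind of interpretable mapping the abstract alludes to can be sketched as follows. This is a hand-written illustration, not an expression from the paper: each low-dimensional coordinate is an explicit symbolic function of the original features, so the embedding can be read and inspected directly.

```python
import numpy as np

# Hypothetical GP-style embedding: each output coordinate is an explicit,
# human-readable expression over the input features (illustrative only).
def embed(x):
    """Map a 4-feature sample to 2 dimensions via symbolic expressions."""
    z0 = x[0] + 0.5 * x[2]    # linear combination of two features
    z1 = np.sin(x[1]) * x[3]  # nonlinear, but still directly inspectable
    return np.array([z0, z1])

X = np.array([[1.0, 0.0, 2.0, 3.0],
              [0.5, np.pi / 2, 1.0, 2.0]])
Z = np.vstack([embed(x) for x in X])
print(Z.shape)  # (2, 2)
```

Unlike a neural autoencoder, such a mapping explains *why* a sample lands where it does: each coordinate names the features it uses and how.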

Nonlinear Component Analysis as a Kernel Eigenvalue Problem

A new method for performing a nonlinear form of principal component analysis by the use of integral operator kernel functions is proposed and experimental results on polynomial feature extraction for pattern recognition are presented.
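The integral-operator-kernel idea above can be sketched in a few lines of NumPy: build a kernel matrix, double-center it, and eigendecompose. This is a minimal from-scratch illustration with an RBF kernel, not the authors' original implementation.

```python
import numpy as np

def kernel_pca(X, n_components=2, gamma=1.0):
    """Minimal RBF-kernel PCA: eigendecompose the centered kernel matrix."""
    # Pairwise squared distances, then the RBF (Gaussian) kernel
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2 * X @ X.T
    K = np.exp(-gamma * d2)
    # Double-center K (centering in the implicit feature space)
    n = len(X)
    one = np.ones((n, n)) / n
    Kc = K - one @ K - K @ one + one @ K @ one
    # Leading eigenvectors of the centered kernel give the components
    vals, vecs = np.linalg.eigh(Kc)
    idx = np.argsort(vals)[::-1][:n_components]
    return vecs[:, idx] * np.sqrt(np.abs(vals[idx]))

X = np.random.RandomState(0).randn(20, 5)
Z = kernel_pca(X)
print(Z.shape)  # (20, 2)
```

Note that, as the survey paragraph in the abstract points out, the resulting projection is expressed through kernel evaluations against training points, which is exactly why it is hard to interpret.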

Genetic Programming for Evolving a Front of Interpretable Models for Data Visualization

A genetic programming (GP) approach called GP-tSNE is proposed for evolving interpretable mappings from the dataset to high-quality visualizations, and a multi-objective approach is designed that produces a variety of visualizations in a single run, each giving a different trade-off between visual quality and model complexity.

Multi-objective genetic programming for manifold learning: balancing quality and dimensionality

This paper substantially extends previous work on manifold learning, by introducing a multi-objective approach that automatically balances the competing objectives of manifold quality and dimensionality.
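The core mechanic behind such multi-objective balancing is keeping the set of non-dominated solutions. A minimal sketch (illustrative objective values, not the paper's; assumes no duplicate points) with two minimized objectives, e.g. embedding error vs. number of dimensions:

```python
def pareto_front(points):
    """Return the non-dominated points for two minimized objectives,
    e.g. (embedding error, number of dimensions)."""
    front = []
    for p in points:
        dominated = any(q[0] <= p[0] and q[1] <= p[1] and q != p
                        for q in points)
        if not dominated:
            front.append(p)
    return front

# (error, dims): the last pair trades more dimensions for less error
models = [(0.10, 5), (0.25, 2), (0.12, 4), (0.30, 2), (0.05, 8)]
print(pareto_front(models))  # [(0.1, 5), (0.25, 2), (0.12, 4), (0.05, 8)]
```

Here (0.30, 2) is dropped because (0.25, 2) is at least as good in both objectives; the survivors form the front of trade-offs a user can choose from.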

Can Genetic Programming Do Manifold Learning Too?

A genetic programming approach to manifold learning called GP-MaL is proposed which evolves functional mappings from a high-dimensional space to a lower dimensional space through the use of interpretable trees and is competitive with existing manifold learning algorithms.

Multitree Genetic Programming With New Operators for Transfer Learning in Symbolic Regression With Incomplete Data

This work proposes a new multitree GP-based feature construction approach to TL in symbolic regression with missing values that not only achieves better performance than traditional learning methods but also improves on two recent TL methods on real-world data sets with various incompleteness and learning scenarios.

Contemporary Symbolic Regression Methods and their Relative Performance

An open-source, reproducible benchmarking platform for symbolic regression is introduced and it is concluded that the best performing methods for real-world regression combine genetic algorithms with parameter estimation and/or semantic search drivers.

On sampling error in genetic programming

This paper presents a probabilistic model of the expected number of subtrees in GP populations initialized with full, grow, or ramped half-and-half, together with a model that estimates the sampling error for a given GP population size.

Understanding How Dimension Reduction Tools Work: An Empirical Approach to Deciphering t-SNE, UMAP, TriMAP, and PaCMAP for Data Visualization

This work provides several unexpected insights into what design choices to make and avoid when constructing DR algorithms, and designs a new algorithm, called Pairwise Controlled Manifold Approximation Projection (PaCMAP), which preserves both local and global structure.

Evolving Simpler Constructed Features for Clustering Problems with Genetic Programming

The results of experiments show that parsimony pressure is an effective method for producing significantly simpler constructed features without any reduction in the performance of k-means++ clustering.

Discovering Symbolic Models from Deep Learning with Inductive Biases

The correct known equations, including force laws and Hamiltonians, can be extracted from the neural network and a new analytic formula is discovered which can predict the concentration of dark matter from the mass distribution of nearby cosmic structures.