• Corpus ID: 21100604

Towards a more efficient representation of imputation operators in TPOT

Unai Garciarena, Alexander Mendiburu, Roberto Santana
Automated Machine Learning encompasses a set of meta-algorithms intended to design and apply machine learning techniques (e.g., model selection, hyperparameter tuning, and model assessment). TPOT, a tool for optimizing machine learning pipelines based on genetic programming (GP), is a recent example of this kind of application. We recently proposed a way to introduce imputation methods as part of TPOT. While our approach was able to deal with problems with missing data, it can…


Analysis of the Complexity of the Automatic Pipeline Generation Problem

This paper addresses the pipeline generation problem from a broader perspective, treating an understanding of problem complexity as a preliminary step before proposing a solution. It suggests that, depending on the dimensions of the search, the model quality target, and the data being modeled, basic search methods can produce results that match the user's expectations.

AutonoML: Towards an Integrated Framework for Autonomous Machine Learning

This review seeks to motivate a more expansive perspective on what constitutes an automated/autonomous ML system, alongside consideration of how best to consolidate those elements, and develops a conceptual framework to illustrate one possible way of fusing high-level mechanisms into an autonomous ML system.



Evolving imputation strategies for missing data in classification problems with TPOT

It is shown that genetic programming can automatically find increasingly better pipelines that include the most effective combinations of imputation methods, feature pre-processing, and classifiers for a variety of classification problems with missing data.

TPOT: A Tree-based Pipeline Optimization Tool for Automating Machine Learning

This chapter presents TPOT v0.3, an open source genetic programming-based AutoML system that optimizes a series of feature preprocessors and machine learning models with the goal of maximizing classification accuracy on a supervised classification task.

An extensive analysis of the interaction between missing data types, imputation methods, and supervised classifiers

Strongly Typed Genetic Programming

  • D. Montana
  • Computer Science
    Evolutionary Computation
  • 1995
Strongly typed genetic programming (STGP) is an enhanced version of genetic programming that enforces data-type constraints and whose use of generic functions and generic data types makes it more powerful than other approaches to type-constraint enforcement.

An analysis of four missing data treatment methods for supervised learning

This analysis indicates that missing data imputation based on the k-nearest neighbor algorithm can outperform the internal methods used by C4.5 and CN2 to treat missing data, and can also outperform mean or mode imputation, a method broadly used to treat missing values.
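
To make the contrast between the two imputation families concrete, here is a minimal sketch in plain Python. It is not the procedure used in the cited analysis, whose details are not given here. The function names (`mean_impute`, `knn_impute`), the distance definition, and the toy data are all assumptions made for illustration, and the sketch assumes numeric columns in which every column has at least one observed value.

```python
import math

def mean_impute(rows):
    """Replace each None with the mean of the observed values in its column."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        means.append(sum(observed) / len(observed))
    return [[means[j] if r[j] is None else r[j] for j in range(n_cols)]
            for r in rows]

def _distance(a, b):
    """Root mean squared difference over columns observed in both rows.

    Returns infinity when the two rows share no observed column.
    (Illustrative choice; other distance definitions are possible.)
    """
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf
    return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

def knn_impute(rows, k=2):
    """Replace each None with the column mean over the k nearest donor rows.

    A donor is any other row that has an observed value in that column.
    """
    result = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            donors = [(_distance(row, other), other[j])
                      for m, other in enumerate(rows)
                      if m != i and other[j] is not None]
            donors.sort(key=lambda d: d[0])
            nearest = [v for _, v in donors[:k]]
            result[i][j] = sum(nearest) / len(nearest)
    return result

# Toy data (hypothetical): the second column is roughly ten times the first.
data = [
    [1.0, 10.0],
    [1.2, 12.0],
    [5.0, 50.0],
    [5.1, None],
]

mean_filled = mean_impute(data)
knn_filled = knn_impute(data, k=1)
```

On this toy data the mean strategy fills the gap with the column average regardless of the row's other values, while the nearest-neighbor strategy borrows the value from the most similar row. That difference is the intuition behind the comparison summarized above.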

Genetic programming - on the programming of computers by means of natural selection

  • J. Koza
  • Computer Science
    Complex adaptive systems
  • 1993
This book discusses the evolution of architecture, primitive functions, terminals, sufficiency, and closure, and the role of representation and the lens effect in genetic programming.

Scikit-learn: Machine Learning in Python

Scikit-learn is a Python module integrating a wide range of state-of-the-art machine learning algorithms for medium-scale supervised and unsupervised problems. This package focuses on bringing

Genetic programming: a paradigm for genetically breeding populations of computer programs to solve problems

In this new "genetic programming" paradigm, populations of computer programs are genetically bred using the Darwinian principle of survival of the fittest and using a genetic crossover (recombination) operator appropriate for genetically mating computer programs.

Grammar-based Genetic Programming: a survey

This work surveys the various grammar-based formalisms that have been used in GP and discusses the contributions they have made to the progress of GP, showing how grammar formalisms contributed to the solutions of these problems.