Automated Evolutionary Approach for the Design of Composite Machine Learning Pipelines

Nikolay O. Nikitin, Pavel Vychuzhanin, Mikhail Sarafanov, Iana S. Polonskaia, Ilia Revin, Irina V. Barabanova, Gleb Maximov, Anna V. Kaluzhnaya, and Alexander Boukhanovsky

Evolutionary Automated Machine Learning for Multi-Scale Decomposition and Forecasting of Sensor Time Series

The paper proposes an iterative data decomposition algorithm to improve the quality of sensor time series forecasting, and implements boosting-like mutation operators for graph-based genotypes.
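The boosting-like idea, where each new component is fitted to whatever the earlier components left unexplained, can be sketched as follows. This is a minimal illustration under assumptions, not the paper's algorithm: the two stage models (a linear trend and a moving average) and the names `fit_trend`, `fit_moving_average`, and `iterative_decomposition` are hypothetical stand-ins for the pipeline nodes the paper evolves.

```python
import numpy as np

# Hypothetical stage models: stand-ins for the pipeline nodes the paper evolves.
def fit_trend(y: np.ndarray) -> np.ndarray:
    """Least-squares linear trend of y."""
    t = np.arange(len(y))
    slope, intercept = np.polyfit(t, y, deg=1)
    return slope * t + intercept

def fit_moving_average(y: np.ndarray, window: int = 5) -> np.ndarray:
    """Centred moving average (window must be odd); edges padded with end values."""
    padded = np.pad(y, window // 2, mode="edge")
    kernel = np.ones(window) / window
    return np.convolve(padded, kernel, mode="valid")

def iterative_decomposition(y, stage_models):
    """Fit each stage to the residual left by the earlier stages (boosting-like).

    Returns the fitted components and the final residual; together they sum to y."""
    residual = np.asarray(y, dtype=float).copy()
    components = []
    for model in stage_models:
        component = model(residual)
        components.append(component)
        residual = residual - component
    return components, residual

# Usage: a linear trend plus a slow oscillation.
t = np.arange(100)
y = 0.5 * t + np.sin(t / 3.0)
components, residual = iterative_decomposition(y, [fit_trend, fit_moving_average])
```

Because every stage only sees the current residual, the decomposition is exactly additive; which models to stack, and in what order, is the part an evolutionary search would decide.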

Improvement of Computational Performance of Evolutionary AutoML in a Heterogeneous Environment

A modular approach is proposed for improving the performance of evolutionary optimization of modelling pipelines with a graph-based structure; it consists of several stages: parallelization, caching, and evaluation.
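The caching stage can be illustrated with a small fitness cache keyed by a canonical description of the pipeline graph, so that structurally identical pipelines produced by different mutations are evaluated only once. This is a sketch under assumptions: the dictionary-based pipeline encoding, `canonical_key`, and `FitnessCache` are hypothetical and do not reproduce the paper's implementation. In particular, the key treats node names as identities, so two pipelines that differ only in node naming would not share an entry.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical pipeline encoding: node name -> tuple of parent node names.
Pipeline = Dict[str, Tuple[str, ...]]

def canonical_key(pipeline: Pipeline) -> str:
    """Order-independent string describing the graph structure."""
    return ";".join(
        f"{node}<-{','.join(sorted(parents))}"
        for node, parents in sorted(pipeline.items())
    )

class FitnessCache:
    """Wraps an expensive evaluation function and memoizes it by graph structure."""

    def __init__(self, evaluate: Callable[[Pipeline], float]):
        self._evaluate = evaluate
        self._cache: Dict[str, float] = {}
        self.hits = 0
        self.misses = 0

    def fitness(self, pipeline: Pipeline) -> float:
        key = canonical_key(pipeline)
        if key in self._cache:
            self.hits += 1
        else:
            self.misses += 1
            self._cache[key] = self._evaluate(pipeline)
        return self._cache[key]

# Usage: the same graph written in two insertion orders is evaluated once.
calls: List[Pipeline] = []

def slow_evaluate(p: Pipeline) -> float:
    calls.append(p)
    return float(len(p))  # dummy fitness standing in for model training

cache = FitnessCache(slow_evaluate)
a = {"scaling": (), "ridge": ("scaling",)}
b = {"ridge": ("scaling",), "scaling": ()}
fa = cache.fitness(a)
fb = cache.fitness(b)
```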

Automated data-driven approach for gap filling in the time series using evolutionary learning

The approach is based on the automated evolutionary identification of the optimal structure for a composite data-driven model, and allows the model to be adapted for effective gap filling in a specific dataset without involving a data scientist.

On the balance between the training time and interpretability of neural ODE for time series modelling

The paper shows that modern neural ODEs cannot be reduced to simpler models in time series modelling applications, and proposes a new view of time series modelling that combines neural networks with ODE systems.

Hybrid Bayesian Network-Based Modeling: COVID-19-Pneumonia Case

The proposed Bayesian network-based approach to predicting important clinical indicators is interpretable, which is very important in the medical field, and it can be used as part of decision support systems to improve the treatment of COVID-19 pneumonia.

The development of an electrochemical sensor for antibiotics in milk based on machine learning algorithms

Combining cyclic voltammetry with machine learning made it possible to create a pattern recognition system for antibiotic residues in skimmed milk; the gradient boosting algorithm showed the best efficiency in training the machine learning model.

Machine learning-based wind speed time series analysis

In this study, hourly average wind speed data for California covering 2019, 2020, and 2021 were used for time series analysis and forecasting with the AutoML tool Fedot.

MatFlow: A System for Knowledge-based Novel Materials Design using Machine Learning

A new machine learning platform, MatFlow, is introduced for the automated, knowledge-driven design of novel materials and their applications; its functionality is illustrated by the design of transition metal dichalcogenide heterostructures for electronic and energy devices.



Incremental Search Space Construction for Machine Learning Pipeline Synthesis

A data-centric approach to pipeline construction and hyperparameter optimization, based on meta-features and inspired by human behavior, is proposed. It prunes the pipeline structure search space efficiently, so that flexible, dataset-specific ML pipelines can be constructed.

DarwinML: A Graph-based Evolutionary Algorithm for Automated Machine Learning

A graph-based architecture is employed to represent flexible combinations of ML models, which provides a larger search space than tree-based and stacking-based architectures, and an evolutionary algorithm is proposed to search for the best architecture.
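Why a graph gives a larger search space than a tree can be seen in a small example: in a directed acyclic graph one node can feed several downstream branches, whereas a tree would have to duplicate it. The sketch below evaluates such a graph in topological order using the standard-library `graphlib` module (Python 3.9+). It is an illustration under assumptions, not DarwinML's representation: the `Node` class and `run_graph` are hypothetical, and combining multiple parents by averaging their outputs is a simplification (real systems may concatenate features or use a meta-model instead).

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Sequence

import numpy as np

class Node:
    """Hypothetical pipeline node: an operation applied to its parents' outputs."""

    def __init__(self, name: str, op: Callable[[np.ndarray], np.ndarray],
                 parents: Sequence[str] = ()):
        self.name = name
        self.op = op
        self.parents = list(parents)

def run_graph(nodes: List[Node], x: np.ndarray, output: str) -> np.ndarray:
    """Evaluate the DAG in topological order and return the output node's result."""
    by_name = {n.name: n for n in nodes}
    order = TopologicalSorter({n.name: n.parents for n in nodes}).static_order()
    results: Dict[str, np.ndarray] = {}
    for name in order:
        node = by_name[name]
        if node.parents:
            # Assumed merge rule: average the outputs of all parent nodes.
            inputs = np.mean([results[p] for p in node.parents], axis=0)
        else:
            inputs = x  # primary nodes read the raw input
        results[name] = node.op(inputs)
    return results[output]

# Usage: "scale" is shared by two branches, a diamond shape a tree cannot express.
nodes = [
    Node("scale", lambda v: v / 2.0),
    Node("square", lambda v: v ** 2, parents=["scale"]),
    Node("shift", lambda v: v + 1.0, parents=["scale"]),
    Node("final", lambda v: v, parents=["square", "shift"]),
]
out = run_graph(nodes, np.array([2.0, 4.0]), output="final")
# out: [1.5, 3.5]
```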

TPOT: A Tree-based Pipeline Optimization Tool for Automating Machine Learning

This chapter presents TPOT v0.3, an open-source, genetic programming-based AutoML system that optimizes a series of feature preprocessors and machine learning models with the goal of maximizing classification accuracy on a supervised classification task.

DeepLine: AutoML Tool for Pipelines Generation using Deep Reinforcement Learning and Hierarchical Actions Filtering

This study presents DeepLine, a reinforcement learning-based approach for automatic pipeline generation that utilizes an efficient representation of the search space together with a novel method for operating in environments with large and dynamic action spaces.

Benchmark and Survey of Automated Machine Learning Frameworks

This paper combines a survey of current AutoML methods with a benchmark of popular AutoML frameworks on real datasets, summarizing and reviewing important AutoML techniques for every step of building an ML pipeline.

Auto-sklearn: Efficient and Robust Automated Machine Learning

Auto-sklearn is a robust new AutoML system based on the Python machine learning package scikit-learn. It improves on existing AutoML methods by automatically taking into account past performance on similar datasets and by constructing ensembles from the models evaluated during the optimization.
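One standard way to build an ensemble from models already evaluated during optimization is greedy ensemble selection with replacement (Caruana et al.), which, to our understanding, is the post-hoc ensembling technique Auto-sklearn describes. The sketch below is a minimal version under assumptions: it uses squared error on held-out predictions as the loss, and `ensemble_selection` is a hypothetical helper, not part of Auto-sklearn's API.

```python
import numpy as np

def ensemble_selection(predictions, y_true, ensemble_size):
    """Greedy forward selection with replacement.

    predictions: shape (n_models, n_samples), validation predictions per model.
    Returns one weight per model: selection count / ensemble_size."""
    predictions = np.asarray(predictions, dtype=float)
    y_true = np.asarray(y_true, dtype=float)
    counts = np.zeros(predictions.shape[0], dtype=int)
    running_sum = np.zeros(predictions.shape[1])
    for step in range(1, ensemble_size + 1):
        # Mean prediction of the ensemble if each candidate were added next.
        candidate_means = (running_sum + predictions) / step
        losses = np.mean((candidate_means - y_true) ** 2, axis=1)
        best = int(np.argmin(losses))  # ties go to the lowest index
        counts[best] += 1
        running_sum += predictions[best]
    return counts / ensemble_size

# Usage: two models with opposite biases cancel out when combined.
y_true = np.array([1.0, 2.0, 3.0])
predictions = np.array([
    [2.0, 3.0, 4.0],  # model A: over-predicts by 1
    [0.0, 1.0, 2.0],  # model B: under-predicts by 1
    [1.0, 5.0, 3.0],  # model C: one large error
])
weights = ensemble_selection(predictions, y_true, ensemble_size=2)
# weights: [0.5, 0.5, 0.0]
```

Selecting with replacement lets a strong model receive a larger weight simply by being picked several times, which keeps the procedure a plain greedy loop.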

The data-driven physical-based equations discovery using evolutionary approach

The paper describes an algorithm for discovering mathematical equations from observational data. It combines genetic programming with sparse regression and yields short, interpretable expressions describing the physical process that underlies the data.
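The sparse regression half of this combination can be sketched in isolation. In the sketch below, the genetic programming part (proposing candidate terms) is replaced by a fixed, hand-written library of polynomial terms, and the sparse regression is sequentially thresholded least squares (STLSQ, the algorithm popularized by SINDy). This is a swapped-in technique that may differ from the paper's own sparse regression, and `stlsq` is a hypothetical helper name.

```python
import numpy as np

def stlsq(library, target, threshold=0.1, n_iter=10):
    """Sequentially thresholded least squares.

    Repeatedly solves least squares, zeroes coefficients whose magnitude is below
    `threshold`, and refits using only the surviving library columns."""
    coef, *_ = np.linalg.lstsq(library, target, rcond=None)
    for _ in range(n_iter):
        small = np.abs(coef) < threshold
        coef[small] = 0.0
        big = ~small
        if not big.any():
            break
        coef[big], *_ = np.linalg.lstsq(library[:, big], target, rcond=None)
    return coef

# Usage: recover y = 3x - 0.5x^3 from a library of candidate terms.
x = np.linspace(-2.0, 2.0, 50)
library = np.column_stack([np.ones_like(x), x, x**2, x**3])
term_names = ["1", "x", "x^2", "x^3"]
target = 3.0 * x - 0.5 * x**3
coef = stlsq(library, target, threshold=0.1)
expression = " + ".join(
    f"{c:g}*{name}" for c, name in zip(coef, term_names) if c != 0.0
)
```

The threshold is what produces the short, interpretable expression: terms whose coefficients stay small are dropped entirely rather than kept with tiny weights.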

A study of model and hyper-parameter selection strategies for classifier ensembles: a robust analysis on different optimization algorithms and extended results

A broad and robust comparative analysis of both approaches (model selection and hyper-parameter selection) for classifier ensembles indicates that using a hyper-parameter selection method yields the most accurate classifier ensembles, although the statistical test did not find this improvement significant.

An Adaptive and Near Parameter-free Evolutionary Computation Approach Towards True Automation in AutoML

This work proposes a near parameter-free genetic programming approach that adapts its hyperparameter values throughout evolution, so they never need to be specified manually. Applied to automated machine learning, it produces pipelines that can effectively be claimed to be free from human input.