Efficient End-to-End AutoML via Scalable Search Space Decomposition

Yang Li, Yu Shen, Wentao Zhang, Ce Zhang, Bin Cui
End-to-end AutoML, which automatically searches for ML pipelines in a space induced by feature engineering, algorithm/model selection, and hyper-parameter tuning, has attracted intense interest from both academia and industry. Existing AutoML systems, however, suffer from scalability issues when applied to application domains with large, high-dimensional search spaces. We present VolcanoML, a scalable and extensible framework that facilitates systematic exploration of large AutoML search…
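
To make the notion of a search space "induced by feature engineering, algorithm/model selection, and hyper-parameter tuning" concrete, here is a minimal sketch in Python. It is not VolcanoML's method: the option lists, the `toy_score` function, and the uniform random-search baseline are all hypothetical stand-ins chosen for illustration. It only shows how the joint space is the product of the three choices, which is why it grows quickly.

```python
import itertools
import random

# Hypothetical toy search space; names and values are illustrative only.
FEATURE_ENGINEERING = ["none", "standardize", "pca"]
ALGORITHMS = {
    "random_forest": {"n_estimators": [50, 100, 200], "max_depth": [4, 8, 16]},
    "svm": {"C": [0.1, 1.0, 10.0], "kernel": ["linear", "rbf"]},
}


def algorithm_configs(name):
    """Enumerate every hyper-parameter setting for one algorithm."""
    grid = ALGORITHMS[name]
    keys = sorted(grid)
    for values in itertools.product(*(grid[k] for k in keys)):
        yield dict(zip(keys, values))


def joint_space():
    """Enumerate feature engineering x algorithm x hyper-parameters."""
    for fe in FEATURE_ENGINEERING:
        for name in ALGORITHMS:
            for params in algorithm_configs(name):
                yield {"feature_engineering": fe, "algorithm": name, "params": params}


def toy_score(pipeline):
    """Stand-in for validation accuracy; a real system would train a model here."""
    score = 0.5
    if pipeline["feature_engineering"] == "standardize":
        score += 0.1
    if pipeline["algorithm"] == "svm" and pipeline["params"]["kernel"] == "rbf":
        score += 0.2
    return score


def random_search(evaluate, budget, seed=0):
    """Baseline: sample pipelines uniformly from the joint space, keep the best."""
    rng = random.Random(seed)
    candidates = list(joint_space())
    sampled = rng.sample(candidates, min(budget, len(candidates)))
    return max(sampled, key=evaluate)


space_size = sum(1 for _ in joint_space())
best = random_search(toy_score, budget=space_size)
```

Even this toy space multiplies the per-algorithm hyper-parameter grids by the number of feature-engineering options; adding one more option at any level enlarges the whole product, which is the scalability pressure the abstract refers to.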



VolcanoML: Speeding up End-to-End AutoML via Scalable Search Space Decomposition

End-to-end AutoML, which has attracted intense interest from both academia and industry, automatically searches for ML pipelines in a space induced by feature engineering, algorithm/model…

Volcano - An Extensible and Parallel Query Evaluation System

  • G. Graefe
  • Computer Science
  • IEEE Trans. Knowl. Data Eng.
  • 1994
Volcano is the first implemented query execution engine that effectively combines extensibility and parallelism; it can be extended with new operators, algorithms, data types, and type-specific methods.

SystemDS: A Declarative Machine Learning System for the End-to-End Data Science Lifecycle

SystemDS is an open source ML system for the end-to-end data science lifecycle, from data integration, cleaning, and preparation, through local, distributed, and federated ML model training, to debugging and serving. Preliminary results show the potential of end-to-end lifecycle optimization.

TPOT: A Tree-based Pipeline Optimization Tool for Automating Machine Learning

This chapter presents TPOT v0.3, an open source genetic programming-based AutoML system that optimizes a series of feature preprocessors and machine learning models with the goal of maximizing classification accuracy on a supervised classification task.

The Machine Learning Bazaar: Harnessing the ML Ecosystem for Effective System Development

The Machine Learning Bazaar is a new framework for developing machine learning and automated machine learning software systems. It provides solutions for a variety of data modalities and problem types, and pairs its pipelines with a hierarchy of AutoML strategies: Bayesian optimization and bandit learning.

An ADMM Based Framework for AutoML Pipeline Configuration

A novel AutoML scheme is proposed by leveraging the alternating direction method of multipliers (ADMM) to decompose the optimization problem into easier sub-problems that have a reduced number of variables and circumvent the challenge of mixed variable categories.

Sequential Model-Based Optimization for General Algorithm Configuration

This paper extends the explicit regression models paradigm for the first time to general algorithm configuration problems, allowing many categorical parameters and optimization for sets of instances, and yields state-of-the-art performance.

Complaint-driven Training Data Debugging for Query 2.0

This work proposes Rain, a complaint-driven training data debugging system that allows users to specify complaints over the query's intermediate or final output, and aims to return a minimum set of training examples so that if they were removed, the complaints would be resolved.

Towards Dynamic and Safe Configuration Tuning for Cloud Databases

OnlineTune is proposed, which tunes online databases safely in changing cloud environments. It incorporates environmental factors as context features and adopts contextual Bayesian optimization with context space partitioning to optimize the database adaptively and scalably.

DeepDive: Declarative Knowledge Base Construction

DeepDive is described, a system that combines database and machine learning ideas to help develop KBC systems, a long-standing problem in industry and research that encompasses problems of data extraction, cleaning, and integration.