Corpus ID: 52105215

P4ML: A Phased Performance-Based Pipeline Planner for Automated Machine Learning

Authors: Yolanda Gil, Ke-Thia Yao, Varun Ratnakar, Daniel Garijo, Greg Ver Steeg, Rob Brekelmans, Mayank Kejriwal, Fanghao Luo, I-Hui Huang

While many problems could benefit from recent advances in machine learning, significant time and expertise are required to design a customized solution for each problem. Prior attempts to automate machine learning have focused on generating multi-step solutions composed of primitive steps for feature engineering and modeling, but they assume already clean and featurized data and carefully curated primitives. However, cleaning and featurization are often the most time-consuming steps in a data science… 


Visus: An Interactive System for Automatic Machine Learning Model Building and Curation

This paper presents Visus, a system designed to support model building and the curation of ML data-processing pipelines generated by AutoML systems. It also describes the framework used to ground the design choices and a usage scenario that Visus enables.

On Taking Advantage of Opportunistic Meta-knowledge to Reduce Configuration Spaces for Automated Machine Learning

Overall, numerous experiments with the AutoWeka4MCPS package suggest that (1) opportunistic/systematic meta-knowledge can improve ML outcomes, typically in line with how relevant that meta-knowledge is, and (2) configuration-space culling is optimal when it is neither too conservative nor too radical.

Incremental Search Space Construction for Machine Learning Pipeline Synthesis

This paper proposes a data-centric approach to pipeline construction and hyperparameter optimization that is based on meta-features and inspired by human behavior. The approach efficiently prunes the pipeline-structure search space and allows flexible, data-set-specific ML pipelines to be constructed.

Exploring Opportunistic Meta-knowledge to Reduce Search Spaces for Automated Machine Learning

This paper investigates whether, based on previous experience, a pool of available classifiers/regressors can be culled before initiating a pipeline composition/optimisation process for a new ML problem (i.e. a new dataset). The results indicate that it is better to search through a ‘top tier’ of recommended predictors than to pin hopes on one previously supreme performer.

Towards A Domain-Customized Automated Machine Learning Framework For Networks and Systems

This paper argues it is possible to build a domain-customized automated ML framework for networked systems that can help save valuable operator time and effort.

Benchmark and Survey of Automated Machine Learning Frameworks

This paper combines a survey of current AutoML methods with a benchmark of popular AutoML frameworks on real data sets, summarizing and reviewing important AutoML techniques and methods for every step in building an ML pipeline.

AVATAR - Machine Learning Pipeline Evaluation Using Surrogate Model

AVATAR accelerates automatic ML pipeline composition and optimisation by quickly discarding invalid pipelines, and it evaluates complex pipelines more efficiently than traditional evaluation approaches, which require executing them.

Towards human-guided machine learning

This paper proposes human-guided machine learning (HGML) as a hybrid approach where a user interacts with an AutoML system and tasks it to explore different problem settings that reflect the user's knowledge about the data available.

Perspectives on automated composition of workflows in the life sciences

This article summarizes a recent Lorentz Center workshop dedicated to automated composition of workflows in the life sciences, and draws the “big picture” of the scientific workflow development life cycle, before surveying and discussing current methods, technologies and practices for semantic domain modelling, automation in workflow development, and workflow assessment.



Evaluation of a Tree-based Pipeline Optimization Tool for Automating Data Science

This paper implements an open-source Tree-based Pipeline Optimization Tool (TPOT) in Python and shows that TPOT can design machine learning pipelines that provide a significant improvement over a basic machine learning analysis while requiring little to no input or prior knowledge from the user.

A Framework for Efficient Data Analytics through Automatic Configuration and Customization of Scientific Workflows

This work develops a framework that assists scientists with data analysis tasks, in particular machine learning and data mining, by taking advantage of the unique capabilities of the Wings workflow system to reason about semantic constraints.

Automating Biomedical Data Science Through Tree-Based Pipeline Optimization

This work implements a Tree-based Pipeline Optimization Tool (TPOT) and shows that TPOT can build machine learning pipelines that achieve competitive classification accuracy and discover novel pipeline operators, such as synthetic feature constructors, that significantly improve classification accuracy on the data sets studied.

Probabilistic Matrix Factorization for Automated Machine Learning

This paper uses probabilistic matrix factorization techniques and acquisition functions from Bayesian optimization to identify high-performing pipelines across a wide range of datasets, significantly outperforming the current state of the art.

Efficient and Robust Automated Machine Learning

This work introduces a robust new AutoML system based on scikit-learn, which improves on existing AutoML methods by automatically taking into account past performance on similar datasets, and by constructing ensembles from the models evaluated during the optimization.

A brief Review of the ChaLearn AutoML Challenge: Any-time Any-dataset Learning without Human Intervention

This competition contributes to the development of fully automated environments by challenging practitioners to solve problems under specific constraints and to share their approaches; the platform will remain available for post-challenge submissions at http://codalab.org/AutoML.

Metalearning - Applications to Data Mining

This book discusses several approaches to obtaining knowledge concerning the performance of machine learning and data mining algorithms and shows how this knowledge can be reused to select, combine, compose and adapt both algorithms and models to yield faster, more effective solutions to data mining problems.

Auto-WEKA: combined selection and hyperparameter optimization of classification algorithms

This work considers the problem of simultaneously selecting a learning algorithm and setting its hyperparameters, going beyond previous work that addresses these issues separately. It shows classification performance that is often much better than that of standard selection and hyperparameter optimization methods.

Intelligent Support for Exploratory Data Analysis

This work describes the design of AIDE and its behavior in exploring a small, complex data set; the approach gives a useful means of representing some types of statistical strategy.

Metalearning: a survey of trends and technologies

This survey gives an all-encompassing overview of the research directions pursued under the umbrella of metalearning. It reconciles different definitions given in the scientific literature, lists the choices involved in designing a metalearning system, and identifies some of the future research challenges in this domain.