Preprocessor Selection for Machine Learning Pipelines
@article{Schoenfeld2018PreprocessorSF, title={Preprocessor Selection for Machine Learning Pipelines}, author={Brandon Schoenfeld and Christophe G. Giraud-Carrier and Mason Poggemann and Jarom Christensen and Kevin D. Seppi}, journal={ArXiv}, year={2018}, volume={abs/1810.09942} }
Much of the work in metalearning has focused on classifier selection, combined more recently with hyperparameter optimization, with little concern for data preprocessing. Yet, it is generally well accepted that machine learning applications require not only model building, but also data preprocessing. In other words, practical solutions consist of pipelines of machine learning operators rather than single algorithms. Interestingly, our experiments suggest that, on average, data preprocessing…
9 Citations
Benchmark and Survey of Automated Machine Learning Frameworks
- Computer Science
- J. Artif. Intell. Res.
- 2021
This paper combines a survey of current AutoML methods with a benchmark of popular AutoML frameworks on real data sets, summarizing and reviewing important AutoML techniques and methods for every step in building an ML pipeline.
Survey on Automated Machine Learning
- Computer Science
- ArXiv
- 2019
This survey summarizes recent developments in academia and industry regarding AutoML, introduces a holistic problem formulation and approaches for solving various subproblems of AutoML, and provides an extensive empirical evaluation of the presented approaches on synthetic and real data.
Extended Pre-Processing Pipeline For Text Classification: On the Role of Meta-Features, Sparsification and Selective Sampling
- Computer Science
- Anais Estendidos do XXXVI Simpósio Brasileiro de Banco de Dados (SBBD Estendido 2021)
- 2021
This master's thesis introduces three new steps into the traditional pre-processing phase of pipelines for text classification: 1) Meta-Features Generation; 2) Sparsification; and 3) Selective Sampling.
AutonoML: Towards an Integrated Framework for Autonomous Machine Learning
- Computer Science
- ArXiv
- 2020
This review seeks to motivate a more expansive perspective on what constitutes an automated/autonomous ML system, alongside consideration of how best to consolidate those elements, and develops a conceptual framework to illustrate one possible way of fusing high-level mechanisms into an autonomous ML system.
Novel authorship verification model for social media accounts compromised by a human
- Computer Science
- Multimedia Tools and Applications
- 2021
An authorship verification model that uses XGBoost as a preprocessor to discover functional features of the text message, which are then ranked using MCDM methods to build a classification model.
Using learning analytics to support students’ engineering design: the angle of prediction
- Education
- Interactive Learning Environments
- 2019
This research presents novel, scalable approaches that can be used to improve the quality of teaching and learning in the rapidly changing environment of online education.
Correction to: Novel authorship verification model for social media accounts compromised by a human
- Mathematics
- Multimedia Tools and Applications
- 2021
A Correction to this paper has been published: https://doi.org/10.1007/s11042-021-10617-5
Extended pre-processing pipeline for text classification: On the role of meta-feature representations, sparsification and selective sampling
- Computer Science
- Inf. Process. Manag.
- 2020
A universal information theoretic approach to the identification of stopwords
- Computer Science
- Nat. Mach. Intell.
- 2019
This work formulates an information theoretic framework that automatically identifies uninformative words in a corpus and shows that it not only outperforms other stopword heuristics, but also allows for a substantial reduction of document size in applications of topic modelling.
References
Layered TPOT: Speeding up Tree-based Pipeline Optimization
- Computer Science
- AutoML@PKDD/ECML
- 2017
This work introduces Layered TPOT, a modification to TPOT that uses a modified evolutionary algorithm and aims to create pipelines as good as those of the original, but in significantly less time.
Efficient and Robust Automated Machine Learning
- Computer Science
- NIPS
- 2015
This work introduces a robust new AutoML system based on scikit-learn, which improves on existing AutoML methods by automatically taking into account past performance on similar datasets, and by constructing ensembles from the models evaluated during the optimization.
Metalearning - Applications to Data Mining
- Computer Science
- Cognitive Technologies
- 2009
This book discusses several approaches to obtaining knowledge concerning the performance of machine learning and data mining algorithms and shows how this knowledge can be reused to select, combine, compose and adapt both algorithms and models to yield faster, more effective solutions to data mining problems.
Scikit-learn: Machine Learning in Python
- Computer Science
- J. Mach. Learn. Res.
- 2011
Scikit-learn is a Python module integrating a wide range of state-of-the-art machine learning algorithms for medium-scale supervised and unsupervised problems. This package focuses on bringing…
A Comprehensive Dataset for Evaluating Approaches of Various Meta-learning Tasks
- Computer Science
- ICPRAM
- 2012
This paper presents a novel, publicly available dataset for meta-learning built from 83 datasets, six classification algorithms, and 49 meta-features, with target variables such as classifier accuracy and training time, as well as parameter-dependent measures, included as ground-truth information.
OpenML: networked science in machine learning
- Computer Science
- SKDD
- 2014
This paper introduces OpenML, a place for machine learning researchers to share and organize data in fine detail, so that they can work more effectively, be more visible, and collaborate with others to tackle harder problems.
Auto-WEKA 2.0: Automatic model selection and hyperparameter optimization in WEKA
- Computer Science
- J. Mach. Learn. Res.
- 2017
This paper describes the new version of Auto-WEKA, a system designed to help novice users by automatically searching through the joint space of WEKA's learning algorithms and their respective hyperparameter settings to maximize performance, using a state-of-the-art Bayesian optimization method.