• Corpus ID: 15010720

Feature Construction Methods: A Survey

Parikshit Sondhi
A good feature representation is central to achieving high performance in any machine learning task. However, manually defining a good feature set is often not feasible. Feature construction involves transforming a given set of input features to generate a new set of more powerful features, which can then be used for prediction. Several feature construction methods have been developed. In this paper we present a survey of the past 20 years of research in the area. We describe the major issues involved… 


Automating Feature Engineering

This paper discusses a system for performing feature engineering in an automated manner using a combination of exploratory and learning techniques, and mentions the larger charter of an automated data science pipeline.

Cognito: Automated Feature Engineering for Supervised Learning

Cognito is a novel system that performs automatic feature engineering on a given dataset for supervised learning. It explores various feature construction choices in a hierarchical, non-exhaustive manner while progressively maximizing the accuracy of the model through a greedy exploration strategy.
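To make greedy exploration concrete, here is a minimal sketch in Python. It is not Cognito's algorithm: the transform set (`square`, `log1p_abs`, `sqrt_abs`), the scoring function, and all names are hypothetical, and the absolute Pearson correlation with the target stands in for model accuracy. Cognito's hierarchical transformation tree is flattened here into repeated greedy passes over the current feature set.

```python
import math

# Hypothetical unary transforms; Cognito's actual transform set is not given here.
TRANSFORMS = {
    "square": lambda v: v * v,
    "log1p_abs": lambda v: math.log1p(abs(v)),
    "sqrt_abs": lambda v: math.sqrt(abs(v)),
}


def abs_corr(xs, ys):
    """Absolute Pearson correlation; a cheap stand-in for model accuracy."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return abs(cov) / math.sqrt(vx * vy)


def score(columns, target):
    """Score a feature set by its best single-feature correlation with the target."""
    return max(abs_corr(col, target) for col in columns.values())


def greedy_construct(columns, target, max_steps=3):
    """Each pass adds the single transformed feature that most improves the score."""
    columns = dict(columns)
    best = score(columns, target)
    for _ in range(max_steps):
        candidate = None
        for name, col in list(columns.items()):
            for tname, fn in TRANSFORMS.items():
                new_name = f"{tname}({name})"
                if new_name in columns:
                    continue  # already constructed in an earlier pass
                trial = dict(columns)
                trial[new_name] = [fn(v) for v in col]
                s = score(trial, target)
                if s > best + 1e-12:
                    best, candidate = s, (new_name, trial[new_name])
        if candidate is None:
            break  # no transform improves the score, so stop early
        columns[candidate[0]] = candidate[1]
    return columns, best


x = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
y = [v * v for v in x]  # nonlinear target: raw correlation with x is 0
cols, best = greedy_construct({"x": x}, y)
print(sorted(cols), round(best, 6))  # ['square(x)', 'x'] 1.0
```

Because the target is symmetric in `x`, the raw feature has zero correlation with it; the first greedy pass adds `square(x)`, and the second pass finds no further gain and stops.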

Consistent Feature Construction with Constrained Genetic Programming for Experimental Physics

This work proposes, for the first time, a method for interpretable feature construction with units of measurement, and experts in high-energy physics validate both the overall approach and the interpretability of the constructed features.
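A constrained genetic program needs a way to reject dimensionally inconsistent candidate features. The sketch below shows only that unit-consistency check, under the assumption that units are tuples of exponents over base dimensions; the `Feature` class and its operators are hypothetical, and the GP search itself is omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    """A symbolic feature whose units are exponents of base dimensions."""
    name: str
    units: tuple  # (mass, length, time) exponents: (0, 1, -1) is a velocity

    def __add__(self, other):
        # Adding quantities is only dimensionally valid when units match exactly.
        if self.units != other.units:
            raise ValueError(f"unit mismatch: {self.name} + {other.name}")
        return Feature(f"({self.name} + {other.name})", self.units)

    def __mul__(self, other):
        # Multiplying quantities adds their unit exponents.
        units = tuple(a + b for a, b in zip(self.units, other.units))
        return Feature(f"({self.name} * {other.name})", units)

    def __truediv__(self, other):
        # Dividing quantities subtracts their unit exponents.
        units = tuple(a - b for a, b in zip(self.units, other.units))
        return Feature(f"({self.name} / {other.name})", units)


# Base dimensions: (mass, length, time).
m = Feature("m", (1, 0, 0))
v = Feature("v", (0, 1, -1))
p = Feature("p", (1, 1, -1))

energy_like = m * v * v   # units (1, 2, -2)
momentum_sum = m * v + p  # both terms are momenta, so the sum is allowed
try:
    m + v                 # mass plus velocity must be rejected
    rejected = False
except ValueError:
    rejected = True
```

A GP that builds candidates only through these operators can discard any tree that raises `ValueError`, which keeps every surviving feature physically meaningful.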

Comparative study of classifier performance using automatic feature construction by M3GP

The results show that automatic feature construction with M3GP, when compared to using the standalone classifiers without feature construction, achieves statistically significant improvements in the majority of the test cases, sometimes by a very large margin, while degrading the weighted F-measure in only one out of 48 cases.

Redundancy Is Not Necessarily Detrimental in Classification Problems

This work develops a theoretical framework to analyze feature construction and selection, shows that certain properly defined features are redundant but make the data linearly separable, and proposes a formal criterion to validate feature construction methods.
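A textbook illustration of the point (not taken from the paper): XOR is not linearly separable in its raw inputs `(x1, x2)`, but appending the product `x1 * x2`, which is fully determined by the inputs and therefore redundant, makes it separable. The weights below are hand-picked for illustration.

```python
# XOR inputs and labels.
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
y = [0, 1, 1, 0]


def construct(x1, x2):
    # The constructed feature x1 * x2 is a function of x1 and x2 (redundant).
    return (x1, x2, x1 * x2)


# A hand-picked linear separator in the augmented space.
w, b = (1, 1, -2), -0.5


def predict(features):
    # Linear threshold unit: sign of w . features + b.
    activation = sum(wi * fi for wi, fi in zip(w, features)) + b
    return 1 if activation > 0 else 0


predictions = [predict(construct(x1, x2)) for x1, x2 in X]
print(predictions)  # [0, 1, 1, 0]
```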

AEFE: Automatic Embedded Feature Engineering for Categorical Features

Automatic Embedded Feature Engineering (AEFE), an automatic feature engineering framework for representing categorical features, is proposed. It consists of various components, including custom-paradigm feature construction and multiple feature selection, and it outperforms classical machine learning models and state-of-the-art deep learning models.

Prior Knowledge Neural Network for Automatic Feature Construction in Financial Time Series

A new method, the alpha discovery neural network (ADN), is proposed that automatically constructs features using a neural network, and ADN is shown to produce more diverse and more informative features than genetic programming (GP).

Benchmark and Survey of Automated Machine Learning Frameworks

This paper combines a survey of current AutoML methods with a benchmark of popular AutoML frameworks on real data sets, summarizing and reviewing important AutoML techniques and methods for every step in building an ML pipeline.

AutoML: A Survey of the State-of-the-Art

Embedded Constrained Feature Construction for High-Energy Physics Data Classification

A general framework is proposed that embeds a feature construction technique, adapted to the constraints of high-energy physics, in the induction of tree-based models. The framework is built to be interpretable: the whole model is transparent and readable.

Explanation-Based Feature Construction

This work describes an approach to feature construction in which task-relevant discriminative features are constructed automatically, guided by an explanation-based interaction between training examples and prior domain knowledge. On the challenging task of distinguishing handwritten Chinese characters, the automatic feature-construction approach performs particularly well.

Selection of Relevant Features and Examples in Machine Learning

Genetic Programming-based Construction of Features for Machine Learning and Knowledge Discovery Tasks

K. Krawiec, Genetic Programming and Evolvable Machines, 2004
The extended approach proposed in the paper outperformed the standard approach on some benchmark problems at a statistically significant level, and classifiers induced using the representation enriched by the GP-constructed features achieved better classification accuracy on the test set.
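A minimal sketch of how GP can represent a constructed feature as an expression tree, assuming a hypothetical primitive set of `+`, `-`, and `*` over input-feature terminals. Only tree generation and evaluation are shown; fitness evaluation, selection, crossover, and mutation are omitted, so this is not Krawiec's full method.

```python
import operator
import random

# Hypothetical primitive set; the paper's actual function set is not given here.
FUNCTIONS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def random_tree(n_features, depth, rng):
    """Grow a random expression tree over feature indices (one GP individual)."""
    if depth == 0 or rng.random() < 0.3:
        return ("x", rng.randrange(n_features))  # terminal: an input feature
    op = rng.choice(list(FUNCTIONS))
    return (op, random_tree(n_features, depth - 1, rng),
            random_tree(n_features, depth - 1, rng))


def evaluate(tree, row):
    """Compute the constructed feature's value for one example."""
    if tree[0] == "x":
        return row[tree[1]]
    op, left, right = tree
    return FUNCTIONS[op](evaluate(left, row), evaluate(right, row))


def to_str(tree):
    """Render a tree as a readable infix formula."""
    if tree[0] == "x":
        return f"x{tree[1]}"
    return f"({to_str(tree[1])} {tree[0]} {to_str(tree[2])})"


# Enrich the original representation with two GP-constructed features.
rng = random.Random(0)
data = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
trees = [random_tree(3, depth=2, rng=rng) for _ in range(2)]
enriched = [row + [evaluate(t, row) for t in trees] for row in data]
```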

Feature Generation Using General Constructor Functions

A generalized and flexible framework is presented that can generate features from any given set of constructor functions. It was applied to a variety of classification problems and generated features that were strongly related to the underlying target concepts.
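One way to read "any given set of constructor functions" is as a table of functions with declared arities, applied to every combination of input features. The sketch below assumes exactly that; the `CONSTRUCTORS` table and `generate_features` are hypothetical, and any step that scores generated features against the target concept is omitted.

```python
import itertools
import math

# Hypothetical constructor table: name -> (arity, function).
CONSTRUCTORS = {
    "sum": (2, lambda a, b: a + b),
    "product": (2, lambda a, b: a * b),
    "ratio": (2, lambda a, b: a / b if b != 0 else 0.0),  # guard division by zero
    "log": (1, lambda a: math.log1p(abs(a))),
}


def generate_features(rows, names, constructors=CONSTRUCTORS):
    """Apply every constructor to every ordered feature tuple of matching arity."""
    new_names, new_cols = [], []
    for cname, (arity, fn) in constructors.items():
        for idx in itertools.permutations(range(len(names)), arity):
            new_names.append(f"{cname}({', '.join(names[i] for i in idx)})")
            new_cols.append([fn(*(row[i] for i in idx)) for row in rows])
    return new_names, new_cols


rows = [[2.0, 4.0], [3.0, 0.0]]
new_names, new_cols = generate_features(rows, ["a", "b"])
```

Using permutations gives non-commutative constructors such as `ratio` both argument orders; for commutative ones like `sum` it produces duplicates that a real system would prune.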

Constructive Induction On Decision Trees

A definition of feature construction in concept learning is presented, and a framework for its study is offered based on four aspects: detection, selection, generalization, and evaluation.

An Introduction to Variable and Feature Selection

The contributions of this special issue cover a wide range of aspects of variable selection: providing a better definition of the objective function, feature construction, feature ranking, multivariate feature selection, efficient search methods, and feature validity assessment methods.

Learning from labeled features using generalized expectation criteria

This paper proposes a method for training discriminative probabilistic models with labeled features and unlabeled instances. Soft constraints are expressed using generalized expectation (GE) criteria: terms in a parameter estimation objective function that express preferences on the values of a model expectation.
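A minimal sketch of a single GE term, assuming the common instantiation in which the penalty is the KL divergence between a reference label distribution for a labeled feature and the model's average predicted label distribution over the unlabeled instances that contain that feature. The function names, the feature, and the numbers are hypothetical, and the full method's parameter estimation is omitted.

```python
import math


def model_expectation(predictions, has_feature):
    """Average predicted label distribution over instances containing the feature."""
    selected = [p for p, f in zip(predictions, has_feature) if f]
    n_labels = len(selected[0])
    return [sum(p[k] for p in selected) / len(selected) for k in range(n_labels)]


def ge_term(target, predictions, has_feature, eps=1e-12):
    """KL(target || model expectation): a soft-constraint penalty for one labeled feature."""
    expected = model_expectation(predictions, has_feature)
    return sum(t * math.log(t / max(e, eps))
               for t, e in zip(target, expected) if t > 0)


# Hypothetical labeled feature "puck": about 90% hockey, 10% baseball.
target = [0.9, 0.1]
predictions = [[0.8, 0.2], [0.6, 0.4], [0.1, 0.9]]  # model outputs on unlabeled data
has_feature = [True, True, False]                   # which instances contain "puck"
penalty = ge_term(target, predictions, has_feature)
```

The penalty is zero only when the model's expectation matches the reference distribution, so adding such terms to the objective pushes the model toward the labeled-feature preferences without hard constraints.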

An interactive algorithm for asking and incorporating feature feedback into support vector machines

An algorithm for tandem learning is proposed that begins with a couple of labeled instances and then, at each iteration, recommends features and instances for a human to label. This results in much better performance than learning from only features or only instances.