• Corpus ID: 6130737

Practical feature subset selection for machine learning

Mark A. Hall, Lloyd A. Smith
Machine learning algorithms automatically extract knowledge from machine-readable information. Unfortunately, their success usually depends on the quality of the data they operate on. If the data is inadequate, or contains extraneous and irrelevant information, machine learning algorithms may produce less accurate and less understandable results, or may fail to discover anything of use at all. Feature subset selectors are algorithms that attempt to identify and remove as much… 

Feature Selection for Machine Learning: Comparing a Correlation-Based Filter Approach to the Wrapper
A new filter approach to feature selection that uses a correlation-based heuristic to evaluate the worth of feature subsets, applied as a data preprocessing step for two common machine learning algorithms.
Comparative Review of Feature Selection and Classification modeling
  • Azhar M.A., Princy Ann Thomas
  • Computer Science
    2019 International Conference on Advances in Computing, Communication and Control (ICAC3)
  • 2019
An overview of some of the classification methods, which are based on some threshold values and benchmark algorithms that determine the optimality of the features in the dataset, is provided.
Extended fast feature selection for classification modeling
This paper analyzes the existing issue, and presents an extended fast feature selection algorithm to overcome the problem, and conducts experiments using real data from financial institutions to demonstrate the improvement in terms of quality of selected features.
Bayes Theorem and Information Gain Based Feature Selection for Maximizing the Performance of Classifiers
After feature selection using Bayes' theorem and information gain to control the false discovery rate, the classification performance of DT and NB classifiers was significantly improved.
Conditional Probability Based Feature Selector for Effective Data Classification
After feature selection using the proposed method to control the false discovery rate, the classification performance of Decision Tree and Naïve Bayesian classifiers was significantly improved.
Design and analysis of scalable rule induction systems
This thesis introduces a new rule induction algorithm for learning classification rules, which broadly follows the approach of algorithms represented by CN2, and proposes a new search method that employs several novel search-space pruning rules and rule-evaluation techniques, resulting in a highly efficient algorithm with improved induction performance.
The experimental results show that feature selection methods give better results on a breast cancer data set; the paper also provides clear insight into the different feature selection methods reported in the literature and compares them with one another.
A New Filter Feature Selection Based on Criteria Fusion for Gene Microarray Data
A score-based criteria fusion feature selection method (SCF) is proposed for cancer prediction, and it shows superior performance over many well-known feature selection methods when employing two classifiers SVM and KNN to measure the quality of selected features.
Feature Selection Methods in Data Mining Techniques
Basic issues linked with feature selection are outlined and an analysis of five feature selection algorithms belonging to the filter category revealed that the results of each of the five algorithms are acceptable.
Effective Discretization and Hybrid feature selection using Naïve Bayesian classifier for Medical datamining
This work demonstrates that the proposed algorithm using a generative Naive Bayesian classifier is, on average, more efficient than using discriminative models, namely Logistic Regression and Support Vector Machine.


Wrappers for Feature Subset Selection
A Practical Approach to Feature Selection
Feature selection via the discovery of simple classification rules
A method to achieve this using a very simple algorithm that gives good performance across different supervised learning schemes and when compared to one of the most common methods for feature subset selection.
Toward Optimal Feature Selection
An efficient algorithm for feature selection which computes an approximation to the optimal feature selection criterion is given, showing that the algorithm effectively handles datasets with a very large number of features.
C4.5: Programs for Machine Learning
A complete guide to the C4.5 system as implemented in C for the UNIX environment, which starts from simple core learning methods and shows how they can be elaborated and extended to deal with typical problems such as missing data and overfitting.
Learning with Many Irrelevant Features
It is shown that any learning algorithm implementing the MIN-FEATURES bias requires $\Theta\left(\frac{1}{\epsilon}\ln\frac{1}{\delta} + \frac{1}{\epsilon}\left[2^p + p\ln n\right]\right)$ training examples to guarantee PAC-learning a concept having p relevant features out of n available features; the results suggest that training data should be preprocessed to remove irrelevant features before being given to ID3 or FRINGE.
Beyond Independence: Conditions for the Optimality of the Simple Bayesian Classifier
It is shown that the simple Bayesian classifier (SBC) does not in fact assume attribute independence, and can be optimal even when this assumption is violated by a wide margin, and the previously-assumed region of optimality is a second-order infinitesimal fraction of the actual one.
On Biases in Estimating Multi-Valued Attributes
The basics of eleven measures for estimating the quality of multivalued attributes are analyzed, and a new function based on the MDL principle, whose value decreases slightly as the number of attribute values increases, is introduced.
Artificial Intelligence
The history, the major landmarks, and some of the controversies in each of these twelve topics are discussed, as well as some predictions about the course of future research.
Applications of machine learning in information retrieval
A literature review of information retrieval, classification (categorization), query formulation, and filtering.