In the feature subset selection problem, a learning algorithm must select a relevant subset of features upon which to focus its attention, while ignoring the rest. To achieve the best possible performance with a particular learning algorithm on a particular training set, a feature subset selection method should consider how the …
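One common way to search the space of feature subsets is greedy forward selection, sketched below. Here `score` stands in for an estimate of the learner's accuracy when restricted to the given features; `toy_score` is a hypothetical scorer used only for illustration, not the paper's method.

```python
# Minimal sketch of greedy forward feature-subset selection: repeatedly add
# the single feature that most improves the (estimated) score, stopping when
# no addition helps.

def forward_select(features, score):
    selected, best = [], score([])
    improved = True
    while improved:
        improved = False
        for f in [f for f in features if f not in selected]:
            s = score(selected + [f])
            if s > best:
                best, best_f, improved = s, f, True
        if improved:
            selected.append(best_f)
    return selected, best

# Toy scorer (an assumption for the demo): rewards the "relevant" features
# a and c, and slightly penalizes every irrelevant feature.
def toy_score(subset):
    return len(set(subset) & {"a", "c"}) - 0.1 * len(set(subset) - {"a", "c"})

subset, best = forward_select(["a", "b", "c", "d"], toy_score)
print(subset)  # ['a', 'c']
```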
We review accuracy estimation methods and compare the two most common methods: cross-validation and bootstrap. Recent experimental results on artificial data and theoretical results in restricted settings have shown that for selecting a good classifier from a set of classifiers (model selection), ten-fold cross-validation may be better than the more expensive …
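The k-fold scheme compared in that abstract can be sketched as follows; the `train`/`predict` pair below wraps a trivial majority-class learner as a placeholder inducer, since the fold logic is the point.

```python
# Minimal sketch of k-fold cross-validation for accuracy estimation: split
# the data into k folds, train on k-1 folds, test on the held-out fold, and
# pool the correct predictions over all folds.

def k_fold_accuracy(xs, ys, k, train, predict):
    n = len(xs)
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    correct, start = 0, 0
    for size in fold_sizes:
        test_set = set(range(start, start + size))
        train_idx = [i for i in range(n) if i not in test_set]
        model = train([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        correct += sum(predict(model, xs[i]) == ys[i] for i in test_set)
        start += size
    return correct / n

# Placeholder learner: always predict the majority class of the training set.
def train(xs, ys):
    return max(set(ys), key=ys.count)

def predict(model, x):
    return model

ys = [1, 1, 1, 0, 1, 1, 0, 1, 1, 1]
acc = k_fold_accuracy(list(range(10)), ys, k=10, train=train, predict=predict)
print(acc)  # 0.8: each held-out 1 is predicted correctly, each held-out 0 is not
```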
Methods for voting classification algorithms, such as Bagging and AdaBoost, have been shown to be very successful in improving the accuracy of certain classifiers for artificial and real-world datasets. We review these algorithms and describe a large empirical study comparing several variants in conjunction with a decision tree inducer (three variants) and …
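Bagging, the simpler of the two voting schemes, can be sketched in a few lines: train each base classifier on a bootstrap resample and combine predictions by majority vote. The decision-stump inducer below is a stand-in for the decision tree inducer the study actually uses.

```python
import random

# Minimal sketch of Bagging (bootstrap aggregating).

def bag(xs, ys, n_models, induce, rng):
    models = []
    for _ in range(n_models):
        # Sample the training set with replacement (a bootstrap resample).
        idx = [rng.randrange(len(xs)) for _ in range(len(xs))]
        models.append(induce([xs[i] for i in idx], [ys[i] for i in idx]))
    return models

def vote(models, x):
    preds = [m(x) for m in models]
    return max(set(preds), key=preds.count)  # majority vote

# Placeholder inducer: a 1-D threshold ("decision stump") at the sample mean.
def stump(xs, ys):
    t = sum(xs) / len(xs)
    return lambda x: 1 if x >= t else 0

rng = random.Random(0)
xs, ys = [1, 2, 3, 10, 11, 12], [0, 0, 0, 1, 1, 1]
models = bag(xs, ys, n_models=25, induce=stump, rng=rng)
print(vote(models, 11), vote(models, 2))  # 1 0
```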
Many supervised machine learning algorithms require a discrete feature space. In this paper, we review previous work on continuous feature discretization, identify defining characteristics of the methods, and conduct an empirical evaluation of several methods. We compare binning, an unsupervised discretization method, to entropy-based and purity-based …
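The unsupervised baseline, equal-width binning, needs no class labels at all: it splits the observed range into k intervals of equal width.

```python
# Minimal sketch of equal-width binning, the unsupervised discretization
# baseline: divide [min, max] into k equal intervals and map each value to
# its interval index.

def equal_width_bins(values, k):
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    def bin_of(v):
        if v >= hi:              # the top edge belongs to the last bin
            return k - 1
        return int((v - lo) / width)
    return [bin_of(v) for v in values]

print(equal_width_bins([0.0, 2.5, 5.0, 7.5, 10.0], k=4))  # [0, 1, 2, 3, 3]
```

Entropy-based methods instead place cut points where the class labels change, choosing the split that most reduces class entropy; the binning sketch above is the label-free baseline they are compared against.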
We address the problem of finding a subset of features that allows a supervised induction algorithm to induce small, high-accuracy concepts. We examine notions of relevance and irrelevance, and show that the definitions used in the machine learning literature do not adequately partition the features into useful categories of relevance. We present definitions for …
We evaluate the power of decision tables as a hypothesis space for supervised learning algorithms. Decision tables are one of the simplest hypothesis spaces possible, and usually they are easy to understand. Experimental results show that on artificial and real-world domains containing only discrete features, IDTM, an algorithm inducing decision tables, can …
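The hypothesis space itself is simple to sketch: a table mapping each observed combination of a chosen feature schema to its majority class, with the global majority as the fallback for unseen combinations. This is a generic decision-table classifier, not IDTM's induction procedure, and the feature names below are made up for the demo.

```python
from collections import Counter, defaultdict

# Minimal sketch of a decision table over discrete features.

def induce_table(rows, labels, schema):
    by_key = defaultdict(list)
    for row, y in zip(rows, labels):
        by_key[tuple(row[f] for f in schema)].append(y)
    # Majority class per observed feature combination.
    table = {k: Counter(v).most_common(1)[0][0] for k, v in by_key.items()}
    default = Counter(labels).most_common(1)[0][0]  # global majority fallback
    return table, default

def classify(table, default, schema, row):
    return table.get(tuple(row[f] for f in schema), default)

rows = [{"outlook": "sunny", "windy": True},
        {"outlook": "sunny", "windy": False},
        {"outlook": "rain",  "windy": True}]
table, default = induce_table(rows, ["no", "yes", "no"], schema=["outlook"])
print(classify(table, default, ["outlook"], {"outlook": "rain"}))  # no
```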
We analyze critically the use of classification accuracy to compare classifiers on natural data sets, providing a thorough investigation using ROC analysis, standard machine learning algorithms, and standard benchmark data sets. The results raise serious concerns about the use of accuracy for comparing classifiers and draw into question the conclusions that …
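The contrast between the two measures can be made concrete: accuracy is tied to one classification threshold, whereas the area under the ROC curve (AUC) summarizes ranking quality over all thresholds. The scores and labels below are invented for the demo.

```python
# Minimal sketch contrasting accuracy (fixed threshold) with AUC, computed
# here by its rank interpretation: the probability that a randomly chosen
# positive example is scored above a randomly chosen negative one.

def auc(scores, labels):
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def accuracy(scores, labels, threshold=0.5):
    return sum((s >= threshold) == y for s, y in zip(scores, labels)) / len(labels)

scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 0]
print(auc(scores, labels))       # 8/9: one positive-negative pair is misranked
print(accuracy(scores, labels))  # 4/6 at the 0.5 cutoff
```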