Publications
Wrappers for Feature Subset Selection
In the feature subset selection problem, a learning algorithm is faced with the task of selecting a relevant subset of features upon which to focus its attention, while ignoring the rest. … (a minimal sketch of the wrapper approach follows below)
• Citations: 7,464 • Influence: 421 • Open Access
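A minimal sketch of the wrapper idea from the paper above: greedy forward selection, where each candidate subset is scored by the cross-validated accuracy of the induction algorithm itself. The breast-cancer dataset, the decision-tree learner, and 5-fold evaluation are illustrative assumptions, not the paper's experimental setup.

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
selected, remaining = [], list(range(X.shape[1]))
best_score = 0.0

while remaining:
    # Score each candidate subset with the learner's own cross-validated
    # accuracy; using the target learner as the evaluator is what makes
    # this a "wrapper" rather than a filter.
    scores = {
        f: cross_val_score(DecisionTreeClassifier(random_state=0),
                           X[:, selected + [f]], y, cv=5).mean()
        for f in remaining
    }
    f_best, s_best = max(scores.items(), key=lambda kv: kv[1])
    if s_best <= best_score:   # no feature improves the estimate: stop
        break
    selected.append(f_best)
    remaining.remove(f_best)
    best_score = s_best

print(f"selected features: {selected}, CV accuracy: {best_score:.3f}")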
A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection
We review accuracy estimation methods and compare the two most common ones: cross-validation and bootstrap. Recent experimental results on artificial data and theoretical results in restricted … (a comparison sketch follows below)
• Citations: 9,486 • Influence: 365 • Open Access
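A sketch contrasting the two estimators the paper compares: stratified k-fold cross-validation (the paper's headline recommendation, with k = 10) and the .632 bootstrap. The iris data, the decision-tree learner, and 100 bootstrap replicates are placeholder choices.

import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
clf = DecisionTreeClassifier(random_state=0)

# Stratified 10-fold cross-validation.
cv_acc = cross_val_score(clf, X, y, cv=StratifiedKFold(10)).mean()

# Resubstitution accuracy (optimistic), used by the .632 blend.
resub = (clf.fit(X, y).predict(X) == y).mean()

# .632 bootstrap: train on a resample drawn with replacement, test on the
# out-of-bag points, then blend with the resubstitution accuracy.
rng = np.random.default_rng(0)
boot = []
for _ in range(100):
    idx = rng.integers(0, len(X), len(X))
    oob = np.setdiff1d(np.arange(len(X)), idx)
    clf.fit(X[idx], y[idx])
    acc_oob = (clf.predict(X[oob]) == y[oob]).mean()
    boot.append(0.632 * acc_oob + 0.368 * resub)

print(f"10-fold CV: {cv_acc:.3f}   .632 bootstrap: {np.mean(boot):.3f}")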
An Empirical Comparison of Voting Classification Algorithms: Bagging, Boosting, and Variants
Methods for voting classification algorithms, such as Bagging and AdaBoost, have been shown to be very successful in improving the accuracy of certain classifiers for artificial and real-world … (a minimal voting sketch follows below)
• Citations: 2,312 • Influence: 137 • Open Access
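A minimal sketch of the voting idea for the bagging case: train one copy of the learner per bootstrap replicate and combine the copies by unweighted majority vote. scikit-learn's BaggingClassifier and AdaBoostClassifier are the idiomatic route; this hand-rolled binary version just makes the vote explicit, and the dataset and replicate count are arbitrary.

import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X_tr, X_te, y_tr, y_te = train_test_split(
    *load_breast_cancer(return_X_y=True), random_state=0)

rng = np.random.default_rng(0)
votes = []
for _ in range(25):                  # one tree per bootstrap replicate
    idx = rng.integers(0, len(X_tr), len(X_tr))
    tree = DecisionTreeClassifier(random_state=0).fit(X_tr[idx], y_tr[idx])
    votes.append(tree.predict(X_te))

# Unweighted majority vote; with 0/1 labels the mean vote can be thresholded.
y_hat = (np.mean(votes, axis=0) > 0.5).astype(int)
print(f"bagged accuracy: {(y_hat == y_te).mean():.3f}")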
Scaling Up the Accuracy of Naive-Bayes Classifiers: A Decision-Tree Hybrid
Naive-Bayes induction algorithms were previously shown to be surprisingly accurate on many classification tasks even when the conditional independence assumption on which they are based is violated. … (a simplified hybrid sketch follows below)
• Citations: 1,246 • Influence: 125 • Open Access
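A simplified sketch of the hybrid: grow a decision tree, then fit a naive-Bayes model on the training points that reach each leaf, so the tree handles the strong interactions and naive Bayes handles the rest. The real NBTree uses a utility criterion to decide where to stop splitting; the fixed depth of 3 here is an assumption for brevity.

import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X_tr, X_te, y_tr, y_te = train_test_split(
    *load_breast_cancer(return_X_y=True), random_state=0)

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_tr, y_tr)
train_leaf = tree.apply(X_tr)        # leaf index reached by each train point

# One naive-Bayes model per leaf; leaves with a single class keep the
# plain tree prediction instead.
leaf_models = {}
for leaf in np.unique(train_leaf):
    mask = train_leaf == leaf
    if len(np.unique(y_tr[mask])) > 1:
        leaf_models[leaf] = GaussianNB().fit(X_tr[mask], y_tr[mask])

test_leaf = tree.apply(X_te)
y_hat = tree.predict(X_te)           # default: plain tree prediction
for leaf, nb in leaf_models.items():
    mask = test_leaf == leaf
    if mask.any():
        y_hat[mask] = nb.predict(X_te[mask])

print(f"hybrid accuracy: {(y_hat == y_te).mean():.3f}")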
Supervised and Unsupervised Discretization of Continuous Features
Many supervised machine learning algorithms require a discrete feature space. In this paper, we review previous work on continuous feature discretization, identify defining characteristics of the … (a binning sketch follows below)
• Citations: 2,012 • Influence: 107 • Open Access
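A minimal sketch of two unsupervised schemes from the family the paper surveys: equal-width and equal-frequency binning. Supervised methods (for example, entropy/MDL-based discretization) additionally use the class labels to place the cut points; the skewed sample below just shows how the two unsupervised schemes differ.

import numpy as np

def equal_width(x, k):
    # Split the observed range of x into k intervals of equal width.
    edges = np.linspace(x.min(), x.max(), k + 1)[1:-1]
    return np.digitize(x, edges)

def equal_frequency(x, k):
    # Place cut points at quantiles so each bin holds ~len(x)/k values.
    edges = np.quantile(x, np.linspace(0, 1, k + 1)[1:-1])
    return np.digitize(x, edges)

x = np.random.default_rng(0).exponential(size=1000)
print(np.bincount(equal_width(x, 5)))      # skewed counts on skewed data
print(np.bincount(equal_frequency(x, 5)))  # roughly uniform counts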
Irrelevant Features and the Subset Selection Problem
We address the problem of finding a subset of features that allows a supervised induction algorithm to induce small, high-accuracy concepts. We examine notions of relevance and irrelevance, and show … (a worked XOR example follows below)
• Citations: 2,498 • Influence: 94 • Open Access
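A small illustration in the spirit of the paper's relevance analysis: under an XOR concept, each feature carries no information about the class in isolation, so any method that scores features one at a time can discard features that are jointly decisive. The synthetic data and the mutual-information filter are choices made for this demonstration, not the paper's.

import numpy as np
from sklearn.feature_selection import mutual_info_classif
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(1000, 3))   # feature 2 is pure noise
y = X[:, 0] ^ X[:, 1]                    # the concept is f0 XOR f1

# Scored individually, all three features look irrelevant (MI near zero),
# including the two that jointly determine the class.
print(mutual_info_classif(X, y, discrete_features=True, random_state=0))

# Considered together, f0 and f1 are perfectly predictive.
tree = DecisionTreeClassifier(random_state=0).fit(X[:, :2], y)
print(tree.score(X[:, :2], y))           # 1.0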
The Power of Decision Tables
We evaluate the power of decision tables as a hypothesis space for supervised learning algorithms. Decision tables are one of the simplest hypothesis spaces possible, and usually they are easy to … (a minimal lookup-table sketch follows below)
• Citations: 752 • Influence: 61 • Open Access
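A minimal sketch of a decision table with a majority default, close in spirit to the DTM scheme the paper evaluates: the hypothesis is a lookup from feature-value combinations seen in training to their majority class, and unseen combinations fall back to the overall majority class. The toy weather-style schema is an assumption.

from collections import Counter, defaultdict

def fit_decision_table(rows, labels):
    by_key = defaultdict(Counter)
    for row, label in zip(rows, labels):
        by_key[tuple(row)][label] += 1
    # Majority class per seen combination, plus a global majority default.
    table = {k: c.most_common(1)[0][0] for k, c in by_key.items()}
    default = Counter(labels).most_common(1)[0][0]
    return table, default

def predict(table, default, row):
    return table.get(tuple(row), default)

rows = [("sunny", "hot"), ("sunny", "mild"), ("rain", "mild")]
labels = ["no", "yes", "yes"]
table, default = fit_decision_table(rows, labels)
print(predict(table, default, ("sunny", "hot")))  # "no" (exact match)
print(predict(table, default, ("rain", "hot")))   # "yes" (majority default)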
Controlled experiments on the web: survey and practical guide
The web provides an unprecedented opportunity to evaluate ideas quickly using controlled experiments, also called randomized experiments, A/B tests (and their generalizations), split tests, … (an analysis sketch follows below)
• Citations: 527 • Influence: 53 • Open Access
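A minimal sketch of the analysis step of such an experiment: a two-sided two-proportion z-test comparing conversion rates between control (A) and treatment (B). The test choice and all counts below are illustrative, not taken from the survey.

from math import sqrt
from statistics import NormalDist

def two_proportion_ztest(conv_a, n_a, conv_b, n_b):
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)      # rate under H0: no effect
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided
    return p_b - p_a, z, p_value

lift, z, p = two_proportion_ztest(conv_a=1_000, n_a=50_000,
                                  conv_b=1_100, n_b=50_000)
print(f"lift: {lift:.4%}  z: {z:.2f}  p: {p:.4f}")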
The Case against Accuracy Estimation for Comparing Induction Algorithms
We analyze critically the use of classification accuracy to compare classifiers on natural data sets, providing a thorough investigation using ROC analysis, standard machine learning algorithms, and … (an imbalance example follows below)
• Citations: 1,130 • Influence: 44 • Open Access
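A minimal illustration of the paper's core objection: on a skewed class distribution, a degenerate classifier can score high accuracy while ROC analysis exposes that it has no discriminating power. The synthetic 5%-positive data is an assumption.

import numpy as np
from sklearn.metrics import accuracy_score, roc_auc_score

rng = np.random.default_rng(0)
y_true = (rng.random(10_000) < 0.05).astype(int)  # ~5% positives

# Always predict the majority class, with scores that carry no
# information about the label.
y_pred = np.zeros_like(y_true)
scores = rng.random(len(y_true))

print(f"accuracy: {accuracy_score(y_true, y_pred):.3f}")  # ~0.95
print(f"ROC AUC:  {roc_auc_score(y_true, scores):.3f}")   # ~0.50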
Bias Plus Variance Decomposition for Zero-One Loss Functions
We present a bias-variance decomposition of expected misclassification rate, the most commonly used loss function in supervised classification learning. The bias-variance decomposition for quadratic loss … (an estimation sketch follows below)
• Citations: 644 • Influence: 44 • Open Access
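A sketch of estimating such a decomposition empirically: retrain the learner on many resampled training sets, estimate the prediction distribution P_H(y|x) at each test point, and plug it into the zero-one-loss terms bias^2_x = 0.5 * sum_y (P_F(y|x) - P_H(y|x))^2 and variance_x = 0.5 * (1 - sum_y P_H(y|x)^2), following the form of decomposition the paper develops. A noise-free target is assumed (so the irreducible noise term vanishes), and the dataset, learner, and 100 resamples are arbitrary.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2_000, random_state=0)
X_tr, y_tr, X_te, y_te = X[:1_000], y[:1_000], X[1_000:], y[1_000:]

# Estimate P_H(y|x): the learner's prediction distribution per test point,
# induced by redrawing the training sample.
rng = np.random.default_rng(0)
runs = 100
p_hat = np.zeros((len(X_te), 2))
for _ in range(runs):
    idx = rng.integers(0, len(X_tr), len(X_tr))
    pred = DecisionTreeClassifier(random_state=0).fit(
        X_tr[idx], y_tr[idx]).predict(X_te)
    p_hat[np.arange(len(X_te)), pred] += 1 / runs

p_true = np.eye(2)[y_te]     # noise-free target: a point mass per x
bias2 = 0.5 * ((p_true - p_hat) ** 2).sum(axis=1).mean()
variance = 0.5 * (1 - (p_hat ** 2).sum(axis=1)).mean()
print(f"bias^2: {bias2:.3f}  variance: {variance:.3f}")
# With a noise-free target, bias^2 + variance equals the expected
# misclassification rate over test points and resampled training sets.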