Wrappers for Feature Subset Selection
The wrapper method searches for an optimal feature subset tailored to a particular induction algorithm and domain. The paper compares the wrapper approach to induction without feature subset selection and to Relief, a filter approach to feature subset selection.
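The core idea can be sketched as greedy forward selection: repeatedly add the feature that most improves the evaluation score of the target learner. Below is a minimal pure-Python sketch; `greedy_wrapper` and `toy_eval` are illustrative names, and the toy evaluator stands in for the cross-validated accuracy a real wrapper would compute.

```python
def greedy_wrapper(features, evaluate):
    """Greedy forward selection: repeatedly add the single feature that most
    improves evaluate(subset); stop when no addition helps."""
    selected, best_score = set(), evaluate(frozenset())
    while True:
        candidates = [(evaluate(frozenset(selected | {f})), f)
                      for f in features if f not in selected]
        if not candidates:
            break
        score, f = max(candidates)
        if score <= best_score:
            break  # no single feature improves the score
        selected.add(f)
        best_score = score
    return selected, best_score

# Toy evaluator (hypothetical feature names): subsets containing both
# "age" and "income" score highest, with a small penalty per feature.
# A real wrapper would run the induction algorithm under cross-validation here.
def toy_eval(subset):
    return len(subset & {"age", "income"}) - 0.1 * len(subset)

greedy_wrapper(["age", "income", "noise1", "noise2"], toy_eval)
```

The search stops as soon as adding any remaining feature fails to improve the score, so irrelevant features are never selected.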
A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection
The results indicate that for real-world datasets similar to the authors', the best method for model selection is ten-fold stratified cross-validation, even when computational power allows using more folds.
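Stratification means each fold preserves the overall class proportions. A minimal pure-Python sketch of building stratified folds (the function name `stratified_k_folds` is illustrative, not from the paper):

```python
import random
from collections import defaultdict

def stratified_k_folds(labels, k=10, seed=0):
    """Split sample indices into k folds that preserve class proportions.

    Returns a list of k index lists; `labels` is any sequence of class labels.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    folds = [[] for _ in range(k)]
    for idxs in by_class.values():
        rng.shuffle(idxs)
        for i, idx in enumerate(idxs):
            folds[i % k].append(idx)  # deal each class round-robin
    return folds

labels = ["a"] * 60 + ["b"] * 40
folds = stratified_k_folds(labels, k=10)
# each fold holds 10 samples: 6 of class "a" and 4 of class "b"
```

Each fold then serves once as the test set while the remaining nine are used for training.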
An Empirical Comparison of Voting Classification Algorithms: Bagging, Boosting, and Variants
It is found that Bagging improves when probabilistic estimates are used in conjunction with no pruning, as well as when the data is backfit, and that Arc-x4 behaves differently from AdaBoost when reweighting is used instead of resampling, indicating a fundamental difference.
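Bagging itself is simple: train each base classifier on a bootstrap resample of the data and combine them by majority vote. A minimal pure-Python sketch using a 1-D threshold "stump" as the base learner (`train_stump` and `bagged_predict` are illustrative names, not the paper's code):

```python
import random
from collections import Counter

def train_stump(points):
    """Fit a 1-D threshold classifier: midpoint between class means.
    `points` is a list of (x, label) pairs with labels 0/1 (assumed)."""
    mean = lambda c: (sum(x for x, y in points if y == c)
                      / max(1, sum(1 for _, y in points if y == c)))
    m0, m1 = mean(0), mean(1)
    thresh = (m0 + m1) / 2
    flip = m1 < m0  # handle class 1 lying below class 0
    return lambda x: int((x > thresh) != flip)

def bagged_predict(points, x, n_models=25, seed=0):
    """Bagging: train each stump on a bootstrap resample, then majority-vote."""
    rng = random.Random(seed)
    votes = [train_stump(rng.choices(points, k=len(points)))(x)
             for _ in range(n_models)]
    return Counter(votes).most_common(1)[0][0]

data = [(x / 10, 0) for x in range(10)] + [(2 + x / 10, 1) for x in range(10)]
bagged_predict(data, 0.2)  # falls in the class-0 region
bagged_predict(data, 2.5)  # falls in the class-1 region
```

Boosting variants such as AdaBoost and Arc-x4 differ in that each round reweights (or resamples) the data to emphasize previously misclassified examples rather than drawing uniform bootstraps.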
Scaling Up the Accuracy of Naive-Bayes Classifiers: A Decision-Tree Hybrid
  • R. Kohavi
  • Mathematics, Computer Science
  • KDD
  • 2 August 1996
A new algorithm, NBTree, is proposed, which induces a hybrid of decision-tree and Naive-Bayes classifiers: the decision-tree nodes contain univariate splits as in regular decision trees, but the leaves contain Naive-Bayes classifiers.
Supervised and Unsupervised Discretization of Continuous Features
Binning, an unsupervised discretization method, is compared to entropy-based and purity-based methods, which are supervised algorithms; the performance of the Naive-Bayes algorithm improved significantly when features were discretized using an entropy-based method.
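The heart of entropy-based discretization is choosing a cut point that minimizes the class entropy of the resulting intervals. A minimal sketch of a single binary split in the style of Fayyad and Irani's recursive method (function names here are illustrative):

```python
from math import log2
from collections import Counter

def entropy(labels):
    """Shannon entropy of a label multiset, in bits."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values()) if n else 0.0

def best_entropy_split(values, labels):
    """Return the cut point minimizing weighted class entropy of the two
    intervals (one step of a recursive entropy-based discretizer)."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (float("inf"), None)
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot cut between equal feature values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        score = (len(left) * entropy(left) + len(right) * entropy(right)) / n
        cut = (pairs[i - 1][0] + pairs[i][0]) / 2
        best = min(best, (score, cut))
    return best[1]

values = [1.0, 1.2, 1.4, 5.0, 5.2, 5.4]
labels = ["lo", "lo", "lo", "hi", "hi", "hi"]
best_entropy_split(values, labels)  # cut at 3.2 separates the classes cleanly
```

A full discretizer applies this recursively to each interval, stopping with a criterion such as MDL; unsupervised binning ignores the labels entirely, which is why it can merge class boundaries.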
Irrelevant Features and the Subset Selection Problem
A method for feature subset selection using cross-validation that is applicable to any induction algorithm is described, and experiments conducted with ID3 and C4.5 on artificial and real datasets are discussed.
The Power of Decision Tables
Experimental results show that on artificial and real-world domains containing only discrete features, IDTM, an algorithm inducing decision tables, can sometimes outperform state-of-the-art algorithms such as C4.5.
Controlled experiments on the web: survey and practical guide
This work provides a practical guide to conducting online experiments, and shares key lessons that will help practitioners in running trustworthy controlled experiments, including statistical power, sample size, and techniques for variance reduction.
The Case against Accuracy Estimation for Comparing Induction Algorithms
This work describes and demonstrates what it believes to be the proper use of ROC analysis for comparative studies in machine learning research, and argues that this methodology is preferable both for making practical choices and for drawing conclusions.