An Analysis of Reduced Error Pruning

@article{Elomaa2001AnAO,
  title={An Analysis of Reduced Error Pruning},
  author={Tapio Elomaa and Matti K{\"a}{\"a}ri{\"a}inen},
  journal={J. Artif. Intell. Res.},
  year={2001},
  volume={15},
  pages={163--187}
}
Top-down induction of decision trees has been observed to suffer from the inadequate functioning of the pruning phase. In particular, it is known that the size of the resulting tree grows linearly with the sample size, even though the accuracy of the tree does not improve. Reduced Error Pruning is an algorithm that has been used as a representative technique in attempts to explain the problems of decision tree learning. In this paper we present analyses of Reduced Error Pruning in three… 
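As context for the analyses, the Reduced Error Pruning procedure itself can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation; the `Node` class and all function names are hypothetical, and the tree is assumed to test categorical features.

```python
# A minimal sketch of Reduced Error Pruning (REP), assuming a simple
# hypothetical Node class; not the authors' implementation.

class Node:
    def __init__(self, feature=None, children=None, label=None):
        self.feature = feature          # feature tested at this node (None for leaves)
        self.children = children or {}  # feature value -> child Node
        self.label = label              # majority class of training examples here

    def is_leaf(self):
        return not self.children

def classify(node, x):
    """Route example x (a dict of feature values) down to a label."""
    while not node.is_leaf():
        child = node.children.get(x.get(node.feature))
        if child is None:               # unseen value: fall back to this node's label
            return node.label
        node = child
    return node.label

def rep(node, examples):
    """Bottom-up REP over the separate pruning examples reaching this node:
    replace a subtree with a majority-class leaf whenever the leaf makes
    no more errors on the pruning set than the subtree does."""
    if node.is_leaf():
        return node
    # partition pruning examples among children by the tested feature
    parts = {v: [(x, y) for x, y in examples if x.get(node.feature) == v]
             for v in node.children}
    node.children = {v: rep(child, parts[v]) for v, child in node.children.items()}
    leaf_err = sum(1 for _, y in examples if y != node.label)
    subtree_err = sum(1 for x, y in examples if classify(node, x) != y)
    return Node(label=node.label) if leaf_err <= subtree_err else node
```

Because pruning decisions are made bottom-up against a held-out pruning set, each subtree is visited once, which is the source of the algorithm's appeal as an analysis target.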

The Difficulty of Reduced Error Pruning of Leveled Branching Programs

Finding an optimal reduced error pruning of a leveled branching program is proved to be APX-hard; nevertheless, the experiments show that, despite this negative theoretical result, heuristic pruning of branching programs can reduce their size without significantly altering their accuracy.

A novel decision tree classification based on post-pruning with Bayes minimum risk

A post-pruning method based on Bayes minimum risk is proposed and evaluated against various standards: attribute selection, accuracy, tree complexity, time taken to prune the tree, precision/recall scores, TP/FN rates, and area under the ROC curve.

A k-norm pruning algorithm for decision tree classifiers based on error rate estimation

This work applies Lidstone’s Law of Succession for the estimation of the class probabilities and error rates of decision tree classifiers, and proposes an efficient pruning algorithm, called k-norm pruning, that has a clear theoretical interpretation, is easily implemented, and does not require a validation set.
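Lidstone's Law of Succession, on which the paper's error-rate estimates build, is a simple additive smoothing of class frequencies. The sketch below is a generic illustration under that reading (the function name and parameter `lam` are my own, not the paper's):

```python
# Lidstone's law of succession: a smoothed estimate of class probabilities.
# With k classes and class counts n_1..n_k summing to N:
#   P(class i) = (n_i + lam) / (N + lam * k)
# lam = 1 gives Laplace's rule; lam -> 0 recovers raw relative frequencies.

def lidstone(counts, lam=1.0):
    total = sum(counts)
    k = len(counts)
    return [(n + lam) / (total + lam * k) for n in counts]
```

The smoothing keeps every class probability strictly positive even at empty or pure leaves, which is what makes the resulting error-rate estimates well defined without a validation set.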

Three New MDL-Based Pruning Techniques for Robust Rule Induction

This paper presents three new techniques that use the MDL principle to prune rule sets and shows that, when incorporated into a rule induction algorithm, the new techniques are more efficient and lead to accurate rule sets that are significantly smaller than the unpruned ones.
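The MDL trade-off these techniques exploit can be made concrete with a toy description-length calculation. The sketch below is illustrative only: the encoding (rule bits plus a log-binomial code for which examples are exceptions) is a common textbook choice, not necessarily the papers' exact scheme.

```python
import math

def log2_choose(n, k):
    """log2 of the binomial coefficient C(n, k), via log-gamma."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1)
            - math.lgamma(n - k + 1)) / math.log(2)

def description_length(rule_bits, n_examples, n_errors):
    """Total MDL cost = bits to encode the rule set itself
    + bits to identify which of the n_examples are misclassified
    (the 'exceptions'). Pruning is favored when it lowers this total."""
    return rule_bits + log2_choose(n_examples, n_errors)
```

A heavily pruned rule set that makes a few extra errors can still win: trading model bits for a slightly longer exceptions code often lowers the total, which is exactly why MDL-based pruning shrinks rule sets without sacrificing accuracy.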

Selective Rademacher Penalization and Reduced Error Pruning of Decision Trees

This paper applies Rademacher penalization to the practically important hypothesis class of unrestricted decision trees by considering the prunings of a given decision tree rather than the tree-growing phase, and generalizes the error-bounding approach from binary classification to multi-class situations.

An analysis of misclassification rates for decision trees

This dissertation focuses on the minimization of the misclassification rate for decision tree classifiers, and proposes an efficient pruning algorithm that has a clear theoretical interpretation, is easily implemented, and does not require a validation set.

Error-Based Pruning of Decision Trees Grown on Very Large Data Sets Can Work!

It is shown that, in general, an appropriate setting of the certainty factor for error-based pruning will cause decision tree size to plateau when accuracy is not increasing with more training data.
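The certainty factor in question controls a pessimistic upper bound on a leaf's true error rate. The sketch below uses the normal-approximation binomial upper bound commonly attributed to C4.5-style error-based pruning; the function name is my own, and z ≈ 0.674 is the one-tailed normal quantile corresponding to the default certainty factor CF = 0.25.

```python
import math

def pessimistic_error(E, N, z=0.674):
    """Upper confidence bound on the true error rate given E observed
    errors among N examples, via the normal approximation to the
    binomial. Smaller z (larger CF) means a less pessimistic bound,
    hence less pruning."""
    f = E / N  # observed error rate
    num = f + z * z / (2 * N) + z * math.sqrt(
        f / N - f * f / N + z * z / (4 * N * N))
    return num / (1 + z * z / N)
```

Because the bound tightens toward the observed rate as N grows, subtrees supported by lots of data survive while sparsely supported ones are pruned, which is the mechanism behind the tree-size plateau reported above.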

A new minimum description length based pruning technique for rule induction algorithms

A new pruning technique built on the sound foundation of the minimum description length principle is presented, which is designed to improve the performance of the RULe Extraction System family of inductive learning algorithms, but can be used for pruning rule sets created by other learning algorithms.

Induction of classification rules by Gini-index based rule generation

...

References

(first 10 of 39 references)

A Fast, Bottom-Up Decision Tree Pruning Algorithm with Near-Optimal Generalization

In this work, we present a new bottom-up algorithm for decision tree pruning that is very efficient (requiring only a single pass through the given tree), and prove a strong performance guarantee for the generalization error of the pruned tree.

A Comparative Analysis of Methods for Pruning Decision Trees

A comparative study of six well-known pruning methods with the aim of understanding their theoretical foundations, their computational complexity, and the strengths and weaknesses of their formulation, and an objective evaluation of the tendency to overprune/underprune observed in each method is made.

Predicting Nearly As Well As the Best Pruning of a Decision Tree

This paper presents a new method of making predictions on test data and proves that its performance will not be "much worse" than that of the best reasonably small pruning of the given decision tree; the method is thus guaranteed to be competitive with any pruning algorithm.

The Effects of Training Set Size on Decision Tree Complexity

This paper presents experiments with 19 datasets and 5 decision tree pruning algorithms showing that increasing training set size often results in a linear increase in tree size, even when the added complexity yields no significant gain in accuracy.

An Efficient Algorithm for Optimal Pruning of Decision Trees

Decision Tree Pruning as a Search in the State Space

The introduction of the state space shows that very simple search strategies are used by the postpruning methods considered, and some empirical results allow theoretical observations on strengths and weaknesses of pruning methods to be better understood.

Toward a Theoretical Understanding of Why and When Decision Tree Pruning Algorithms Fail

This work constructs a statistical model of reduced error pruning that explains why pruning algorithms fail, makes predictions about how to lessen those failures, and yields a modified algorithm that is shown to control tree growth far better than the original.

An Efficient Extension to Mixture Techniques for Prediction and Decision Trees

An efficient method for maintaining mixtures of prunings of a prediction or decision tree that extends the previous methods for "node-based" pruning to the larger class of edge-based prunings, and it is proved that the algorithm correctly maintains the mixture weights for edge-based prunings under any bounded loss function.

On Estimating Probabilities in Tree Pruning

The resulting pruning method improves on the original Niblett-Bratko pruning in the following respects: a priori probabilities can be incorporated into error estimation, several trees pruned to various degrees can be generated, and the degree of pruning is not affected by the number of classes.
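The way a priori probabilities enter the error estimate can be illustrated with the m-estimate, the smoothing this line of work is known for. The sketch is a generic illustration under that reading; the function name and default `m` are my own.

```python
def m_estimate(n_c, N, p_prior, m=2.0):
    """m-estimate of a class probability at a tree node:
        (n_c + m * p_prior) / (N + m)
    where n_c of the N examples at the node belong to the class and
    p_prior is its a priori probability. m controls how strongly the
    estimate is pulled toward the prior; varying m yields the family
    of trees pruned to various degrees."""
    return (n_c + m * p_prior) / (N + m)
```

At an empty node (N = 0) the estimate falls back to the prior exactly, and unlike Laplace smoothing its behavior does not depend on the number of classes, matching the properties claimed above.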

Pruning Decision Trees and Lists

This thesis presents pruning algorithms for decision trees and lists that are based on significance tests, explains why pruning is often necessary to obtain small and accurate models, and shows that the performance of standard pruning algorithms can be improved by taking the statistical significance of observations into account.