An Analysis of Reduced Error Pruning

@article{Elomaa2001AnAO,
  title={An Analysis of Reduced Error Pruning},
  author={Tapio Elomaa and Matti K{\"a}{\"a}ri{\"a}inen},
  journal={J. Artif. Intell. Res.},
  year={2001},
  volume={15},
  pages={163-187}
}
Top-down induction of decision trees has been observed to suffer from the inadequate functioning of the pruning phase. In particular, it is known that the size of the resulting tree grows linearly with the sample size, even though the accuracy of the tree does not improve. Reduced Error Pruning is an algorithm that has been used as a representative technique in attempts to explain the problems of decision tree learning. In this paper we present analyses of Reduced Error Pruning in three… 
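
Since everything on this page revolves around the algorithm, a minimal sketch of Reduced Error Pruning may help. This is a hedged illustration, not the paper's notation: the Node class, its field names, and the binary-test representation are assumptions.

```python
# Minimal sketch of Reduced Error Pruning (REP) on a binary decision tree.
# The Node class and its fields are illustrative assumptions.

class Node:
    def __init__(self, test=None, left=None, right=None, label=None):
        self.test = test      # callable: example -> bool (None for a leaf)
        self.left = left
        self.right = right
        self.label = label    # majority class of the training data at this node

    def is_leaf(self):
        return self.test is None

def classify(node, x):
    while not node.is_leaf():
        node = node.left if node.test(x) else node.right
    return node.label

def errors(node, data):
    return sum(1 for x, y in data if classify(node, x) != y)

def rep_prune(node, pruning_data):
    """Bottom-up pass over the tree: replace a subtree by a leaf whenever
    the leaf makes no more errors on the pruning set than the subtree."""
    if node.is_leaf():
        return node
    node.left = rep_prune(node.left, [(x, y) for x, y in pruning_data if node.test(x)])
    node.right = rep_prune(node.right, [(x, y) for x, y in pruning_data if not node.test(x)])
    leaf = Node(label=node.label)
    if errors(leaf, pruning_data) <= errors(node, pruning_data):
        return leaf
    return node
```

The pruning set is held out from training, which is why REP is the standard object of theoretical study here: its decisions depend on an independent sample.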

The Difficulty of Reduced Error Pruning of Leveled Branching Programs

Reduced error pruning of leveled branching programs is proved to be APX-hard, yet the experiments show that, despite this negative theoretical result, heuristic pruning of branching programs can reduce their size without significantly altering the accuracy.

Is Error-Based Pruning Redeemable?

Experimental results support the conclusion that error-based pruning can be used to produce appropriately sized trees with good accuracy when compared with reduced error pruning.

A novel decision tree classification based on post-pruning with Bayes minimum risk

A post-pruning method is proposed that considers various evaluation standards, such as attribute selection, accuracy, tree complexity, time taken to prune the tree, precision/recall scores, TP/FN rates, and area under the ROC curve.

A k-norm pruning algorithm for decision tree classifiers based on error rate estimation

This work applies Lidstone’s Law of Succession to estimate the class probabilities and error rates of decision tree classifiers, and proposes an efficient pruning algorithm, called k-norm pruning, that has a clear theoretical interpretation, is easily implemented, and does not require a validation set.
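
As a point of reference, Lidstone's Law of Succession smooths empirical class frequencies with a pseudocount. A one-function sketch; the function and parameter names are assumptions:

```python
def lidstone(count, total, num_classes, lam=1.0):
    """Lidstone-smoothed class probability: (n_i + lam) / (N + lam * K).
    lam = 1 recovers Laplace's rule of succession; lam -> 0 recovers the
    raw relative frequency n_i / N."""
    return (count + lam) / (total + lam * num_classes)

# E.g. 3 of 10 examples in class i over 4 classes: (3 + 1) / (10 + 4) ~ 0.286
```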

Selective Rademacher Penalization and Reduced Error Pruning of Decision Trees

This paper applies Rademacher penalization to the practically important hypothesis class of unrestricted decision trees by considering the prunings of a given decision tree rather than the tree-growing phase, and generalizes the error-bounding approach from binary classification to multi-class situations.
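
For intuition, the empirical Rademacher complexity of a finite set of prunings can be estimated by Monte Carlo. This brute-force sketch enumerates candidate prunings explicitly, which the paper does not need to do; the function names are assumptions:

```python
import random

def empirical_rademacher(prunings, X, n_draws=100):
    """Monte Carlo estimate of the empirical Rademacher complexity of a
    finite set of classifiers; each pruning maps an example to {-1, +1}."""
    n = len(X)
    total = 0.0
    for _ in range(n_draws):
        sigma = [random.choice((-1, 1)) for _ in range(n)]  # random signs
        total += max(sum(s * f(x) for s, x in zip(sigma, X)) / n
                     for f in prunings)
    return total / n_draws
```

The penalty measures how well the pruning class can correlate with random noise on the sample; a large value signals a risk of overfitting.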

An analysis of misclassification rates for decision trees

This dissertation focuses on the minimization of the misclassification rate for decision tree classifiers, and proposes an efficient pruning algorithm that has a clear theoretical interpretation, is easily implemented, and does not require a validation set.

Error-Based Pruning of Decision Trees Grown on Very Large Data Sets Can Work!

It is shown that, in general, an appropriate setting of the certainty factor for error-based pruning will cause decision tree size to plateau when accuracy is not increasing with more training data.
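
The certainty factor in question is the confidence level of C4.5's pessimistic error estimate: the upper limit of a binomial confidence interval on a node's error rate. A sketch via the binomial/beta duality; the function name is an assumption, and 0.25 is C4.5's conventional default:

```python
from scipy.stats import beta

def pessimistic_error_rate(errors, n, cf=0.25):
    """Upper confidence limit on the true error rate: the largest p with
    P(Binomial(n, p) <= errors) >= cf, computed via the binomial/beta
    duality. Requires errors < n. Lowering cf inflates the estimate,
    which makes pruning more aggressive."""
    return beta.ppf(1.0 - cf, errors + 1, n - errors)
```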

Experiments with an innovative tree pruning algorithm

This paper presents experimental results comparing the 2-norm pruning algorithm with two classical pruning algorithms, the Minimal Cost-Complexity algorithm (used in CART) and Error-Based Pruning (used in C4.5), and confirms that the 2-norm pruning algorithm is superior in both accuracy and speed.
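
For context on the first baseline: CART's Minimal Cost-Complexity pruning repeatedly collapses the internal node with the smallest value of the quantity below (the "weakest link"). The field names in this sketch are assumptions:

```python
def weakest_link_alpha(node):
    """CART cost-complexity parameter for an internal node t:
        alpha(t) = (R(t) - R(T_t)) / (|leaves(T_t)| - 1),
    where R(t) is the resubstitution error of t collapsed to a leaf and
    R(T_t) is the total resubstitution error of the subtree rooted at t."""
    return (node.leaf_error - node.subtree_error) / (node.num_leaves - 1)
```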

Contribution to Decision Tree Induction with Python: A Review

This review presents the essential steps for understanding the fundamental concepts and mathematics behind decision trees, from training to building, and studies the criteria and pruning algorithms that have been proposed to control complexity and optimize decision tree performance.

A new minimum description length based pruning technique for rule induction algorithms

A new pruning technique built on the sound foundation of the minimum description length principle is presented, which is designed to improve the performance of the RULe Extraction System family of inductive learning algorithms, but can be used for pruning rule sets created by other learning algorithms.
...

References

Showing 1-10 of 39 references

A Fast, Bottom-Up Decision Tree Pruning Algorithm with Near-Optimal Generalization

In this work, we present a new bottom-up algorithm for decision tree pruning that is very efficient (requiring only a single pass through the given tree), and prove a strong performance guarantee for the generalization error of the resulting pruned tree.

A Comparative Analysis of Methods for Pruning Decision Trees

A comparative study of six well-known pruning methods, with the aim of understanding their theoretical foundations, their computational complexity, and the strengths and weaknesses of their formulation; an objective evaluation of each method's tendency to overprune or underprune is also made.

Predicting Nearly As Well As the Best Pruning of a Decision Tree

This paper presents a new method of making predictions on test data and proves that the algorithm's performance will not be “much worse” than the predictions made by the best reasonably small pruning of the given decision tree, so it is guaranteed to be competitive with any pruning algorithm.

The Effects of Training Set Size on Decision Tree Complexity

This paper presents experiments with 19 datasets and 5 decision tree pruning algorithms showing that increasing training set size often results in a linear increase in tree size, even when that additional complexity yields no significant gain in accuracy.

An Efficient Algorithm for Optimal Pruning of Decision Trees

Decision Tree Pruning as a Search in the State Space

Formulating decision tree pruning as a search in a state space shows that the post-pruning methods considered use very simple search strategies, and some empirical results allow theoretical observations on the strengths and weaknesses of pruning methods to be better understood.

Toward a Theoretical Understanding of Why and When Decision Tree Pruning Algorithms Fail

This work constructs a statistical model of reduced error pruning that is shown to control tree growth far better than the original algorithm, and makes predictions about how to lessen the effects of pruning failures.

An Efficient Extension to Mixture Techniques for Prediction and Decision Trees

An efficient method for maintaining mixtures of prunings of a prediction or decision tree that extends the previous methods for “node-based” prunings to the larger class of edge-based prunings; it is proved that the algorithm correctly maintains the mixture weights for edge-based prunings under any bounded loss function.

Pruning Decision Trees and Lists

This thesis presents pruning algorithms for decision trees and lists that are based on significance tests, explains why pruning is often necessary to obtain small and accurate models, and shows that the performance of standard pruning algorithms can be improved by taking the statistical significance of observations into account.

Overpruning Large Decision Trees

This paper presents empirical evidence for five hypotheses about learning from large noisy domains, including that trees built from very large training sets are larger and more accurate than trees built from smaller ones.