Efficient Construction of Decision Trees by the Dual Information Distance Method

@article{BenGal2014EfficientCO,
  title={Efficient Construction of Decision Trees by the Dual Information Distance Method},
  author={Irad Ben-Gal and Alexandra Dana and Niv Shkolnik and Gonen Singer},
  journal={Quality Technology \& Quantitative Management},
  year={2014},
  volume={11},
  pages={133--147}
}
Abstract: The construction of efficient decision and classification trees is a fundamental task in Big Data analytics and is known to be NP-hard. Accordingly, many greedy heuristics have been suggested for the construction of decision trees, but they were found to result in locally optimal solutions. In this work we present the dual information distance (DID) method for efficient construction of decision trees, which is computationally attractive yet relatively robust to noise. The DID heuristic selects… 
Improving decision trees by Tsallis Entropy Information Metric method
TLDR
A novel Tsallis Entropy Information Metric (TEIM) algorithm with a new split criterion and a new construction method of decision trees that yields statistically significantly better decision trees in classification accuracy as well as tree complexity.
Improving Decision Trees Using Tsallis Entropy
TLDR
A Tsallis Entropy Criterion (TEC) algorithm is proposed that unifies Shannon entropy, Gain Ratio and Gini index, generalizing the split criteria of decision trees. Results indicate that the TEC algorithm achieves statistically significant improvement over the classical algorithms, and that the TEIM algorithm yields significantly better decision trees in both classification accuracy and tree complexity.
Parallel construction of decision trees with consistently non-increasing expected number of tests
TLDR
This work proposes the 'save favorable general optimal testing algorithm' SF-GOTA that guarantees, unlike conventional look-ahead DT algorithms, the construction of DTs with monotonic non-increasing ENT.
Subspace Analysis in Multi-Class Datasets: An Application to Novelty Detection Ensembles
  • M. Bacher, E. Shmueli, I. Ben-Gal
  • Computer Science
    2018 IEEE International Conference on the Science of Electrical Engineering in Israel (ICSEE)
  • 2018
TLDR
The proposed Multi-Class Agglomerative Attribute Grouping (MAAG) aims at facilitating the merging of objectives for classification and novelty detection simultaneously, and outperforms, on average, other state-of-the-art subspace analysis approaches applied to each individual class in multi-class settings.
Subspace selection for anomaly detection: An information theory approach
TLDR
This work presents a novel subspace selection algorithm that uses the Rokhlin metric to evaluate the smallest information distance in the case of two attributes, and an extension of the Rokhlin distance in cases where more than two attributes are involved.
DEVELOPMENT OF METHOD FOR IDENTIFICATION THE COMPUTER SYSTEM STATE BASED ON THE DECISION TREE WITH MULTIDIMENSIONAL NODES
  • Udc
  • Computer Science
  • 2022
TLDR
The carried out experiments have confirmed the efficiency of the proposed method for constructing a decision tree, which makes it possible to recommend it for practical use in order to improve the accuracy of identifying the state of a computer system.
A comparison of a gradient boosting decision tree, random forests, and artificial neural networks to model urban land use changes: the case of the Seoul metropolitan area
  • M. Jun
  • Environmental Science
    Int. J. Geogr. Inf. Sci.
  • 2021
TLDR
The results of this study showed that GBDT and RF have higher predictive power than ANN, indicating that tree-based ensemble methods are an effective technique for LUC prediction.
Ensemble-Bayesian SPC: Multi-mode process monitoring for novelty detection
TLDR
Performance evaluation on real datasets from both public repositories and real-world semiconductor datasets shows that the EB-SPC outperforms both conventional multivariate SPC as well as ensemble-of-classifiers methods and has a high potential for novelty detection including the monitoring of multimode systems.
INTELLIGENT DATA ANALYSIS IN BIOMEDICAL RESEARCH: CLASSIFICATION TREES
Modern analytical tasks in biomedical research require increasingly sophisticated methods of data analysis. In recent years, the term data analysis is not only related to classical statistical tests

References

SHOWING 1-10 OF 27 REFERENCES
Decision Tree Induction: How Effective is the Greedy Heuristic?
TLDR
This work quantifies the goodness of greedy tree induction empirically, using the popular decision tree algorithms C4.5 and CART, and shows that the expected classification cost of a greedily induced tree is consistently very close to that of the optimal tree.
A Distance-Based Attribute Selection Measure for Decision Tree Induction
TLDR
A new attribute selection measure for ID3-like inductive algorithms based on a distance between partitions such that the selected attribute in a node induces the partition which is closest to the correct partition of the subset of training examples corresponding to this node.
Evaluation of gene-expression clustering via mutual information distance measure
TLDR
It was found that the use of the MI measure yields a more significant differentiation among erroneous clustering solutions, despite the found correspondence between these measures when analysing the averaged scores of groups of solutions.
Support-Vector Networks
TLDR
High generalization ability of support-vector networks utilizing polynomial input transformations is demonstrated and the performance of the support-vector network is compared to various classical learning algorithms that all took part in a benchmark study of Optical Character Recognition.
An Informational Search for a Moving Target
  • E. Kagan, I. Ben-Gal
  • Computer Science
    2006 IEEE 24th Convention of Electrical & Electronics Engineers in Israel
  • 2006
TLDR
This work suggests the informational learning real-time algorithm and the informational moving target search algorithm running on a states space with informational metric and presents the results of simulative trials in comparison with the greedy probabilistic search procedure.
Real-Time Heuristic Search
  • R. Korf
  • Computer Science
    Artif. Intell.
  • 1990
The Design and Analysis of Experiments
TLDR
This book by a teacher of statistics (as well as a consultant for "experimenters") is a comprehensive study of the philosophical background for the statistical design of experiment.
Introduction to Ergodic theory
Hyperbolic dynamics studies the iteration of maps on sets with some type of Lipschitz structure used to measure distance. In a hyperbolic system, some directions are uniformly contracted and others
LECTURES ON THE ENTROPY THEORY OF MEASURE-PRESERVING TRANSFORMATIONS
CONTENTS Introduction § 1. Preliminaries from measure theory § 2. Isometric operators § 3. Measure-preserving transformations § 4. Entropy of a measurable partition § 5. Mean conditional entropy § 6.
Skewing: An Efficient Alternative to Lookahead for Decision Tree Induction
TLDR
A novel, promising approach that allows greedy decision tree induction algorithms to handle problematic functions such as parity functions, and is effective with only modest amounts of data for problematic functions or subfunctions of up to six or seven variables.