A Splitting Criteria Based on Similarity in Decision Tree Learning

Authors: Xinmeng Zhang and Sheng-yi Jiang
Journal: J. Softw.
Decision trees are considered to be the most effective and widely used data mining technique for classification; their representation is intuitive and generally easy for humans to comprehend. The most critical issue in the learning process of decision trees is the splitting criterion. In this paper, we first provide the definition of similarity computation that is usually used in data clustering and apply it to the learning process of decision trees. Then, we propose a novel splitting…
A Novel Prototype Decision Tree Method Using Sampling Strategy
This paper proposes a decision tree structure that mimics human learning by balancing the data source to some extent, and a novel method based on a sampling strategy that is comparable to state-of-the-art methods.
DTreeSim: A new approach to compute decision tree similarity using re-mining
DTreeSim is proposed, a new approach that applies multiple data mining techniques (classification, sequential pattern mining, and k-nearest neighbors) sequentially to identify similarities among decision trees, computing them with two novel measures: general similarity and pieced similarity.
A Novel Decision Tree Framework using Discrete Haar Wavelet Transform
Data mining is a popular knowledge discovery technique. In data mining, decision trees are among the simple and powerful decision-making models. One of the limitations in decision trees is towards the…
Selecting a representative decision tree from an ensemble of decision-tree models for fast big data classification
The goal of this paper is to reduce the classification (inference) complexity of tree ensembles by choosing a single representative model out of an ensemble of multiple decision-tree models. We compute…
Using similarity-based selection in evolutionary design of decision trees
A novel method of selection is proposed that takes into consideration the similarity of trees in the crossover process, to prevent inbreeding and maintain the diversity of the population over the course of evolution.
International Journal of Recent Technology and Engineering (IJRTE)
A new technique is proposed for splitting categorical data during the process of decision tree learning. This technique is based on the class probability representations and manipulations of the…
A new decision tree algorithm IQ Tree for class classification problem in Data Mining
Data mining and knowledge discovery are used to discover hidden knowledge in large data sources. Decision trees are one of the most famous classification techniques with simple and efficient…
An Efficient Approach for Knowledge Discovery in Decision Trees using Inter Quartile Range Transform
This paper presents a new decision tree algorithm, IQ Tree, for classification problems. It assumes that applying an interquartile range conversion of attributes, with C4.5 as the base algorithm for performing induction, can improve measures such as accuracy and tree size.
Interpretable decision-tree induction in a big data parallel framework
The proposed SySM (syntactic similarity method) algorithm computes the similarity between the models produced by parallel nodes and chooses the model that is most similar to the others as the best representative of the entire dataset.
Improved Post Pruning of Decision Trees
This paper evaluated the results of the pruning method on a variety of machine learning data sets from the UCI Machine Learning Repository and found that it generates a concise and accurate model.


An Empirical Study on Class Probability Estimates in Decision Tree Learning
This paper provides an empirical study on the classification and ranking performance of decision trees built with different class probability estimation methods; results on a large number of UCI data sets verify the conclusions.
Exploiting the Cost (In)sensitivity of Decision Tree Splitting Criteria
This paper investigates how the splitting criteria and pruning methods of decision tree learning algorithms are influenced by misclassification costs or changes to the class distribution. Splitting…
A Comparison of Prediction Accuracy, Complexity, and Training Time of Thirty-Three Old and New Classification Algorithms
Among decision tree algorithms with univariate splits, C4.5, IND-CART, and QUEST have the best combinations of error rate and speed, but C4.5 tends to produce trees with twice as many leaves as those from IND-CART and QUEST.
Quality Scheme Assessment in the Clustering Process
This paper presents an approach for evaluating clustering schemes (partitions) so as to find the best number of clusters occurring in a specific data set, and selects the best clustering scheme according to a quality index.
Performance Evaluation of Some Clustering Algorithms and Validity Indices
In this article, we evaluate the performance of three clustering algorithms, hard K-Means, single linkage, and a simulated annealing (SA) based technique, in conjunction with four cluster validity…
Maximizing classifier utility when there are data acquisition and modeling costs
This article analyzes the relationship between the number of acquired training examples and the utility of the data mining process and, given the necessary cost information, determines the number of training examples that yields the optimum overall performance.
Flexible decision tree for data stream classification in the presence of concept change, noise and missing values
A novel classification algorithm, flexible decision tree (FlexDT), extends fuzzy logic to data stream classification; it offers a flexible structure to handle concept change effectively and efficiently and is robust to noise.
Improved Use of Continuous Attributes in C4.5
A reported weakness of C4.5 in domains with continuous attributes is addressed by modifying the formation and evaluation of tests on continuous attributes with an MDL-inspired penalty, leading to smaller decision trees with higher predictive accuracies.
Combining Naive Bayes and Decision Tables
A simple semi-naive Bayesian ranking method that combines naive Bayes with induction of decision tables is investigated and it is shown that the resulting ranker, compared to either component technique, frequently significantly increases AUC.
C4.5: Programs for Machine Learning
A complete guide to the C4.5 system as implemented in C for the UNIX environment, which starts from simple core learning methods and shows how they can be elaborated and extended to deal with typical problems such as missing data and overfitting.