An Exact Probability Metric for Decision Tree Splitting and Stopping


ID3's information gain heuristic is well known to be biased towards multi-valued attributes. This bias is only partially compensated for by C4.5's gain ratio. Several alternatives have been proposed and are examined here (distance, orthogonality, a Beta function, and two chi-squared tests). All of these metrics are biased towards splits with smaller…
DOI: 10.1023/A:1007367629006
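
The bias described in the abstract can be illustrated with a small sketch. The code below is not taken from the paper: it uses the textbook definitions of information gain and gain ratio, and the toy data and function names (`entropy`, `information_gain`, `gain_ratio`) are assumptions made for illustration only. It compares a noisy two-valued attribute with an ID-like attribute that takes a distinct value on every example.

```python
import math
from collections import Counter


def entropy(items):
    """Shannon entropy (in bits) of the empirical distribution of items."""
    n = len(items)
    return -sum((c / n) * math.log2(c / n) for c in Counter(items).values())


def information_gain(values, labels):
    """Class entropy minus the weighted class entropy after splitting on values."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def gain_ratio(values, labels):
    """Information gain divided by the entropy of the split itself."""
    split_info = entropy(values)
    if split_info == 0:
        return 0.0  # attribute has a single value, so it cannot split the data
    return information_gain(values, labels) / split_info


# Hypothetical toy data: 8 examples, two classes.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
binary_attr = ["a", "a", "a", "b", "a", "b", "b", "b"]  # noisy two-valued attribute
id_attr = list(range(8))                                # unique value per example

for name, attr in [("binary", binary_attr), ("id-like", id_attr)]:
    print(name, round(information_gain(attr, labels), 3),
          round(gain_ratio(attr, labels), 3))
```

On this toy data the ID-like attribute receives the largest possible information gain even though it cannot generalize to unseen examples. Gain ratio lowers its score but still ranks it above the two-valued attribute, which is consistent with the abstract's remark that the compensation is only partial.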


13 Figures and Tables

