## Using Causal Knowledge to Learn More Useful Decision Rules From Data

- Louis Anthony Cox
- Published 1995 in AISTATS

One of the most popular and enduring paradigms at the intersection of machine learning and computational statistics is the use of recursive-partitioning or "tree-structured" methods to "learn" classification trees from data sets (Buntine, 1993; Quinlan, 1986). This approach applies to independent variables of all scale types (binary, categorical, ordered categorical, and continuous) and to noisy as well as noiseless training sets. It produces classification trees that can readily be re-expressed as sets of expert-system rules (with each conjunction of literals corresponding to a set of values for variables along one branch through the tree). Each such rule produces a probability vector over the possible classes (or dependent-variable values) of the object being classified, thus automatically presenting confidence and uncertainty information about its conclusions. Classification trees can be validated by methods such as cross-validation (Breiman et al., 1984), and they can easily be modified to handle missing data by constructing rules that exploit only the information contained in the observed variables.

Despite these powerful advantages, classification-tree technology, as implemented in commercially available software systems, is often more useful for pattern recognition than for decision support. Practical business and engineering decisions require some new considerations to be incorporated into the recursive-partitioning paradigm. The most important ones include:

(i) Costs of information acquisition (Cox and Qiu, 1994). If a classification tree requires tests that are infeasible or too expensive to perform in practice, then practitioners will reject it no matter how well it performs on training samples.

(ii) Ability to make changes based on the tree.
If the best tree for predicting a response uses information (e.g., about time-varying covariates) that does not become available in practice until after key decisions must be made, then the inferences supported by the tree, while perhaps very valuable for scientific research purposes, will not be suitable for real-time decision support and guidance of actions. An example from the domain of cancer risk prediction is as follows. The best predictor of liver carcinomas in mice exposed to certain chemicals turns out to be the presence of liver adenomas. This is useful to scientists studying the relation between benign and malignant tumors, but it is useless for real-time decision support.
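To make the tree-to-rules idea concrete, here is a minimal sketch of recursive partitioning in which each leaf stores a class-probability vector and each root-to-leaf branch is re-expressed as a conjunctive rule. It is not the paper's algorithm; all names (`grow`, `rules`, the `costs` penalty) and the toy data are hypothetical. The optional test-cost penalty on the split score loosely illustrates consideration (i) above.

```python
from collections import Counter

# Hypothetical sketch (not the paper's algorithm): a tiny recursive-
# partitioning learner over binary features. Leaves hold class-probability
# vectors; each root-to-leaf branch becomes one conjunctive rule.

def gini(labels):
    """Gini impurity of a label multiset."""
    n = len(labels)
    return 1.0 - sum((k / n) ** 2 for k in Counter(labels).values())

def leaf(labels):
    """Leaf node: probability vector over the observed classes."""
    n = len(labels)
    return {"leaf": {c: k / n for c, k in Counter(labels).items()}}

def grow(rows, labels, features, costs=None, w=1.0):
    """Recursively partition (rows, labels); return a nested-dict tree.
    `costs` (feature -> test cost) adds a penalty w * cost to the split
    score, so expensive tests are avoided (consideration (i))."""
    if len(set(labels)) <= 1 or not features:
        return leaf(labels)
    best = None
    for f in features:
        no = [(r, y) for r, y in zip(rows, labels) if not r[f]]
        yes = [(r, y) for r, y in zip(rows, labels) if r[f]]
        if not no or not yes:
            continue  # split does not separate the data
        score = (len(no) * gini([y for _, y in no]) +
                 len(yes) * gini([y for _, y in yes])) / len(labels)
        if costs:  # penalize expensive tests
            score += w * costs.get(f, 0.0)
        if best is None or score < best[0]:
            best = (score, f, no, yes)
    if best is None:
        return leaf(labels)
    _, f, no, yes = best
    rest = [g for g in features if g != f]
    return {"split": f,
            0: grow([r for r, _ in no], [y for _, y in no], rest, costs, w),
            1: grow([r for r, _ in yes], [y for _, y in yes], rest, costs, w)}

def rules(tree, path=()):
    """Re-express every branch as (conjunction of literals, probability vector)."""
    if "leaf" in tree:
        return [(path, tree["leaf"])]
    f = tree["split"]
    return (rules(tree[0], path + ((f, 0),)) +
            rules(tree[1], path + ((f, 1),)))

# Toy data: feature "a" is perfectly predictive, feature "b" is noise.
demo_rows = [{"a": 0, "b": 0}, {"a": 0, "b": 1},
             {"a": 1, "b": 0}, {"a": 1, "b": 1}]
demo_labels = ["neg", "neg", "pos", "pos"]
demo_tree = grow(demo_rows, demo_labels, ["a", "b"])
```

With no costs, the sketch splits on the predictive feature "a" and yields two rules, each with a degenerate probability vector; passing, say, `costs={"a": 10.0}` makes the learner prefer the cheaper feature "b" for its first test, even though "a" is more informative.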

@inproceedings{Cox1995UsingCK,
title={Using Causal Knowledge to Learn More Useful Decision Rules From Data},
author={Louis Anthony Cox},
booktitle={AISTATS},
year={1995}
}