Unbiased Recursive Partitioning: A Conditional Inference Framework

@article{Hothorn2006UnbiasedRP,
  title={Unbiased Recursive Partitioning: A Conditional Inference Framework},
  author={Torsten Hothorn and Kurt Hornik and Achim Zeileis},
  journal={Journal of Computational and Graphical Statistics},
  year={2006},
  volume={15},
  pages={651--674}
}
Recursive binary partitioning is a popular tool for regression analysis. Two fundamental problems of exhaustive search procedures usually applied to fit such models have been known for a long time: overfitting and a selection bias towards covariates with many possible splits or missing values. While pruning procedures are able to solve the overfitting problem, the variable selection bias still seriously affects the interpretability of tree-structured regression models. For some special cases… 
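The abstract's key idea is to replace exhaustive split search with association tests: each covariate's global association with the response is tested first, and the split variable is chosen by significance, which removes the bias towards covariates with many possible splits. A minimal sketch of that variable-selection step is given below; it is illustrative only, using a permutation test on absolute Pearson correlation as the statistic rather than the authors' exact conditional-inference linear statistic, and the function names are assumptions, not part of the original framework.

```python
import numpy as np

rng = np.random.default_rng(0)

def permutation_pvalue(x, y, n_perm=1000):
    """Approximate permutation p-value for the association between a
    covariate x and response y, using |Pearson correlation| as the
    test statistic (a stand-in for the conditional-inference statistic)."""
    obs = abs(np.corrcoef(x, y)[0, 1])
    perm = np.array([abs(np.corrcoef(rng.permutation(x), y)[0, 1])
                     for _ in range(n_perm)])
    # Add-one correction so the p-value is never exactly zero.
    return (1 + np.sum(perm >= obs)) / (1 + n_perm)

def select_split_variable(X, y, alpha=0.05):
    """Step 1 of the conditional-inference scheme: test each covariate's
    global association with y; split on the most significant covariate,
    or return None to stop growing the tree (built-in stopping rule)."""
    pvals = [permutation_pvalue(X[:, j], y) for j in range(X.shape[1])]
    best = int(np.argmin(pvals))
    return best if pvals[best] < alpha else None
```

Because the choice is based on p-values rather than on the best achievable split criterion, a covariate with many candidate cutpoints gains no advantage, and the significance threshold doubles as a statistically motivated stopping rule in place of pruning.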
An alternative pruning based approach to unbiased recursive partitioning
The Power of Unbiased Recursive Partitioning: A Unifying View of CTree, MOB, and GUIDE.
TLDR
It is assessed how different building blocks from this common framework affect the power of the algorithms to select the appropriate covariates for splitting: observation-wise goodness-of-fit measure, residuals vs. model scores, dichotomization of residuals/scores at zero, and binning of possible split variables.
Tree-Structured Modelling of Categorical Predictors in Regression
TLDR
The method proposed here focuses on the main effects of categorical predictors, using tree-type methods to obtain clusters while letting other variables have a linear or additive effect on the response.
Sequential feature selection and inference using multi‐variate random forests
TLDR
This work uses the conditional inference tree framework to generate a RF where features are deleted sequentially based on explicit hypothesis testing and the resulting sequential algorithm offers an inferentially justifiable, but model‐free, variable selection procedure.
Gaining insight with recursive partitioning of generalized linear models
TLDR
This work presents an approach that combines generalized linear models (GLMs) with recursive partitioning, offering enhanced interpretability of classical trees as well as an explorative way to assess a candidate variable's influence on a parametric model.
Transformation Forests
TLDR
A novel approach based on a parametric family of distributions characterised by their transformation function is proposed, which allows broad inference procedures, such as the model-based bootstrap, to be applied in a straightforward way.
Recursive partitioning on incomplete data using surrogate decisions and multiple imputation
Reliable Trees: Reliability Informed Recursive Partitioning for Psychological Data
TLDR
Results indicate that reliability-based cost functions can be beneficial, particularly with smaller samples and when more reliable variables are important to the prediction, but can overlook important associations between the outcome and lower reliability predictors.
UNBIASED RECURSIVE PARTITIONING ALGORITHM IN REGRESSION TREES
TLDR
The main aim of this research is to apply the unbiased recursive partitioning algorithm proposed by Hothorn, Hornik and Zeileis (2006), which is based on permutation tests, to regression trees.
An introduction to recursive partitioning: rationale, application, and characteristics of classification and regression trees, bagging, and random forests.
TLDR
The aim of this work is to introduce the principles of the standard recursive partitioning methods as well as recent methodological improvements, to illustrate their usage for low- and high-dimensional data exploration, and to point out limitations of the methods and potential pitfalls in their practical application.
...

References

SHOWING 1-10 OF 64 REFERENCES
REGRESSION TREES WITH UNBIASED VARIABLE SELECTION AND INTERACTION DETECTION
TLDR
The proposed algorithm, GUIDE, is specifically designed to eliminate variable selection bias, a problem that can undermine the reliability of inferences from a tree structure, and allows fast computation speed, natural extension to data sets with categorical variables, and direct detection of local two-variable interactions.
SPLIT SELECTION METHODS FOR CLASSIFICATION TREES
TLDR
This article presents an algorithm called QUEST that has negligible bias, which shares similarities with the FACT method, but it yields binary splits and the final tree can be selected by a direct stopping rule or by pruning.
An Exact Probability Metric for Decision Tree Splitting
TLDR
Both gain and the chi-squared significance test are shown to arise in asymptotic approximations to the hypergeometric, revealing similar criteria for admissibility and showing the nature of their biases.
Using a Permutation Test for Attribute Selection in Decision Trees
TLDR
This work describes how permutation tests can be applied to the problem of attribute selection in decision trees, and gives a novel two-stage method for applying it to select attributes in a decision tree.
An Exact Probability Metric for Decision Tree Splitting and Stopping
TLDR
Empirical results show that hypergeometric pre-pruning should be done in most cases, as trees pruned in this way are simpler and more efficient, and typically no less accurate than unpruned or post-pruned trees.
Tree-Structured Classification via Generalized Discriminant Analysis.
TLDR
A new method of tree-structured classification is obtained by recursive application of linear discriminant analysis, with the variables at each stage being appropriately chosen according to the data and the type of splits desired.
A Lego System for Conditional Inference
TLDR
This article reanalyze four datasets by adapting the general conceptual framework to these challenging inference problems and using the coin add-on package in the R system for statistical computing to show what one can gain from going beyond the “classical” test procedures.
A note on split selection bias in classification trees
Tree-based multivariate regression and density estimation with right-censored data
Maximally selected rank statistics
A common statistical problem is the assessment of the predictive power of a quantitative variable for some dependent variable. A maximally selected rank statistic regarding the quantitative variable
...