# A Data Complexity Analysis of Comparative Advantages of Decision Forest Constructors

@article{Ho2002ADC, title={A Data Complexity Analysis of Comparative Advantages of Decision Forest Constructors}, author={Tin Kam Ho}, journal={Pattern Analysis \& Applications}, year={2002}, volume={5}, pages={102-112} }

Abstract: Using a number of measures for characterising the complexity of classification problems, we studied the comparative advantages of two methods for constructing decision forests – bootstrapping and random subspaces. We investigated a collection of 392 two-class problems from the UCI repository, and observed that there are strong correlations between the classifier accuracies and measures of length of class boundaries, thickness of the class manifolds, and nonlinearities of decision…
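The two forest constructors compared in the abstract differ only in what each tree is allowed to see during training. A minimal sketch of the two sampling schemes (illustrative only, with invented helper names; not the paper's implementation):

```python
import random

def bootstrap_sample(n_points, rng):
    """Bagging: each tree trains on n_points indices drawn with replacement."""
    return [rng.randrange(n_points) for _ in range(n_points)]

def random_subspace(n_features, k, rng):
    """Random subspaces: each tree sees every point, but only k of the features."""
    return sorted(rng.sample(range(n_features), k))

rng = random.Random(0)
print(bootstrap_sample(10, rng))     # point indices, typically with repeats
print(random_subspace(20, 10, rng))  # k distinct feature indices
```

Bagging perturbs the set of training points while keeping all features; the random subspace method keeps all points and instead trains each tree in a lower-dimensional random projection of the feature space.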

## 171 Citations

How Complex Is Your Classification Problem?

- Computer Science, ACM Comput. Surv.
- 2019

Detailed descriptions are given of an R package named Extended Complexity Library (ECoL), which implements a set of complexity measures, is publicly available, and can be used to characterize the complexity of classification problems.

Classifier Domains of Competence in Data Complexity Space

- Computer Science, Mathematics
- 2006

The domain of competence of a set of popular classifiers is studied, by means of a methodology that relates the classifier’s behavior to problem complexity, to identify the features of a classification task that are most relevant in optimal classifier selection.

How Complex is your classification problem? A survey on measuring classification complexity

- Computer Science
- 2018

This paper surveys and analyzes measures that can be extracted from training datasets in order to characterize the complexity of the respective classification problems; the measures are implemented in an R package named Extended Complexity Library (ECoL).

Domain of competence of XCS classifier system in complexity measurement space

- Computer Science, IEEE Transactions on Evolutionary Computation
- 2005

This paper investigates the domain of competence of XCS by means of a methodology that characterizes the complexity of a classification problem by a set of geometrical descriptors, and focuses on XCS with hyperrectangle codification, which has been predominantly used for real-attributed domains.

A meta-learning framework for pattern classification by means of data complexity measures

- Computer Science, Inteligencia Artif.
- 2006

This paper presents a general meta-learning framework based on a number of data complexity measures and discusses the applicability of this method to several problems in pattern analysis.

A review of data complexity measures and their applicability to pattern classification problems

- Computer Science
- 2005

A review of data complexity measures in the framework of pattern classification and possible applications to a number of practical problems is presented.

Measures of Geometrical Complexity in Classification Problems

- Computer Science
- 2006

This work proposes to address the mystery of why popular classifiers fail to achieve perfect accuracy in practical applications by developing measures of geometrical and topological characteristics of point sets in high-dimensional spaces.

On classifier domains of competence

- Computer Science, Proceedings of the 17th International Conference on Pattern Recognition, 2004. ICPR 2004.
- 2004

It is observed that the simplest classifiers, the nearest neighbor and the linear classifier, show extreme behavior, being the best for the easiest and the most difficult problems respectively, while the sophisticated ensemble classifiers tend to be robust across a wider range of problems and are largely equivalent in performance.

Selective Ensemble Based on Transformation of Classifiers Used SPCA

- Computer Science, Int. J. Pattern Recognit. Artif. Intell.
- 2015

A new ensemble method is proposed that selects classifiers for the ensemble via a transformation of the individual classifiers based on diversity and accuracy; it achieves better performance than competing methods, and kappa-error diagrams illustrate that it enhances diversity relative to them.

## References

Showing 1–10 of 34 references

The Random Subspace Method for Constructing Decision Forests

- Computer Science, IEEE Trans. Pattern Anal. Mach. Intell.
- 1998

A method to construct a decision tree based classifier is proposed that maintains highest accuracy on training data and improves on generalization accuracy as it grows in complexity.

Complexity Measures of Supervised Classification Problems

- Computer Science, IEEE Trans. Pattern Anal. Mach. Intell.
- 2002

A set of real-world problems is compared to random labelings of points, and it is found that real problems contain structures in this measurement space that are significantly different from those of the random sets.

Measuring the complexity of classification problems

- Computer Science, Proceedings 15th International Conference on Pattern Recognition. ICPR-2000
- 2000

A set of real world problems is compared to random combinations of points in this measurement space and it is found that real problems contain structures that are significantly different from the random sets.

Considerations of sample and feature size

- Computer Science, IEEE Trans. Inf. Theory
- 1972

The design-set error rate for a two-class problem with multivariate normal distributions is derived as a function of the sample size per class (N) and dimensionality (L), and is demonstrated to be an extremely biased estimate of either the Bayes or test-set error rate.

Decision Combination in Multiple Classifier Systems

- Computer Science, IEEE Trans. Pattern Anal. Mach. Intell.
- 1994

This work proposes three methods for class set reranking, based on the highest rank, the Borda count, and logistic regression; these have been tested in applications of degraded machine-printed character recognition and word recognition with large lexicons, resulting in substantial improvement in overall correctness.
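Of the three combination methods mentioned, the Borda count is the easiest to illustrate: each classifier's ranking awards more points to higher-ranked classes, and the class with the highest total wins. A minimal sketch with hypothetical class labels (not code from the cited work):

```python
def borda_count(rankings):
    """Combine classifier rankings by Borda count: a class ranked r-th (0-based)
    out of m receives m - 1 - r points from that classifier; highest total wins."""
    scores = {}
    for ranking in rankings:
        m = len(ranking)
        for r, cls in enumerate(ranking):
            scores[cls] = scores.get(cls, 0) + (m - 1 - r)
    return max(scores, key=scores.get), scores

# Three classifiers each rank three classes (best first):
winner, scores = borda_count([["A", "B", "C"], ["B", "A", "C"], ["A", "C", "B"]])
print(winner, scores)  # "A" wins with 5 points (B: 3, C: 1)
```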

The learning behavior of single neuron classifiers on linearly separable or nonseparable input

- Computer Science, IJCNN'99. International Joint Conference on Neural Networks. Proceedings (Cat. No.99CH36339)
- 1999

This work explores the behavior of several classical descent procedures for determining linear separability and training linear classifiers in the presence of linearly nonseparable input, and finds that the adaptive procedures have serious implementation problems that make them less preferable than linear programming.

A System for Induction of Oblique Decision Trees

- Computer Science, J. Artif. Intell. Res.
- 1994

This system, OC1, combines deterministic hill-climbing with two forms of randomization to find a good oblique split (in the form of a hyperplane) at each node of a decision tree.
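The combination of hill-climbing and randomization described above can be sketched roughly as follows. This is a simplified toy version of the idea, not OC1 itself; the function names, the coordinate-wise perturbation scheme, and the use of Gini impurity are choices made here for illustration:

```python
import random

def gini(labels):
    """Gini impurity of a set of binary (0/1) labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2 * p * (1 - p)

def split_impurity(w, b, X, y):
    """Weighted Gini impurity of the oblique split w.x + b > 0."""
    left = [yi for xi, yi in zip(X, y)
            if sum(wi * xij for wi, xij in zip(w, xi)) + b <= 0]
    right = [yi for xi, yi in zip(X, y)
             if sum(wi * xij for wi, xij in zip(w, xi)) + b > 0]
    n = len(y)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

def hill_climb_split(X, y, steps=200, seed=0):
    """OC1-style search (simplified): randomly perturb one hyperplane
    coefficient at a time, keeping the change only when impurity improves."""
    rng = random.Random(seed)
    d = len(X[0])
    w, b = [rng.uniform(-1, 1) for _ in range(d)], 0.0
    best = split_impurity(w, b, X, y)
    for _ in range(steps):
        i = rng.randrange(d + 1)          # pick a coefficient (or the bias)
        delta = rng.uniform(-0.5, 0.5)
        if i < d:
            w[i] += delta
        else:
            b += delta
        cand = split_impurity(w, b, X, y)
        if cand < best:
            best = cand                   # keep the improvement
        elif i < d:
            w[i] -= delta                 # revert
        else:
            b -= delta
    return w, b, best
```

Because Gini impurity is concave, any split's weighted child impurity is at most the parent impurity, so the search can only move toward purer partitions.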

On dimensionality and sample size in statistical pattern classification

- Computer Science, Pattern Recognit.
- 1971

An alternative method of stochastic discrimination with applications to pattern recognition

- Computer Science
- 1995

This dissertation introduces an alternative method of performing stochastic discrimination in pattern recognition which differs in several aspects from the original method introduced by Kleinberg, and discusses four variations of the method, each of which uses different variations of Ho's discriminant functions.

Bounds on the number of samples needed for neural learning

- Computer Science, IEEE Trans. Neural Networks
- 1991

It is shown that Ω(min(d, n)·M) boundary samples are required for successful classification of M clusters of samples using a two-hidden-layer neural network with d-dimensional inputs and n nodes in the first hidden layer.