A Data Complexity Analysis of Comparative Advantages of Decision Forest Constructors

@article{Ho2002ADC,
  title={A Data Complexity Analysis of Comparative Advantages of Decision Forest Constructors},
  author={Tin Kam Ho},
  journal={Pattern Analysis \& Applications},
  year={2002},
  volume={5},
  pages={102--112}
}
  • T. Ho
  • Published 7 June 2002
  • Mathematics
  • Pattern Analysis & Applications
Abstract: Using a number of measures for characterising the complexity of classification problems, we studied the comparative advantages of two methods for constructing decision forests – bootstrapping and random subspaces. We investigated a collection of 392 two-class problems from the UCI repository, and observed that there are strong correlations between the classifier accuracies and measures of length of class boundaries, thickness of the class manifolds, and nonlinearities of decision…
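The two forest constructors compared in the abstract differ only in how each tree's training view is derived: bootstrapping resamples the training points, while the random subspace method resamples the feature dimensions. A minimal sketch of the two sampling schemes (plain Python, illustrative function names, not the paper's implementation):

```python
import random

def bootstrap_sample(X, y, rng):
    """Bagging: draw len(X) points with replacement; every feature is kept."""
    idx = [rng.randrange(len(X)) for _ in range(len(X))]
    return [X[i] for i in idx], [y[i] for i in idx]

def random_subspace(X, n_features, rng):
    """Random subspaces: keep every point, project onto a random feature subset."""
    feats = sorted(rng.sample(range(len(X[0])), n_features))
    return [[row[j] for j in feats] for row in X], feats

# Toy data: 4 points, 3 features.
X = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]
y = [0, 0, 1, 1]
rng = random.Random(0)

Xb, yb = bootstrap_sample(X, y, rng)    # same shape; some rows repeated, some omitted
Xs, feats = random_subspace(X, 2, rng)  # all rows kept, only 2 of the 3 columns
```

A forest is then grown by repeating either draw once per tree and training an ordinary decision tree on each view; the paper's question is which perturbation helps more as a function of the problem's complexity.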

Citations

How Complex Is Your Classification Problem?
TLDR
Detailed descriptions are given of a publicly available R package named Extended Complexity Library (ECoL), which implements a set of complexity measures that can be used to characterize the complexity of classification problems.
Classifier Domains of Competence in Data Complexity Space
TLDR
The domain of competence of a set of popular classifiers is studied, by means of a methodology that relates the classifier’s behavior to problem complexity, to identify the features of a classification task that are most relevant in optimal classifier selection.
How Complex is your classification problem? A survey on measuring classification complexity
TLDR
This paper surveys and analyzes measures that can be extracted from training datasets to characterize the complexity of the respective classification problems, and provides implementations of these measures in an R package named Extended Complexity Library (ECoL).
Domain of competence of XCS classifier system in complexity measurement space
TLDR
This paper investigates the domain of competence of XCS by means of a methodology that characterizes the complexity of a classification problem by a set of geometrical descriptors, and focuses on XCS with hyperrectangle codification, which has been predominantly used for real-attributed domains.
A meta-learning framework for pattern classification by means of data complexity measures
TLDR
This paper presents a general meta-learning framework based on a number of data complexity measures and discusses the applicability of this method to several problems in pattern analysis.
A review of data complexity measures and their applicability to pattern classification problems
TLDR
A review of data complexity measures in the framework of pattern classification and possible applications to a number of practical problems is presented.
Measures of Geometrical Complexity in Classification Problems
TLDR
This work addresses why popular classifiers fail to reach perfect accuracy in practical applications by developing measures of geometrical and topological characteristics of point sets in high-dimensional spaces.
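One family of geometrical measures in this line of work looks at the class boundary through nearest neighbours. As an illustrative proxy in that spirit (not the exact definition from any of these papers), one can compute the fraction of points whose nearest neighbour carries a different label — high values suggest a long, interleaved boundary:

```python
import math

def boundary_fraction(X, y):
    """Fraction of points whose Euclidean nearest neighbour has a different
    label -- an illustrative proxy for class-boundary length."""
    cross = 0
    for i, xi in enumerate(X):
        j = min((k for k in range(len(X)) if k != i),
                key=lambda k: math.dist(xi, X[k]))
        cross += y[i] != y[j]
    return cross / len(X)

# Two well-separated clusters: no nearest neighbour crosses classes.
print(boundary_fraction([[0.0], [0.1], [0.2], [5.0], [5.1], [5.2]],
                        [0, 0, 0, 1, 1, 1]))  # → 0.0

# Interleaved classes: every nearest neighbour is from the other class.
print(boundary_fraction([[0.0], [1.0], [2.0], [3.0]],
                        [0, 1, 0, 1]))        # → 1.0
```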
On classifier domains of competence
  • Ester Bernadó-Mansilla, T. Ho
  • Computer Science
    Proceedings of the 17th International Conference on Pattern Recognition, 2004. ICPR 2004.
  • 2004
TLDR
It is observed that the simplest classifiers, nearest neighbor and the linear classifier, show extreme behavior, being the best for the easiest and the most difficult problems respectively, while the sophisticated ensemble classifiers tend to be robust across wider types of problems and are largely equivalent in performance.
Selective Ensemble Based on Transformation of Classifiers Used SPCA
TLDR
A new ensemble method is proposed that selects classifiers for the ensemble via a transformation of individual classifiers based on diversity and accuracy; it obtains better performance than other methods, and kappa-error diagrams illustrate that it enhances diversity compared with them.

References

SHOWING 1-10 OF 34 REFERENCES
The Random Subspace Method for Constructing Decision Forests
  • T. Ho
  • Computer Science
    IEEE Trans. Pattern Anal. Mach. Intell.
  • 1998
TLDR
A method is proposed for constructing a decision-tree-based classifier that maintains the highest accuracy on training data and improves generalization accuracy as it grows in complexity.
Complexity Measures of Supervised Classification Problems
  • T. Ho, M. Basu
  • Computer Science
    IEEE Trans. Pattern Anal. Mach. Intell.
  • 2002
TLDR
A set of real-world problems is compared to random labelings of points, and it is found that real problems contain structures in this measurement space that are significantly different from those of the random sets.
Measuring the complexity of classification problems
  • T. Ho, M. Basu
  • Computer Science
    Proceedings 15th International Conference on Pattern Recognition. ICPR-2000
  • 2000
TLDR
A set of real-world problems is compared to random combinations of points in this measurement space, and it is found that real problems contain structures that are significantly different from the random sets.
Considerations of sample and feature size
TLDR
The design-set error rate for a two-class problem with multivariate normal distributions is derived as a function of the sample size per class (N) and dimensionality (L), and is demonstrated to be an extremely biased estimate of either the Bayes or test-set error rate.
Decision Combination in Multiple Classifier Systems
TLDR
This work proposes three methods for class set reranking, based on the highest rank, the Borda count, and logistic regression; tested in applications to degraded machine-printed characters and words from large lexicons, they result in substantial improvement in overall correctness.
The learning behavior of single neuron classifiers on linearly separable or nonseparable input
  • M. Basu, T. Ho
  • Computer Science
    IJCNN'99. International Joint Conference on Neural Networks. Proceedings (Cat. No.99CH36339)
  • 1999
TLDR
This work explores the behavior of several classical descent procedures for determining linear separability and training linear classifiers in the presence of linearly nonseparable input, and finds that the adaptive procedures have serious implementation problems that make them less preferable than linear programming.
A System for Induction of Oblique Decision Trees
TLDR
This system, OC1, combines deterministic hill-climbing with two forms of randomization to find a good oblique split (in the form of a hyperplane) at each node of a decision tree.
An alternative method of stochastic discrimination with applications to pattern recognition
TLDR
This dissertation introduces an alternative method of performing stochastic discrimination in pattern recognition, which differs in several aspects from the original method introduced by Kleinberg, and discusses four variants of the method, each using a different form of Ho's discriminant functions.
Bounds on the number of samples needed for neural learning
TLDR
It is shown that Ω(min(d, n) · M) boundary samples are required for successful classification of M clusters of samples using a two-hidden-layer neural network with d-dimensional inputs and n nodes in the first hidden layer.