• Corpus ID: 213407696

Ensemble learning of high dimension datasets

  • Jin Sean Lim
Ensemble learning, a machine learning approach, solves complex tasks with minimal human intervention by combining the decisions of a committee of learners. Advances in computing technology have enabled researchers to build datasets with features numbering in the thousands and to train more accurate predictive models. Unfortunately, high-dimensional datasets are especially challenging for machine learning due to the phenomenon dubbed the “curse of… 


Classification in High-Dimensional Feature Spaces: Random Subsample Ensemble
Simulation results on data sets with up to 20,000 features indicate that the random subsample ensemble classifier performs comparably to other benchmark machine learners in terms of prediction accuracy and CPU time.
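The random subsample idea can be sketched with a toy nearest-centroid base learner — a minimal illustration, not the paper's implementation; the member count, subspace fraction, and synthetic data below are all assumptions:

```python
import numpy as np

def fit_centroids(X, y):
    """Nearest-centroid base learner: one mean vector per class."""
    classes = np.unique(y)
    return classes, np.array([X[y == c].mean(axis=0) for c in classes])

def predict_centroids(model, X):
    classes, centroids = model
    dist = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return classes[dist.argmin(axis=1)]

def random_subspace_ensemble(X, y, n_members=25, subspace=0.05, seed=0):
    """Train each member on its own random subset of the features."""
    rng = np.random.default_rng(seed)
    n_feats = max(1, int(subspace * X.shape[1]))
    members = []
    for _ in range(n_members):
        idx = rng.choice(X.shape[1], size=n_feats, replace=False)
        members.append((idx, fit_centroids(X[:, idx], y)))
    return members

def predict_ensemble(members, X):
    """Majority vote over the members' predictions."""
    votes = np.stack([predict_centroids(m, X[:, idx]) for idx, m in members])
    return np.array([np.bincount(col).argmax() for col in votes.T])

# Toy high-dimensional data: 200 samples, 2000 features, 100 informative.
rng = np.random.default_rng(1)
y = rng.integers(0, 2, 200)
X = rng.normal(size=(200, 2000))
X[:, :100] += 2.0 * y[:, None]   # class signal lives in the first 100 features

ens = random_subspace_ensemble(X, y)
acc = (predict_ensemble(ens, X) == y).mean()
```

Each member sees only 5% of the features, so members that draw uninformative subsets vote near chance, while the majority vote recovers the class signal.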
Learning in high dimensions with projected linear discriminants
The enormous power of modern computers has made possible the statistical modelling of data with dimensionality that would have made this task inconceivable only decades ago. However, experience in
Structural diversity for decision tree ensemble learning
The TMD (tree matching diversity) measure for decision trees is proposed, and selective ensemble approaches with decision forests are empirically evaluated under different diversity measures, validating that stronger ensembles can be constructed by considering structural and behavioral diversities together.
A New Ensemble Method with Feature Space Partitioning for High-Dimensional Data Classification
This paper proposes an ensemble method for classifying high-dimensional data in which each classifier is constructed from a different feature set obtained by partitioning redundant features, and shows that the method outperforms competing approaches.
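A bare-bones sketch of the partitioning idea — here the features are split into disjoint blocks by a random shuffle, whereas the paper groups redundant features; the nearest-centroid learner and all sizes are assumptions:

```python
import numpy as np

def partition_ensemble(X, y, n_blocks=5, seed=0):
    """Fit one nearest-centroid classifier per disjoint feature block."""
    rng = np.random.default_rng(seed)
    blocks = np.array_split(rng.permutation(X.shape[1]), n_blocks)
    classes = np.unique(y)
    members = []
    for idx in blocks:
        centroids = np.array([X[y == c][:, idx].mean(axis=0) for c in classes])
        members.append((idx, centroids))
    return classes, members

def predict_partition(classes, members, X):
    """Majority vote across the per-block classifiers."""
    votes = []
    for idx, centroids in members:
        dist = ((X[:, idx][:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        votes.append(classes[dist.argmin(axis=1)])
    votes = np.stack(votes)
    return np.array([np.bincount(col).argmax() for col in votes.T])

# Toy data: the class signal is spread over the first 100 of 2000 features,
# so every block receives some informative features after shuffling.
rng = np.random.default_rng(1)
y = rng.integers(0, 2, 200)
X = rng.normal(size=(200, 2000))
X[:, :100] += 2.0 * y[:, None]

classes, members = partition_ensemble(X, y)
acc = (predict_partition(classes, members, X) == y).mean()
```

Unlike random subsampling, every feature is used exactly once, so the blocks are guaranteed to be non-overlapping.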
Random projections as regularizers: learning a linear discriminant from fewer observations than dimensions
It is shown that the randomly projected ensemble is equivalent to implementing a sophisticated regularization scheme on the linear discriminant learned in the original data space, which prevents overfitting under the small-sample conditions where pseudo-inverse FLD learned in the data space is provably poor.
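A compact sketch of the randomly projected Fisher linear discriminant (FLD) ensemble: with fewer observations than dimensions the within-class scatter matrix is singular, but each k-dimensional Gaussian projection yields a well-conditioned FLD, and the ensemble averages their scores. The projection dimension, member count, and data below are assumptions:

```python
import numpy as np

def fld(X0, X1):
    """Fisher discriminant in the (projected) space: direction and midpoint."""
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    Sw = np.cov(X0, rowvar=False) + np.cov(X1, rowvar=False)
    return np.linalg.pinv(Sw) @ (m1 - m0), (m0 + m1) / 2

def rp_fld_ensemble(X0, X1, k=5, n_members=50, seed=0):
    """Fit an FLD on each of n_members random k-dim Gaussian projections."""
    rng = np.random.default_rng(seed)
    d = X0.shape[1]
    members = []
    for _ in range(n_members):
        R = rng.normal(size=(d, k)) / np.sqrt(k)   # random projection matrix
        w, mid = fld(X0 @ R, X1 @ R)
        members.append((R, w, mid))
    return members

def predict_rp_fld(members, X):
    """Sum the signed discriminant scores; positive -> class 1."""
    score = sum((X @ R - mid) @ w for R, w, mid in members)
    return (score > 0).astype(int)

# Fewer observations (2 x 20) than dimensions (500): pseudo-inverse territory.
rng = np.random.default_rng(2)
X0 = rng.normal(size=(20, 500))
X1 = rng.normal(size=(20, 500)) + 1.0   # shifted class mean

members = rp_fld_ensemble(X0, X1)
err0 = predict_rp_fld(members, X0).mean()   # fraction of X0 misclassified
acc1 = predict_rp_fld(members, X1).mean()   # fraction of X1 correct
```

In the 5-dimensional projected space the pooled scatter estimate is well conditioned, so no per-member regularization parameter is needed; averaging over projections supplies the regularization.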
Nearest neighbor ensemble
  • C. Domeniconi, B. Yan
  • Computer Science
    Proceedings of the 17th International Conference on Pattern Recognition, 2004. ICPR 2004.
  • 2004
This work exploits the instability of NN classifiers with respect to different choices of features to generate an effective and diverse set of NN classifiers with possibly uncorrelated errors, taking advantage of the high dimensionality of the data.
Weighted random subspace method for high dimensional data classification.
The proposed weight assignment procedures are applied to the random subspace method, and significant improvement over the common equal-weight assignment scheme is observed.
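One way to realize such a weighting scheme — with assumed details: each member's vote is weighted by its accuracy on a held-out validation split, using a nearest-centroid base learner as a stand-in:

```python
import numpy as np

def centroid_fit(X, y):
    """One mean vector per class."""
    classes = np.unique(y)
    return classes, np.array([X[y == c].mean(axis=0) for c in classes])

def centroid_predict(classes, centroids, X):
    dist = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return classes[dist.argmin(axis=1)]

def weighted_subspace_ensemble(Xtr, ytr, Xval, yval, n_members=20,
                               n_feats=100, seed=0):
    """Random subspace members whose votes are weighted by held-out accuracy."""
    rng = np.random.default_rng(seed)
    members = []
    for _ in range(n_members):
        idx = rng.choice(Xtr.shape[1], size=n_feats, replace=False)
        classes, centroids = centroid_fit(Xtr[:, idx], ytr)
        weight = (centroid_predict(classes, centroids, Xval[:, idx])
                  == yval).mean()
        members.append((idx, classes, centroids, weight))
    return members

def predict_weighted(members, X, n_classes=2):
    """Accumulate each member's weight on its predicted class."""
    scores = np.zeros((X.shape[0], n_classes))
    for idx, classes, centroids, weight in members:
        pred = centroid_predict(classes, centroids, X[:, idx])
        scores[np.arange(X.shape[0]), pred] += weight
    return scores.argmax(axis=1)

# Toy data, split into training and validation halves.
rng = np.random.default_rng(3)
y = rng.integers(0, 2, 200)
X = rng.normal(size=(200, 2000))
X[:, :100] += 2.0 * y[:, None]
Xtr, ytr, Xval, yval = X[:150], y[:150], X[150:], y[150:]

members = weighted_subspace_ensemble(Xtr, ytr, Xval, yval)
acc = (predict_weighted(members, Xval) == yval).mean()
```

Members that happen to draw uninformative feature subsets earn validation accuracy near chance and thus contribute little to the vote, which is the motivation for replacing equal weights.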
Classifiers selection for ensemble learning based on accuracy and diversity
Classification by ensembles from random partitions of high-dimensional data
Structure-aware error bounds for linear classification with the zero-one loss
We prove risk bounds for binary classification in high-dimensional settings when the sample size is allowed to be smaller than the dimensionality of the training set observations. In particular, we