# Accuracy and Diversity in Ensembles of Text Categorisers

@article{Adeva2005AccuracyAD,
  title   = {Accuracy and Diversity in Ensembles of Text Categorisers},
  author  = {Juan Jose Garc{\'i}a Adeva and Ulises Cervi{\~n}o Beresi and Rafael Alejandro Calvo},
  journal = {CLEI Electron. J.},
  year    = {2005},
  volume  = {8}
}

Error-Correcting Output Codes (ECOC) ensembles of binary classifiers are used in Text Categorisation to improve accuracy while benefiting from learning algorithms that only support two classes. An accurate ensemble relies on the quality of its corresponding decomposition matrix, which in turn depends on the separation between the categories and the diversity of the dichotomies representing the binary classifiers. Important open questions include finding a good definition for diversity…
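The ECOC scheme the abstract describes can be sketched in a few lines: each column of the decomposition matrix defines a dichotomy, one binary classifier is trained per column, and a test instance is decoded to the class whose codeword is nearest in Hamming distance. This is a minimal illustrative sketch, not the paper's implementation; the code matrix, data, and the toy `ThresholdStump` learner are all assumptions.

```python
# Minimal sketch of an ECOC ensemble with Hamming-distance decoding.
# The code matrix and the toy ThresholdStump learner are illustrative
# assumptions, not taken from the paper.

def hamming(a, b):
    """Number of positions where two codewords differ."""
    return sum(x != y for x, y in zip(a, b))

class ThresholdStump:
    """Toy binary learner for 1-D inputs: thresholds at the midpoint of class means."""
    def fit(self, xs, ys):
        pos = [x for x, y in zip(xs, ys) if y == 1]
        neg = [x for x, y in zip(xs, ys) if y == 0]
        self.t = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
        self.flip = sum(pos) / len(pos) < sum(neg) / len(neg)
        return self

    def predict(self, x):
        p = 1 if x > self.t else 0
        return 1 - p if self.flip else p

class ECOCEnsemble:
    """One binary classifier per column of the decomposition (code) matrix."""
    def __init__(self, code):
        self.code = code  # class label -> codeword tuple

    def fit(self, xs, ys):
        n_bits = len(next(iter(self.code.values())))
        # Each bit defines a dichotomy: relabel the data and train one learner.
        self.learners = [
            ThresholdStump().fit(xs, [self.code[y][b] for y in ys])
            for b in range(n_bits)
        ]
        return self

    def predict(self, x):
        word = [m.predict(x) for m in self.learners]
        # Decode to the class whose codeword is nearest in Hamming distance.
        return min(self.code, key=lambda c: hamming(self.code[c], word))

# Example: three classes encoded with two dichotomies.
code = {"A": (0, 0), "B": (1, 0), "C": (1, 1)}
model = ECOCEnsemble(code).fit(
    [0.0, 0.5, 5.0, 5.5, 10.0, 10.5],
    ["A", "A", "B", "B", "C", "C"],
)
```

In practice the codewords are made longer than the minimum needed to distinguish the classes, so that a larger minimum Hamming distance d between rows lets the decoder absorb up to (d - 1) // 2 erroneous bits, which is where the error-correcting behaviour comes from.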

#### 36 Citations

Diversified Random Forests Using Random Subspaces

- Computer Science
- IDEAL
- 2014

This paper proposes a method to promote Random Forest diversity by using randomly selected subspaces, giving a weight to each subspace according to its predictive power, and using this weight in majority voting.

CLUB-DRF: A Clustering Approach to Extreme Pruning of Random Forests

- Computer Science
- SGAI Conf.
- 2015

Experimental results on 15 real datasets from the UCI repository prove the superiority of the proposed extension of RF, termed CLUB-DRF, which is much smaller in size than RF, yet performs at least as well as RF and mostly exhibits higher accuracy.

Amended Cross Entropy Cost: Framework For Explicit Diversity Encouragement

- Computer Science, Mathematics
- ArXiv
- 2020

The Amended Cross Entropy (ACE) is presented, affording the capability to train multiple classifiers while explicitly controlling the diversity between them, and works for classification problems analogously to Negative Correlation Learning (NCL) for regression problems.

An Outlier Ranking Tree Selection Approach to Extreme Pruning of Random Forests

- Computer Science
- EANN
- 2016

Experimental results on 10 real datasets prove the superiority of the proposed method, based on the Local Outlier Factor (LOF), over the traditional RF and a known ensemble pruning technique.

On Extreme Pruning of Random Forest Ensembles for Real-time Predictive Applications

- Mathematics, Computer Science
- ArXiv
- 2015

Experimental results on 15 real datasets from the UCI repository prove the superiority of the proposed extension of RF, termed CLUB-DRF, which is much smaller in size yet performs at least as well as RF and mostly exhibits higher accuracy.

Random forests: from early developments to recent advancements

- Engineering
- 2014

Ensemble classification is a data mining approach that utilizes a number of classifiers that work together in order to identify the class label for unlabeled instances. Random forest (RF) is an…

An Outlier Detection-based Tree Selection Approach to Extreme Pruning of Random Forests

- Computer Science
- ArXiv
- 2015

Experimental results on 10 real datasets prove the superiority of the proposed extension of RF, termed LOFB-DRF, which is much smaller in size than RF, yet performs at least as well as RF and mostly exhibits higher accuracy.

An effective approach for improving the accuracy of a random forest classifier in the classification of Hyperion data

- Computer Science
- 2020

The performance of RF was observed to be significantly enhanced in terms of predictive ability and computational expenses with the optimized set of features and number of random trees as base classifiers.

Clustering Based Ensemble Classification for Spam Filtering

- 2006

Spam filtering has become a very important issue over recent years, as unsolicited bulk e-mail poses large problems in terms of both the amount of time spent on and the resources needed to…

Tackle three practical classification problems via Ensemble Learning

- Computer Science
- 2012 IEEE International Conference on Granular Computing
- 2012

The proposed Ensemble Learning method can improve the classification performance significantly in News Categorization, Intrusion Detection and Spam Detection.

#### References

Showing 1-10 of 22 references

A decomposition scheme based on error-correcting output codes for ensembles of text categorizers

- Computer Science
- Third International Conference on Information Technology and Applications (ICITA'05)
- 2005

This work proposes a decomposition approach where both the categories and the classifiers are well separated in order to maximise the decision boundaries and minimise correlated predictions.

Measures of Diversity in Classifier Ensembles and Their Relationship with the Ensemble Accuracy

- Mathematics, Computer Science
- Machine Learning
- 2004

Although there are proven connections between diversity and accuracy in some special cases, the results raise some doubts about the usefulness of diversity measures in building classifier ensembles in real-life pattern recognition problems.

Effectiveness of error correcting output coding methods in ensemble and monolithic learning machines

- Computer Science
- Formal Pattern Analysis & Applications
- 2003

It is shown that the architecture of ECOC learning machines influences the accuracy of the ECOC classifier, highlighting that ensembles of parallel and independent dichotomic Multi-Layer Perceptrons are well-suited to implement ECOC methods.

Diversity creation methods: a survey and categorisation

- Computer Science
- Inf. Fusion
- 2005

This paper reviews the varied attempts to provide a formal explanation of error diversity, including several heuristic and qualitative explanations in the literature, and introduces the idea of implicit and explicit diversity creation methods, and three dimensions along which these may be applied.

ERROR-CORRECTING OUTPUT CODING FOR TEXT CLASSIFICATION

- Computer Science
- 1999

This work provides experimental results on several real-world datasets extracted from the Internet, demonstrating that ECOC can offer significant improvements in accuracy over conventional classification algorithms.

Coding and decoding strategies for multi-class learning problems

- Computer Science
- Inf. Fusion
- 2003

The binary (0,1) code matrix conditions necessary for reduction of error in the ECOC framework are considered, and it is shown that equidistant codes can be generated by using properties related to the number of 1s in each row and between any pair of rows.
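The row-based conditions this snippet mentions can be checked directly: a code matrix is equidistant when every pair of class codewords has the same Hamming distance, and the minimum pairwise distance bounds how many bit errors decoding can correct. A small sketch, where the 4-class example matrix is an illustrative assumption:

```python
# Hedged sketch: check equidistance of a binary ECOC code matrix.
# A minimum pairwise Hamming distance d allows correcting up to
# (d - 1) // 2 erroneous codeword bits at decoding time.

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def pairwise_distances(rows):
    """Hamming distances between all pairs of class codewords (matrix rows)."""
    return [hamming(rows[i], rows[j])
            for i in range(len(rows))
            for j in range(i + 1, len(rows))]

def is_equidistant(rows):
    """True when all pairs of codewords are at the same Hamming distance."""
    return len(set(pairwise_distances(rows))) == 1

# 4-class example built from a 4x4 Hadamard-style pattern (illustrative):
code = [
    (0, 0, 0, 0),
    (0, 1, 0, 1),
    (0, 0, 1, 1),
    (0, 1, 1, 0),
]
```

For this matrix every pair of rows differs in exactly two positions, so it is equidistant with d = 2; longer codewords would be needed before any bit errors could actually be corrected.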

An Experimental Analysis of the Dependence Among Codeword Bit Errors in Ecoc Learning Machines

- Computer Science
- Neurocomputing
- 2004

The results show that the dependence among computed codeword bits is significantly smaller for ECOC PND, pointing out that ensembles of independent parallel dichotomizers are better suited for implementing ECOC classification methods.

A re-examination of text categorization methods

- Computer Science
- SIGIR '99
- 1999

The results show that SVM, kNN and LLSF significantly outperform NNet and NB when the number of positive training instances per category is small, and that all the methods perform comparably when each category has over 300 instances.

An analysis of the relative hardness of Reuters-21578 subsets

- Computer Science
- J. Assoc. Inf. Sci. Technol.
- 2005

A systematic, comparative experimental study of the three subsets of Reuters-21578 that have been most popular among TC researchers is presented, to determine the relative hardness of these subsets, thus establishing an indirect means for comparing TC systems that have, or will be, tested on these different subsets.

Experiments with a New Boosting Algorithm

- Computer Science
- ICML
- 1996

This paper describes experiments carried out to assess how well AdaBoost, with and without pseudo-loss, performs on real learning problems, and compares boosting to Breiman's "bagging" method when used to aggregate various classifiers.