Accuracy and Diversity in Ensembles of Text Categorisers

@article{Adeva2005AccuracyAD,
  title={Accuracy and Diversity in Ensembles of Text Categorisers},
  author={Juan Jose Garc{\'i}a Adeva and Ulises Cervi{\~n}o Beresi and Rafael Alejandro Calvo},
  journal={CLEI Electron. J.},
  year={2005},
  volume={8}
}
Error-Correcting Output Codes (ECOC) ensembles of binary classifiers are used in Text Categorisation to improve accuracy while benefiting from learning algorithms that only support two classes. An accurate ensemble relies on the quality of its decomposition matrix, which in turn depends on the separation between the categories and the diversity of the dichotomies representing the binary classifiers. Important open questions include finding a good definition for diversity…
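As a rough illustration of the ECOC scheme the abstract describes (not the paper's own setup), each column of a code matrix defines one binary dichotomy and decoding picks the class whose codeword is nearest in Hamming distance. The code matrix, dataset, and decision-tree base learners below are arbitrary choices for the sketch:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Illustrative 0/1 code matrix for 3 classes: rows are class codewords,
# columns define the binary dichotomies; no column is constant.
code = np.array([[0, 0, 0, 1, 1, 1],
                 [0, 1, 1, 0, 0, 1],
                 [1, 0, 1, 0, 1, 0]])

# Train one binary classifier per dichotomy on the relabelled data:
# each sample gets the code bit of its class for that column.
learners = [DecisionTreeClassifier(random_state=0).fit(X, col[y])
            for col in code.T]

# Decode: choose the class whose codeword is nearest in Hamming
# distance to the vector of predicted bits.
bits = np.column_stack([clf.predict(X) for clf in learners])
pred = np.abs(bits[:, None, :] - code[None, :, :]).sum(axis=2).argmin(axis=1)
```

The diversity question the abstract raises shows up here in the choice of `code`: well-separated rows (class codewords) and varied columns (dichotomies) are what make the decoding error-tolerant.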
Diversified Random Forests Using Random Subspaces
This paper proposes a method to promote Random Forest diversity by using randomly selected subspaces, weighting each subspace according to its predictive power, and using that weight in majority voting.
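A hedged sketch of that idea (the dataset, subspace size, and the use of held-out accuracy as the "predictive power" weight are illustrative assumptions, not the paper's exact procedure):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
Xtr, Xva, ytr, yva = train_test_split(X, y, random_state=0)
rng = np.random.default_rng(0)
n_classes = len(np.unique(y))

# Each member tree sees only a random feature subspace; its accuracy
# on a held-out split becomes its vote weight.
members = []
for seed in range(25):
    feats = rng.choice(X.shape[1], size=2, replace=False)
    tree = DecisionTreeClassifier(random_state=seed).fit(Xtr[:, feats], ytr)
    weight = tree.score(Xva[:, feats], yva)  # proxy for predictive power
    members.append((feats, tree, weight))

# Weighted majority vote: each tree's prediction counts with its weight.
votes = np.zeros((len(Xva), n_classes))
for feats, tree, weight in members:
    for i, c in enumerate(tree.predict(Xva[:, feats])):
        votes[i, c] += weight
pred = votes.argmax(axis=1)
```

Training on different subspaces is what injects the diversity; the weighting simply keeps weak subspaces from drowning out strong ones in the vote.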
CLUB-DRF: A Clustering Approach to Extreme Pruning of Random Forests
Experimental results on 15 real datasets from the UCI repository prove the superiority of the proposed extension of RF, termed CLUB-DRF, which is much smaller in size than RF, yet performs at least as well as RF and mostly exhibits higher accuracy.
Amended Cross Entropy Cost: Framework For Explicit Diversity Encouragement
This paper presents the Amended Cross Entropy (ACE) cost, which affords the capability to train multiple classifiers while explicitly controlling the diversity between them, working for classification problems analogously to Negative Correlation Learning (NCL) for regression problems.
An Outlier Ranking Tree Selection Approach to Extreme Pruning of Random Forests
Experimental results on 10 real datasets prove the superiority of the proposed method, which combines an outlier ranking technique, Local Outlier Factor (LOF), with ensemble pruning, over the traditional RF.
On Extreme Pruning of Random Forest Ensembles for Real-time Predictive Applications
Experimental results on 15 real datasets from the UCI repository prove the superiority of the proposed extension of RF termed CLUB-DRF, which is much smaller in size and yet performs at least as well as RF, mostly exhibiting higher accuracy.
Random forests: from early developments to recent advancements
Ensemble classification is a data mining approach that utilizes a number of classifiers that work together in order to identify the class label for unlabeled instances. Random forest (RF) is an…
An Outlier Detection-based Tree Selection Approach to Extreme Pruning of Random Forests
Experimental results on 10 real datasets prove the superiority of the proposed extension of RF, termed LOFB-DRF, which is much smaller in size than RF, yet performs at least as well as RF and mostly exhibits higher accuracy.
An effective approach for improving the accuracy of a random forest classifier in the classification of Hyperion data
The performance of RF was observed to be significantly enhanced in terms of predictive ability and computational expense with the optimized set of features and number of random trees as base classifiers.
Clustering Based Ensemble Classification for Spam Filtering
Spam filtering has become a very important issue throughout the last years, as unsolicited bulk e-mail imposes large problems in terms of both the amount of time spent and the resources needed to…
Tackle three practical classification problems via Ensemble Learning
  • Xuzhou Li
  • Computer Science
  • 2012 IEEE International Conference on Granular Computing
  • 2012
The proposed Ensemble Learning method can improve classification performance significantly in News Categorization, Intrusion Detection, and Spam Detection.

References

Showing 1–10 of 22 references
A decomposition scheme based on error-correcting output codes for ensembles of text categorizers
  • J. Adeva, R. Calvo
  • Computer Science
  • Third International Conference on Information Technology and Applications (ICITA'05)
  • 2005
This work proposes a decomposition approach where both the categories and the classifiers are well separated in order to maximise the decision boundaries and minimise correlated predictions.
Measures of Diversity in Classifier Ensembles and Their Relationship with the Ensemble Accuracy
Although there are proven connections between diversity and accuracy in some special cases, the results raise some doubts about the usefulness of diversity measures in building classifier ensembles in real-life pattern recognition problems.
Effectiveness of error correcting output coding methods in ensemble and monolithic learning machines
It is shown that the architecture of ECOC learning machines influences the accuracy of the ECOC classifier, highlighting that ensembles of parallel and independent dichotomic Multi-Layer Perceptrons are well-suited to implement ECOC methods.
Diversity creation methods: a survey and categorisation
This paper reviews the varied attempts to provide a formal explanation of error diversity, including several heuristic and qualitative explanations in the literature, and introduces the idea of implicit and explicit diversity creation methods, along with three dimensions along which these may be applied.
Error-Correcting Output Coding for Text Classification
This paper provides experimental results on several real-world datasets, extracted from the Internet, which demonstrate that ECOC can offer significant improvements in accuracy over conventional classification algorithms.
Coding and decoding strategies for multi-class learning problems
The binary (0,1) code matrix conditions necessary for reduction of error in the ECOC framework are considered, and it is shown that equidistant codes can be generated by using properties related to the number of 1s in each row and between any pair of rows.
An Experimental Analysis of the Dependence Among Codeword Bit Errors in Ecoc Learning Machines
The results show that the dependence among computed codeword bits is significantly smaller for ECOC PND, pointing out that ensembles of independent parallel dichotomizers are better suited for implementing ECOC classification methods.
A re-examination of text categorization methods
The results show that SVM, kNN, and LLSF significantly outperform NNet and NB when the number of positive training instances per category is small, and that all the methods perform comparably when each category has over 300 instances.
An analysis of the relative hardness of Reuters-21578 subsets
A systematic, comparative experimental study of the three subsets of Reuters-21578 that have been most popular among TC researchers is presented, to determine the relative hardness of these subsets, thus establishing an indirect means for comparing TC systems that have been, or will be, tested on these different subsets.
Experiments with a New Boosting Algorithm
This paper describes experiments carried out to assess how well AdaBoost, with and without pseudo-loss, performs on real learning problems, and compares boosting to Breiman's "bagging" method when used to aggregate various classifiers.