Tips, guidelines and tools for managing multi-label datasets: the mldr.datasets R package and the Cometa data repository

@article{Charte2018TipsGA,
  title={Tips, guidelines and tools for managing multi-label datasets: the mldr.datasets R package and the Cometa data repository},
  author={Francisco Charte and Antonio Jes{\'u}s Rivera and David Charte and Mar{\'i}a Jos{\'e} del Jes{\'u}s and Francisco Herrera},
  journal={Neurocomputing},
  year={2018},
  volume={289},
  pages={68--85}
}
Abstract: New proposals in the field of multi-label learning algorithms have been growing in number steadily over the last few years. The experimentation associated with each of them always goes through the same phases: selection of datasets, partitioning, training, analysis of results and, finally, comparison with existing methods. This last step is often hampered since it involves using exactly the same datasets, partitioned in the same way and using the same validation strategy. In this paper…
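The comparison problem described in the abstract depends on every study being able to regenerate identical partitions. The following is a minimal sketch of that idea, not the mldr.datasets API: the function name `kfold_indices`, the fold count and the seed value are all assumptions chosen for illustration. It shows how a fixed seed makes a random k-fold split repeatable.

```python
import random


def kfold_indices(n_instances, k, seed):
    """Split range(n_instances) into k folds; the same seed gives the same folds."""
    indices = list(range(n_instances))
    # Use a private generator so the global random state is left untouched.
    random.Random(seed).shuffle(indices)
    # Deal indices round-robin so fold sizes differ by at most one.
    return [indices[i::k] for i in range(k)]


# Hypothetical dataset of 10 instances, split twice with the same seed.
folds_a = kfold_indices(10, k=5, seed=42)
folds_b = kfold_indices(10, k=5, seed=42)
print(folds_a == folds_b)  # True: both calls produce the same partition
```

This sketch shuffles purely at random and ignores the label distribution. Publishing the seed (or the fold indices themselves) alongside the results is what lets another study reuse the exact same split.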
An empirical analysis of binary transformation strategies and base algorithms for multi-label learning
TLDR
This study covers a family of multi-label strategies using a diversified range of base algorithms, exploring their relationship from different perspectives and recommending strategies and base algorithms in accordance with different performance criteria.
Active k-labelsets ensemble for multi-label classification
TLDR
An active k-labelsets ensemble (ACkEL) paradigm is proposed, borrowing the idea of active learning: a label-selection criterion evaluates the separability and balance level of the classes transformed from a label subset.
A Comprehensive and Didactic Review on Multilabel Learning Software Tools
TLDR
This paper provides multilabel researchers with a comprehensive review of the currently available multilabel learning software, written following a didactic approach, focusing on how to accomplish each task rather than simply offering a list of programs and websites.
Label Expansion for Multi-label Classification
TLDR
Preliminary experiments show the effectiveness of the proposed label expansion approach in improving the Binary Relevance strategy, reducing the number of labels that were never predicted in the test instances.
Ensemble of classifier chains and Credal C4.5 for solving multi-label classification
TLDR
An extensive experimental analysis with several multi-label datasets, different noise levels and a large number of evaluation metrics for MLC has shown that the ensemble of classifier chains (ECC) algorithm performs better with Credal C4.5 (CC4.5) as base classifier than with C4.5.
Using Credal C4.5 for Calibrated Label Ranking in Multi-Label Classification
TLDR
An exhaustive experimental analysis carried out in this research shows that Credal C4.5 performs better than C4.5 when both algorithms are employed in CLR, and the improvement becomes more notable as noise in the labels increases.
Aspect Based Multi-Labeling Using SVM Based Ensembler
TLDR
This work proposes a novel approach, the Evolutionary Ensembler (EEn), to effectively boost the accuracy and diversity of multi-label learners and shows that EEn is vastly superior to other popular techniques.
Non-parametric predictive inference for solving multi-label classification
TLDR
The experimental analysis shows that the proposed ML-DT based on the Nonparametric Predictive Inference Model on Multinomial data (NPI-M) obtains better results than the ML-DT that uses precise probabilities, especially when working on data with noise.
Multi-dimensional Bayesian network classifiers: A survey
TLDR
A comprehensive survey of this state-of-the-art classification model is offered by covering aspects related to their learning and inference process complexities, and the set of performance evaluation measures suitable for assessing multi-dimensional classifiers is reviewed.
Articulating heterogeneous data streams with the attribute-relation file format
TLDR
The CincamimisConversor library is introduced to support real-time translation and storage of heterogeneous data streams in the ARFF file format, which could foster educational applications spanning IoT, measurement processes with heterogeneous sources, data stream processing strategies, and Weka.

References

Working with Multilabel Datasets in R: The mldr Package
TLDR
The mldr package aims to provide the user with the functions needed to perform exploratory analysis of MLDs, determining their main traits both statistically and visually, and provides the proper tools to manipulate this kind of dataset, including the application of the most common transformation methods.
A systematic review of multi-label feature selection and a new method based on label construction
TLDR
This work proposes an alternative method, LCFS, that constructs new labels based on relations between the original labels, augmenting the multi-label dataset with second-order information before applying the standard approach.
An extensive experimental comparison of methods for multi-label learning
TLDR
The results of the analysis show that for multi-label classification the best performing methods overall are random forests of predictive clustering trees (RF-PCT) and hierarchy of multi-label classifiers (HOMER), followed by binary relevance (BR) and classifier chains (CC).
LI-MLC: A Label Inference Methodology for Addressing High Dimensionality in the Label Space for Multilabel Classification
TLDR
The purpose of this paper is to analyze dimensionality in the label space in MLDs, and to present a transformation methodology based on the use of association rules to discover label dependencies, resulting in a statistically significant improvement of performance in some cases.
ML-KNN: A lazy learning approach to multi-label learning
TLDR
Experiments on three different real-world multi-label learning problems, i.e. Yeast gene functional analysis, natural scene classification and automatic web page categorization, show that ML-KNN achieves superior performance to some well-established multi-label learning algorithms.
Addressing imbalance in multilabel classification: Measures and random resampling algorithms
TLDR
The purpose of this paper is to present specialized measures for assessing the imbalance level in multilabel datasets (MLDs) and to propose several algorithms designed to reduce the imbalance in MLDs in a classifier-independent way, by means of resampling techniques.
On the Stratification of Multi-label Data
TLDR
This paper considers two stratification methods for multi-label data and empirically compares them along with random sampling on a number of datasets and reveals some interesting conclusions with respect to the utility of each method for particular types of multi-label datasets.
Tackling Multilabel Imbalance through Label Decoupling and Data Resampling Hybridization
TLDR
The goal of this work is to propose REMEDIAL-HwR (REMEDIAL Hybridization with Resampling), a procedure that hybridizes REMEDIAL with some of the best resampling algorithms available in the literature, including random oversampling, heuristic undersampling and synthetic sample generation techniques.
Dealing with Difficult Minority Labels in Imbalanced Mutilabel Data Sets
TLDR
The problem of difficult labels is deeply analyzed, its influence on multilabel classifiers is studied, and a novel way to solve this problem is proposed, which aims to relax label concurrence.
On the Impact of Dataset Complexity and Sampling Strategy in Multilabel Classifiers Performance
TLDR
Multilabel classification (MLC) is an increasingly widespread data mining technique whose goal is to categorize patterns into several non-exclusive groups; it is applied in fields such as news categorization, image labeling and music classification.