Synthetic Oversampling of Multi-Label Data based on Local Label Distribution

Bin Liu and Grigorios Tsoumakas
Class imbalance is an inherent characteristic of multi-label data that affects the prediction accuracy of most multi-label learning methods. One efficient strategy for dealing with this problem is to apply resampling techniques before training the classifier. Existing multi-label sampling methods alleviate the (global) imbalance of multi-label datasets. However, performance degradation is mainly due to rare sub-concepts and overlapping of classes, which could be analysed by looking at the local…
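The local-distribution idea can be sketched in miniature. The following is an illustrative stand-in, not the authors' MLSOL algorithm, and all function names are hypothetical: each minority instance is weighted by the majority share of its k nearest neighbours (its "local imbalance"), and synthetic points are interpolated between minority instances, SMOTE-style.

```python
import random

def local_minority_weights(X, y, k=3):
    """Weight each minority (y == 1) instance by the fraction of
    majority points among its k nearest neighbours -- a crude proxy
    for the 'local' imbalance around that instance."""
    def dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) ** 0.5
    weights = {}
    for i, (xi, yi) in enumerate(zip(X, y)):
        if yi != 1:
            continue
        neigh = sorted((j for j in range(len(X)) if j != i),
                       key=lambda j: dist(xi, X[j]))[:k]
        weights[i] = sum(1 for j in neigh if y[j] == 0) / k
    return weights

def oversample_local(X, y, n_new, k=3, seed=0):
    """Pick seed instances in proportion to their local-imbalance weight
    and synthesise points by interpolating toward a random minority
    neighbour."""
    rng = random.Random(seed)
    w = local_minority_weights(X, y, k)
    minority = [i for i, yi in enumerate(y) if yi == 1]
    idx, probs = list(w), list(w.values())
    new_X, new_y = [], []
    for _ in range(n_new):
        if sum(probs) > 0:
            i = rng.choices(idx, weights=probs)[0]
        else:
            i = rng.choice(minority)
        j = rng.choice([m for m in minority if m != i] or [i])
        t = rng.random()
        new_X.append([a + t * (b - a) for a, b in zip(X[i], X[j])])
        new_y.append(1)
    return X + new_X, y + new_y
```

Instances in locally imbalanced neighbourhoods are sampled more often, which is the gist of steering oversampling by the local rather than the global label distribution.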

Multi-Label Sampling based on Local Label Imbalance

Integrating Unsupervised Clustering and Label-specific Oversampling to Tackle Imbalanced Multi-label Data

This paper proposes a minority-class oversampling scheme, UCLSO, which integrates Unsupervised Clustering and Label-Specific data Oversampling, and shows that the proposed method performs very well compared with competing algorithms.
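The cluster-then-oversample idea can be illustrated with a toy stand-in (not the UCLSO algorithm itself; all names here are hypothetical): cluster the feature space first, then draw minority replicas from the clusters in round-robin fashion so that each region keeps contributing examples for the label.

```python
import random

def two_means(X, iters=10, seed=0):
    """Tiny 2-means clustering, standing in for the unsupervised
    clustering step; returns one cluster id (0 or 1) per instance."""
    rng = random.Random(seed)
    c = [list(p) for p in rng.sample(X, 2)]
    assign = [0] * len(X)
    for _ in range(iters):
        assign = [0 if sum((a - b) ** 2 for a, b in zip(x, c[0]))
                  <= sum((a - b) ** 2 for a, b in zip(x, c[1])) else 1
                  for x in X]
        for g in (0, 1):
            members = [x for x, a in zip(X, assign) if a == g]
            if members:
                c[g] = [sum(col) / len(members) for col in zip(*members)]
    return assign

def cluster_oversample(X, y, n_new, seed=0):
    """Label-specific oversampling sketch: replicate minority (y == 1)
    instances cluster by cluster so no single region dominates."""
    rng = random.Random(seed)
    assign = two_means(X, seed=seed)
    pools = [[i for i, (a, yi) in enumerate(zip(assign, y))
              if a == g and yi == 1] for g in (0, 1)]
    pools = [p for p in pools if p]
    new_X, new_y = [], []
    for t in range(n_new):
        pool = pools[t % len(pools)]
        new_X.append(list(X[rng.choice(pool)]))
        new_y.append(1)
    return X + new_X, y + new_y
```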

Towards Class-Imbalance Aware Multi-Label Learning

This article proposes a simple yet effective class-imbalance aware learning strategy called cross-coupling aggregation (COCOA), which works by leveraging the exploitation of label correlations as well as the exploration of class imbalance simultaneously.

Joint Learning of Binary Classifiers and Pairwise Label Correlations for Multi-label Image Classification

  • Junbin Xiao, Sheng Tang
  • 2020 IEEE Conference on Multimedia Information Processing and Retrieval (MIPR), 2020
This paper jointly learns binary classifiers and pairwise label correlations (JBP) in an end-to-end manner and introduces an online hard sample mining strategy to focus on distinguishing confusing label pairs.

EvoSplit: An evolutionary approach to split a multi-label data set into disjoint subsets

A single-objective evolutionary approach is introduced that tries to obtain a split maximizing the similarity between those distributions independently, and a new multi-objective evolutionary algorithm is presented that maximizes the similarity considering both distributions simultaneously.

Learning Fairly With Class-Imbalanced Data for Interference Coordination

A training method is proposed that encourages fairness among classes by minimizing the maximal decision cost across classes; this minimax problem is converted into one of optimizing the weighting factors on the training cost of each class.
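The weighting-factor reformulation can be illustrated with a generic multiplicative-weights loop (a sketch, not the paper's exact algorithm; `class_costs_fn` is a hypothetical hook standing in for training and evaluating under a given weight vector): each step boosts the weight of the class with the largest current cost.

```python
def minimax_reweight(class_costs_fn, n_classes, steps=50, lr=0.1):
    """Reduce the maximal per-class cost by adjusting class weights:
    each step raises the weight of the currently worst-off class by a
    factor (1 + lr), then renormalises the weights to sum to 1."""
    w = [1.0 / n_classes] * n_classes
    for _ in range(steps):
        costs = class_costs_fn(w)
        worst = max(range(n_classes), key=lambda c: costs[c])
        w[worst] *= 1.0 + lr
        total = sum(w)
        w = [wi / total for wi in w]
    return w
```

With a toy cost model in which a class's cost shrinks as its weight grows, the loop settles near the weights that equalise the per-class costs, which is the fairness criterion the summary describes.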

Simultaneous and Spatiotemporal Detection of Different Levels of Activity in Multidimensional Data

A new multi-labeling technique is introduced that assigns different labels to different regions of interest in the data, thereby incorporating the spatial aspect; its ability to detect frequent motion patterns from predicted spatiotemporal activity levels is also discussed.

Local Imbalance based Ensemble for Predicting Interactions between Novel Drugs and Targets

The proposed ensemble approaches consist of several DTI prediction models, each learned on a training subset defined by a different sampling strategy; results indicate that the local imbalance-aware sampling strategy is the most effective.

Towards Label Imbalance in Multi-label Classification with Many Labels

This work is the first to tackle the imbalance problem in multi-label classification with many labels by proposing a novel Representation-based Multi-label Learning with Sampling (RMLS) approach.

Making Classifier Chains Resilient to Class Imbalance

Two extensions of the basic ensemble of classifier chains (ECC) approach are presented, in which a varying number of binary models per label is built and chains of different sizes are constructed, in order to better exploit majority examples within approximately the same computational budget.
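The layout of such an ensemble can be sketched as follows (an illustrative sketch only; the function names are hypothetical and no claim is made about the paper's exact construction): random label orders plus truncated chain lengths naturally give each label a varying number of binary models.

```python
import random

def build_chain_orders(labels, n_chains, max_len=None, seed=0):
    """Create n_chains label orders; each chain is randomly permuted
    and, if max_len is given, truncated to a random length, so
    different chains end up with different sizes."""
    rng = random.Random(seed)
    chains = []
    for _ in range(n_chains):
        order = list(labels)
        rng.shuffle(order)
        if max_len is not None:
            order = order[:rng.randint(1, max_len)]
        chains.append(order)
    return chains

def models_per_label(chains, labels):
    """Count how many binary models (chain memberships) each label gets."""
    return {lab: sum(lab in c for c in chains) for lab in labels}
```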

Dealing with Difficult Minority Labels in Imbalanced Multilabel Data Sets

Addressing Imbalance in Multi-Label Classification Using Structured Hellinger Forests

This work introduces Sparse Oblique Structured Hellinger Forests (SOSHF), an extension of structured forests (a type of random forest used for structured prediction), and proposes a new imbalance-aware formulation that alters how the splitting functions are learned in two ways.

On the Stratification of Multi-label Data

This paper considers two stratification methods for multi-label data and empirically compares them, along with random sampling, on a number of datasets, revealing some interesting conclusions with respect to the utility of each method for particular types of multi-label datasets.
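For flavour, here is a compact sketch in the spirit of iterative stratification (a simplified illustration, not the exact algorithm evaluated in the paper): repeatedly take the rarest remaining label and deal its examples to the fold whose quota for that label is least filled.

```python
from collections import Counter

def iterative_stratify(labelsets, n_folds=2):
    """Greedy multi-label stratification sketch: process labels
    rarest-first and assign each example to the fold that still
    'desires' that label most."""
    n = len(labelsets)
    desired = [n / n_folds] * n_folds            # overall fold-size quotas
    label_counts = Counter(l for s in labelsets for l in s)
    desired_per_label = {l: [c / n_folds] * n_folds
                         for l, c in label_counts.items()}
    folds = [[] for _ in range(n_folds)]
    remaining = set(range(n))
    while remaining:
        rem_counts = Counter(l for i in remaining for l in labelsets[i])
        if rem_counts:
            lab = min(rem_counts, key=rem_counts.get)   # rarest label left
            cand = [i for i in remaining if lab in labelsets[i]]
        else:
            lab, cand = None, list(remaining)           # label-free leftovers
        for i in cand:
            if lab is not None:
                f = max(range(n_folds), key=lambda k: desired_per_label[lab][k])
            else:
                f = max(range(n_folds), key=lambda k: desired[k])
            folds[f].append(i)
            remaining.discard(i)
            desired[f] -= 1
            for l in labelsets[i]:
                desired_per_label[l][f] -= 1
    return folds
```

Random sampling, by contrast, would simply shuffle the indices and cut, with no guarantee that rare labels appear in every fold.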

A First Approach to Deal with Imbalance in Multi-label Datasets

The process of learning from imbalanced datasets has been deeply studied for binary and multi-class classification, but proposals on how to measure and deal with imbalance in multi-label classification are scarce.