SetConv: A New Approach for Learning from Imbalanced Data

@article{Gao2020SetConvAN,
  title={SetConv: A New Approach for Learning from Imbalanced Data},
  author={Y. Gao and Yifan Li and Yu Lin and Charu C. Aggarwal and L. Khan},
  journal={ArXiv},
  year={2020},
  volume={abs/2104.06313}
}
For many real-world classification problems, e.g., sentiment classification, most existing machine learning methods are biased towards the majority class when the Imbalance Ratio (IR) is high. To address this problem, we propose a set convolution (SetConv) operation and an episodic training strategy to extract a single representative for each class, so that classifiers can later be trained on a balanced class distribution. We prove that our proposed algorithm is permutation-invariant despite…
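As a rough illustration of the idea in the abstract (not the authors' actual SetConv layer), the sketch below pools a variable-sized set of samples from one class into a single class representative using a shared per-sample MLP followed by mean pooling, in the spirit of Deep Sets; the mean over the set axis is what makes the mapping permutation-invariant. All names, dimensions, and the episodic usage shown are hypothetical stand-ins for illustration only.

# Minimal sketch of a permutation-invariant class-representative encoder.
# This is an assumption-based illustration, NOT the paper's SetConv operation.
import torch
import torch.nn as nn


class ClassRepresentativeEncoder(nn.Module):
    """Maps a set of samples (n_samples, d_in) from one class to a single
    vector (d_out,). Mean pooling over the set axis makes the output
    invariant to the order of the input samples."""

    def __init__(self, d_in: int, d_hidden: int, d_out: int):
        super().__init__()
        # phi is applied to every sample independently with shared weights,
        # so permuting input rows only permutes the intermediate rows.
        self.phi = nn.Sequential(
            nn.Linear(d_in, d_hidden),
            nn.ReLU(),
            nn.Linear(d_hidden, d_out),
        )

    def forward(self, samples: torch.Tensor) -> torch.Tensor:
        # samples: (n_samples, d_in); n_samples may differ per class/episode.
        h = self.phi(samples)   # (n_samples, d_out)
        return h.mean(dim=0)    # (d_out,) -- permutation-invariant pooling


if __name__ == "__main__":
    torch.manual_seed(0)
    encoder = ClassRepresentativeEncoder(d_in=16, d_hidden=32, d_out=8)
    majority = torch.randn(500, 16)  # imbalanced: many majority samples
    minority = torch.randn(10, 16)   # few minority samples
    # One representative per class -> a balanced (2, 8) training signal
    # for a downstream classifier, regardless of the original IR.
    reps = torch.stack([encoder(majority), encoder(minority)])
    # Sanity check: shuffling a set does not change its representative.
    perm = torch.randperm(majority.size(0))
    assert torch.allclose(encoder(majority), encoder(majority[perm]), atol=1e-5)

Mean pooling is one of several symmetric aggregations (sum, max, attention-weighted pooling) that preserve permutation invariance; the paper's learned convolution kernel is more expressive than this plain mean, but the invariance argument is the same.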


References

Showing 1-10 of 34 references.
Deep MLPs for Imbalanced Classification
Explores the use of large, fully connected, and potentially deep MLPs for imbalanced problems, showing that relatively straightforward MLPs with ReLU activations, softmax outputs, and categorical cross-entropy loss yield state-of-the-art results.
A Review on Ensembles for the Class Imbalance Problem: Bagging-, Boosting-, and Hybrid-Based Approaches
Proposes a taxonomy for ensemble-based methods addressing class imbalance, categorizing each proposal by the inner ensemble methodology it builds on, and develops a thorough empirical comparison of the most significant published approaches to show whether any of them makes a difference.
Oversampling for Imbalanced Learning Based on K-Means and SMOTE
Presents a simple and effective oversampling method based on k-means clustering and SMOTE that avoids generating noise and effectively overcomes imbalances both between and within classes (a usage sketch of this method family follows the reference list).
Learning from Imbalanced Data Using Ensemble Methods and Cluster-Based Undersampling
Experimental results demonstrate that the proposed ClusFirstClass algorithm yields promising results compared to state-of-the-art classification approaches when evaluated on a number of highly imbalanced datasets.
Improving imbalanced learning through a heuristic oversampling method based on k-means and SMOTE
Presents a simple and effective oversampling method based on k-means clustering and SMOTE (synthetic minority oversampling technique) that avoids generating noise and effectively overcomes imbalances both between and within classes.
Multiset Feature Learning for Highly Imbalanced Data Classification
Experiments indicate that the proposed approaches outperform state-of-the-art highly imbalanced learning methods and are more robust to high imbalance ratios.
Analysing the classification of imbalanced data-sets with multiple classes: Binarization techniques and ad-hoc approaches
An experimental study covering several well-known algorithms from the literature, such as decision trees, support vector machines, and instance-based learning, with the aim of drawing global conclusions across different classification paradigms.
Oversampling Method for Imbalanced Classification
Proposes a new oversampling method, SNOCC, that compensates for the defects of SMOTE and employs a novel algorithm, different from previous ones, to find the nearest neighbors of samples, so that the new samples created by SNOCC naturally reproduce the distribution of the original seed samples.
MWMOTE--Majority Weighted Minority Oversampling Technique for Imbalanced Data Set Learning
Presents a new method, Majority Weighted Minority Oversampling TEchnique (MWMOTE), for efficiently handling imbalanced learning problems; it is better than or comparable with several existing methods in terms of various assessment metrics.
Learning From Imbalanced Data
This chapter highlights the prevalence of imbalance in real-world data and the need to focus on the inherent characteristics of imbalanced data that can degrade classifier performance.
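Several of the references above belong to the SMOTE family of oversampling methods. As a hedged usage sketch (not code from any of the cited papers), the snippet below uses the third-party imbalanced-learn library, whose KMeansSMOTE implementation is based on this k-means + SMOTE line of work; the dataset and all parameter choices here are illustrative assumptions.

# Illustrative comparison of plain SMOTE vs. k-means-guided SMOTE using
# the imbalanced-learn library; dataset and parameters are hypothetical.
from collections import Counter

from sklearn.datasets import make_classification
from imblearn.over_sampling import SMOTE, KMeansSMOTE

# Build a synthetic ~9:1 imbalanced binary problem.
X, y = make_classification(
    n_samples=1000, n_features=8, weights=[0.9, 0.1], random_state=0
)
print("before:      ", Counter(y))

# Plain SMOTE interpolates between minority-class neighbors globally.
X_sm, y_sm = SMOTE(random_state=0).fit_resample(X, y)
print("SMOTE:       ", Counter(y_sm))

# KMeansSMOTE first clusters the feature space, then oversamples only
# inside minority-dense clusters, which is the "avoids noise / handles
# within-class imbalance" behavior the summaries above describe.
# Note: it can raise an error if no cluster passes the balance threshold;
# lowering cluster_balance_threshold makes the sketch more robust.
X_km, y_km = KMeansSMOTE(
    kmeans_estimator=8, cluster_balance_threshold=0.1, random_state=0
).fit_resample(X, y)
print("KMeansSMOTE: ", Counter(y_km))

Both resamplers return a rebalanced (X, y) on which any standard classifier can then be trained; this is the data-level alternative to the representation-level balancing that SetConv performs.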