Corpus ID: 237385428

Under-bagging Nearest Neighbors for Imbalanced Classification

@article{Hang2021UnderbaggingNN,
  title={Under-bagging Nearest Neighbors for Imbalanced Classification},
  author={Hanyuan Hang and Yuchao Cai and Hanfang Yang and Zhouchen Lin},
  journal={ArXiv},
  year={2021},
  volume={abs/2109.00531}
}
In this paper, we propose an ensemble learning algorithm called under-bagging k-nearest neighbors (under-bagging k-NN) for imbalanced classification problems. On the theoretical side, by developing a new learning theory analysis, we show that with properly chosen parameters, i.e., the number of nearest neighbors k, the expected sub-sample size s, and the bagging rounds B, optimal convergence rates for under-bagging k-NN can be achieved under mild assumptions w.r.t. the arithmetic mean…
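The abstract names three tuning parameters: the number of neighbors k, the expected sub-sample size s, and the bagging rounds B. Below is a minimal sketch of the under-bagging idea, not the authors' implementation; the function name, the equal per-class allocation s // n_classes, and the probability averaging are illustrative assumptions.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def under_bagging_knn_proba(X, y, X_test, k=5, s=200, B=10, seed=None):
    """Sketch of under-bagging k-NN: in each of B rounds, draw a
    class-balanced subsample of expected size s (down-sampling the
    majority classes), fit k-NN on it, and average the predicted
    probabilities over rounds. X and y are assumed NumPy arrays."""
    rng = np.random.default_rng(seed)
    classes = np.unique(y)
    per_class = max(1, s // len(classes))  # illustrative equal allocation
    proba = np.zeros((len(X_test), len(classes)))
    for _ in range(B):
        # Sample without replacement from each class separately.
        idx = np.concatenate([
            rng.choice(np.flatnonzero(y == c),
                       size=min(per_class, np.sum(y == c)),
                       replace=False)
            for c in classes
        ])
        clf = KNeighborsClassifier(n_neighbors=min(k, len(idx)))
        proba += clf.fit(X[idx], y[idx]).predict_proba(X_test)
    return proba / B  # columns follow np.unique(y) ordering
```

Predicting via the argmax of these averaged probabilities targets balanced performance criteria such as the AM measure discussed in the references below.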


References

Showing 1-10 of 126 references
Properties of bagged nearest neighbour classifiers
It is shown that bagging, a computationally intensive method, asymptotically improves the performance of nearest neighbour classifiers provided that the resample size is less than 69% of the actual sample size in the case of with-replacement bagging, or less than 50% for without-replacement bagging.
A novel density-based adaptive k nearest neighbor method for dealing with overlapping problem in imbalanced datasets
TLDR: Proposes DBANN, a density-based adaptive k nearest neighbor method that handles imbalanced and overlapping problems simultaneously and significantly outperforms state-of-the-art methods.
KNN-Based Overlapping Samples Filter Approach for Classification of Imbalanced Data
TLDR: Experimental results indicate that the proposed under-sampling method can effectively improve five representative algorithms in terms of three popular metrics: area under the curve (AUC), G-mean, and F-measure.
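The three metrics named here are standard for imbalanced evaluation; a short illustrative sketch with scikit-learn (the y_true / y_pred / y_score names are placeholders):

```python
import numpy as np
from sklearn.metrics import confusion_matrix, f1_score, roc_auc_score

def imbalance_metrics(y_true, y_pred, y_score):
    """AUC, G-mean, and F-measure for a binary task (illustrative).
    y_pred holds hard labels, y_score the positive-class scores."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    tpr = tp / (tp + fn)  # sensitivity (recall on the positive class)
    tnr = tn / (tn + fp)  # specificity (recall on the negative class)
    return {
        "AUC": roc_auc_score(y_true, y_score),
        "G-mean": np.sqrt(tpr * tnr),
        "F-measure": f1_score(y_true, y_pred),
    }
```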
A Review on Ensembles for the Class Imbalance Problem: Bagging-, Boosting-, and Hybrid-Based Approaches
TLDR: Proposes a taxonomy for ensemble-based methods that address class imbalance, categorizing each proposal by the inner ensemble methodology on which it is based, and develops a thorough empirical comparison of the most significant published approaches to show whether any of them makes a difference.
Compressed kNN: K-Nearest Neighbors with Data Compression
TLDR: Presents a structure-less variation of the kNN algorithm for categorical data that compresses the training set, considerably reducing the memory required and thereby making it possible to keep the whole dataset in memory.
Class Based Weighted K-Nearest Neighbor over Imbalance Dataset
TLDR: Proposes a modified version of the kNN algorithm that takes into account the class distribution in a wider region around the query instance, outperforming current state-of-the-art approaches.
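The class-based weighting idea can be illustrated with a simple sketch, not the paper's exact scheme: weight each neighbor's vote by the inverse frequency of its class, so minority-class neighbors count more.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def class_weighted_knn_predict(X, y, X_test, k=5):
    """Illustrative inverse-class-frequency weighted k-NN vote
    (a simplification, not the paper's exact scheme).
    X and y are assumed NumPy arrays."""
    classes, counts = np.unique(y, return_counts=True)
    weight = dict(zip(classes, 1.0 / counts))  # rarer class => larger vote
    nn = NearestNeighbors(n_neighbors=k).fit(X)
    _, neighbors = nn.kneighbors(X_test)
    preds = []
    for row in neighbors:
        votes = {c: 0.0 for c in classes}
        for j in row:
            votes[y[j]] += weight[y[j]]
        preds.append(max(votes, key=votes.get))
    return np.array(preds)
```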
An Effective Evidence Theory Based K-Nearest Neighbor (KNN) Classification
TLDR: Studies various K nearest neighbor (KNN) algorithms and presents a new KNN algorithm based on evidence theory that outperforms the other KNN algorithms, including basic evidence-based KNN.
On the Statistical Consistency of Algorithms for Binary Classification under Class Imbalance
TLDR: Studies consistency with respect to one performance measure, namely the arithmetic mean of the true positive and true negative rates (AM), and establishes that some practically popular approaches, such as applying an empirically determined threshold to a suitable class probability estimate or performing an empirically balanced form of risk minimization, are in fact consistent with respect to the AM.
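Since the AM measure is defined in this summary as the arithmetic mean of the true positive and true negative rates, it can be computed directly; a minimal sketch for binary labels in {0, 1}:

```python
import numpy as np

def am_measure(y_true, y_pred):
    """AM = (TPR + TNR) / 2, i.e. balanced accuracy for binary
    labels in {0, 1}; assumes both classes appear in y_true."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tpr = np.mean(y_pred[y_true == 1] == 1)  # true positive rate
    tnr = np.mean(y_pred[y_true == 0] == 0)  # true negative rate
    return (tpr + tnr) / 2
```

This is the criterion under which the convergence rates in the abstract above are stated.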