Corpus ID: 237385428

Under-bagging Nearest Neighbors for Imbalanced Classification

@article{Hang2021UnderbaggingNN,
  title={Under-bagging Nearest Neighbors for Imbalanced Classification},
  author={Hanyuan Hang and Yuchao Cai and Hanfang Yang and Zhouchen Lin},
  journal={ArXiv},
  year={2021},
  volume={abs/2109.00531}
}
In this paper, we propose an ensemble learning algorithm called under-bagging k-nearest neighbors (under-bagging k-NN) for imbalanced classification problems. On the theoretical side, by developing a new learning theory analysis, we show that with properly chosen parameters, i.e., the number of nearest neighbors k, the expected sub-sample size s, and the bagging rounds B, optimal convergence rates for under-bagging k-NN can be achieved under mild assumptions w.r.t. the arithmetic mean (AM) of…
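As a concrete illustration of the procedure sketched in the abstract, the following is a minimal under-bagging k-NN in Python with scikit-learn: in each of B rounds the majority class is under-sampled down to the minority-class size, a k-NN classifier is fit on the balanced sub-sample, and the per-round posterior estimates are averaged. The function name, the particular sub-sample size, and the aggregation rule are illustrative assumptions, not the authors' exact algorithm.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def under_bagging_knn_predict(X_train, y_train, X_test, k=5, B=10, seed=None):
    """Illustrative under-bagging k-NN for binary labels {0, 1}: each round
    under-samples the majority class to the minority-class size, fits k-NN on
    the balanced sub-sample, and the positive-class posteriors are averaged."""
    rng = np.random.default_rng(seed)
    X_train, y_train, X_test = map(np.asarray, (X_train, y_train, X_test))
    minority = int(np.bincount(y_train).argmin())      # assumes labels 0/1
    idx_min = np.flatnonzero(y_train == minority)
    idx_maj = np.flatnonzero(y_train != minority)

    pos_prob = np.zeros(len(X_test))
    for _ in range(B):
        # balanced sub-sample: all minority points plus an equal-size random
        # draw (without replacement) from the majority class
        sub_maj = rng.choice(idx_maj, size=len(idx_min), replace=False)
        idx = np.concatenate([idx_min, sub_maj])
        clf = KNeighborsClassifier(n_neighbors=k).fit(X_train[idx], y_train[idx])
        pos_prob += clf.predict_proba(X_test)[:, list(clf.classes_).index(1)]
    return (pos_prob / B >= 0.5).astype(int)           # vote over the B rounds
```

Because every round trains on a balanced sub-sample, errors on the two classes are penalized roughly equally, which is the behaviour an AM-type criterion rewards, while each round touches only a fraction of the majority class.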


References

SHOWING 1-10 OF 99 REFERENCES
KRNN: k Rare-class Nearest Neighbour classification
TLDR: An algorithm, k Rare-class Nearest Neighbour (KRNN), is proposed that directly adjusts the induction bias of KNN to form dynamic query neighbourhoods and further adjusts the positive posterior probability estimation to bias classification towards the rare class.
Properties of bagged nearest neighbour classifiers
It is shown that bagging, a computationally intensive method, asymptotically improves the performance of nearest neighbour classifiers provided that the resample size is less than 69% of the actual sample size.
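For concreteness, a bagged 1-NN classifier of the kind studied in this line of work can be sketched as follows; drawing sub-samples without replacement, the default sub-sample ratio, and the helper name are all assumptions made purely for illustration, not details taken from the paper.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def bagged_1nn_predict(X, y, X_test, resample_ratio=0.5, B=100, seed=None):
    """Bagged 1-NN for binary labels {0, 1}: average 1-NN predictions over B
    random sub-samples of size floor(resample_ratio * n)."""
    rng = np.random.default_rng(seed)
    X, y, X_test = map(np.asarray, (X, y, X_test))
    m = max(1, int(resample_ratio * len(X)))
    votes = np.zeros(len(X_test), dtype=float)
    for _ in range(B):
        idx = rng.choice(len(X), size=m, replace=False)
        votes += KNeighborsClassifier(n_neighbors=1).fit(X[idx], y[idx]).predict(X_test)
    return (votes / B >= 0.5).astype(int)   # majority vote over sub-samples
```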
A novel density-based adaptive k nearest neighbor method for dealing with overlapping problem in imbalanced datasets
TLDR: A density-based adaptive k nearest neighbor method, DBANN, is proposed that can handle imbalanced and overlapping problems simultaneously and significantly outperforms state-of-the-art methods.
KNN-Based Overlapping Samples Filter Approach for Classification of Imbalanced Data
TLDR: Experimental results indicate that the proposed under-sampling method can effectively improve five representative algorithms in terms of three popular metrics: area under the curve (AUC), G-mean, and F-measure.
Compressed kNN: K-Nearest Neighbors with Data Compression
TLDR: This paper presents a variation of the kNN algorithm, of the structure-less NN type, for categorical data; by compressing the data it allows the whole dataset to be kept in memory with a considerable reduction in the amount of memory required.
Class Based Weighted K-Nearest Neighbor over Imbalance Dataset
TLDR: A modified version of the kNN algorithm is proposed that takes into account the class distribution in a wider region around the query instance and outperforms current state-of-the-art approaches.
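The paper's exact weighting scheme is not reproduced here; the sketch below only illustrates the general idea of letting the class distribution in a wider region around the query reweight the k nearest neighbours' votes. The function name, the size of the wider region (k_wide), and the inverse-frequency weights are assumptions.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def class_weighted_knn_predict(X, y, X_test, k=5, k_wide=50):
    """Illustrative class-weighted kNN vote: estimate the local class
    distribution from the k_wide nearest points around each query, then
    weight each of the k nearest neighbours' votes by the inverse of its
    class's local frequency."""
    X, y, X_test = map(np.asarray, (X, y, X_test))
    classes = np.unique(y)
    _, idx = NearestNeighbors(n_neighbors=max(k, k_wide)).fit(X).kneighbors(X_test)
    preds = []
    for neigh in idx:                          # neighbours sorted by distance
        wide_labels = y[neigh[:k_wide]]
        freq = {c: np.sum(wide_labels == c) + 1 for c in classes}   # +1 avoids /0
        scores = {c: 0.0 for c in classes}
        for i in neigh[:k]:
            scores[y[i]] += 1.0 / freq[y[i]]   # rarer local class => larger vote
        preds.append(max(scores, key=scores.get))
    return np.array(preds)
```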
An Effective Evidence Theory Based K-Nearest Neighbor (KNN) Classification
TLDR: This paper studies various k nearest neighbor (KNN) algorithms and presents a new KNN algorithm based on evidence theory that outperforms other KNN algorithms, including basic evidence-based KNN.
On the Statistical Consistency of Algorithms for Binary Classification under Class Imbalance
TLDR: This paper studies consistency with respect to one performance measure, namely the arithmetic mean of the true positive and true negative rates (AM), and establishes that some practically popular approaches, such as applying an empirically determined threshold to a suitable class probability estimate or performing an empirical balanced form of risk minimization, are in fact consistent with respect to the AM.
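The AM referred to here is simply the average of the true positive and true negative rates; the snippet below computes it and shows one plug-in rule of the kind the summary mentions, thresholding an estimated positive-class probability at the empirical positive-class prior. The function names are illustrative, and this particular threshold is an assumption here rather than a restatement of the paper's exact procedure.

```python
import numpy as np

def am_score(y_true, y_pred):
    """Arithmetic mean (AM) of the true positive and true negative rates."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tpr = np.mean(y_pred[y_true == 1] == 1)
    tnr = np.mean(y_pred[y_true == 0] == 0)
    return 0.5 * (tpr + tnr)

def threshold_at_prior(prob_pos, y_train):
    """Plug-in rule: predict positive when the estimated positive-class
    probability exceeds the empirical positive-class prior."""
    pi_hat = np.mean(np.asarray(y_train) == 1)
    return (np.asarray(prob_pos) >= pi_hat).astype(int)
```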
Improving k Nearest Neighbor with Exemplar Generalization for Imbalanced Classification
TLDR: This work proposes to identify exemplar minority-class training instances and generalize them to Gaussian balls as concepts for the minority class to improve the performance of kNN; the approach also outperforms popular re-sampling and cost-sensitive learning strategies for imbalanced classification.
On the Rate of Convergence of the Bagged Nearest Neighbor Estimate
TLDR: Bagging is a simple way to combine estimates in order to improve their performance, and it is shown that the bagged nearest neighbor estimate may achieve the optimal rate of convergence, whether resampling is done with or without replacement.