Hub Co-occurrence Modeling for Robust High-Dimensional kNN Classification

Abstract

Hubs that emerge in the k-nearest neighbor (kNN) topologies of intrinsically high-dimensional data have recently been shown to be detrimental to many standard machine learning tasks, including classification. Robust hubness-aware learning methods are required to overcome the impact of the resulting highly uneven distribution of influence. In this paper, we adapt the Hidden Naive Bayes (HNB) model to the problem of modeling neighbor occurrences and co-occurrences in high-dimensional data. Hidden nodes are used to aggregate all pairwise occurrence dependencies. The result is a novel kNN classification method tailored specifically to intrinsically high-dimensional data, the Augmented Naive Hubness Bayesian k nearest Neighbor (ANHBNN). Neighbor co-occurrence information forms an important part of the model, and our analysis reveals some surprising results regarding the influence of hubness on the shape of the co-occurrence distribution in high-dimensional data. The proposed approach was tested on object recognition from images in class-imbalanced data, and the results show that it offers clear benefits over the other hubness-aware kNN baselines.
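
To make the two quantities named in the abstract concrete, the following is a minimal sketch of how neighbor occurrence counts (how often a point appears in other points' kNN lists) and pairwise co-occurrence counts (how often two points appear together in the same kNN list) could be computed. It is not the ANHBNN model itself and does not reproduce the paper's experiments; the function names, the Euclidean metric, and the synthetic Gaussian data are all assumptions made for illustration.

```python
import math
import random
from collections import Counter
from itertools import combinations


def knn_lists(points, k):
    """For each point, return the indices of its k nearest neighbors
    (Euclidean distance, the point itself excluded)."""
    lists = []
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        others.sort(key=lambda j: math.dist(p, points[j]))
        lists.append(others[:k])
    return lists


def occurrence_counts(lists):
    """Count how many kNN lists each point appears in.
    Points with unusually high counts are the hubs."""
    counts = Counter()
    for neighbors in lists:
        counts.update(neighbors)
    return counts


def cooccurrence_counts(lists):
    """Count how many kNN lists each unordered pair of points shares.
    Keys are (a, b) index tuples with a < b."""
    pairs = Counter()
    for neighbors in lists:
        for a, b in combinations(sorted(neighbors), 2):
            pairs[(a, b)] += 1
    return pairs


# Hypothetical usage on synthetic data (assumed sizes, not from the paper).
random.seed(0)
n, dim, k = 200, 100, 5
points = [[random.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n)]

lists = knn_lists(points, k)
occ = occurrence_counts(lists)
co = cooccurrence_counts(lists)

print("largest occurrence count:", max(occ.values()))
print("points that never occur as a neighbor:", n - len(occ))
print("distinct co-occurring pairs:", len(co))
```

The brute-force neighbor search is quadratic in the number of points and is meant only for small illustrative inputs. A model along the lines described in the abstract would presumably consume class-conditional versions of such counts, but that step is not shown here.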

DOI: 10.1007/978-3-642-40991-2_41

Cite this paper

@inproceedings{Tomasev2013HubCM,
  title={Hub Co-occurrence Modeling for Robust High-Dimensional kNN Classification},
  author={Nenad Tomasev and Dunja Mladenic},
  booktitle={ECML/PKDD},
  year={2013}
}