# Nearest Neighbor Median Shift Clustering for Binary Data

@inproceedings{Beck2021NearestNM, title={Nearest Neighbor Median Shift Clustering for Binary Data}, author={Ga{\"e}l Beck and Tarn Duong and Mustapha Lebbah and Hanene Azzag}, booktitle={ICANN}, year={2021} }

In this paper we describe the theory and practice behind a new modal clustering method for binary data. Our approach (BinNNMS) is based on the nearest neighbor median shift. The median shift extends the well-known mean shift, which was designed for continuous data, to binary data. Through theoretical and experimental analyses, we demonstrate that BinNNMS can accurately discover the locations of clusters in binary data.
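The abstract does not spell out the update rule, so the following is only a minimal sketch of what a nearest neighbor median shift on binary vectors might look like, not the authors' BinNNMS implementation. It assumes Hamming distance, neighbors drawn from the original data set, a coordinate-wise majority vote as the binary median (ties resolved to 1), and a fixed neighborhood size `k`. The names `hamming`, `majority_vote`, `median_shift`, and `cluster_labels` are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

BinaryVector = Tuple[int, ...]


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of coordinates in which two binary vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def majority_vote(points: Sequence[Sequence[int]]) -> BinaryVector:
    """Coordinate-wise median of binary vectors.

    For 0/1 values the median is the majority value; ties go to 1
    (an arbitrary choice made for this sketch).
    """
    n = len(points)
    dim = len(points[0])
    return tuple(1 if 2 * sum(p[j] for p in points) >= n else 0
                 for j in range(dim))


def median_shift(data: Sequence[Sequence[int]], k: int,
                 max_iter: int = 50) -> List[BinaryVector]:
    """Move every point to the median of its k nearest neighbors, repeatedly.

    Assumption: neighbors are always searched in the original data set,
    and iteration stops when no point moves or after max_iter passes.
    """
    original = [tuple(p) for p in data]
    current = list(original)
    for _ in range(max_iter):
        shifted = []
        for x in current:
            # k nearest original points under Hamming distance
            neighbors = sorted(original, key=lambda p: hamming(x, p))[:k]
            shifted.append(majority_vote(neighbors))
        if shifted == current:  # every point has reached a fixed point
            break
        current = shifted
    return current


def cluster_labels(modes: Sequence[BinaryVector]) -> List[int]:
    """Assign one integer label per distinct converged point (mode)."""
    ids: Dict[BinaryVector, int] = {}
    return [ids.setdefault(m, len(ids)) for m in modes]
```

Calling `median_shift(data, k)` on a list of 0/1 tuples returns one converged vector per input point, and `cluster_labels` groups points that share the same converged vector. The brute-force neighbor search costs a full sort per point per pass, so this form is only suitable for small data sets.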

## References


A Unified View on Clustering Binary Data

- Computer Science
- Machine Learning
- 2005

A unified view of binary data clustering is presented by examining the connections among various clustering criteria, and experimental studies are conducted to empirically verify the relationships.

Topological map for binary data

- Computer Science
- ESANN
- 2000

The efficiency of the proposed method, which takes into account possible asymmetries of binary data, is shown when applied to high-dimensional binary data.

Nearest neighbour estimators of density derivatives, with application to mean shift clustering

- Computer Science
- Pattern Recognit. Lett.
- 2016

Clustering Large Data Sets with Mixed Numeric and Categorical Values

- Computer Science
- 1997

A k-prototypes algorithm which is based on the k-means paradigm but removes the numeric data limitation whilst preserving its efficiency, and uses decision tree induction algorithms to create rules for clusters.

Optimization of k nearest neighbor density estimates

- Computer Science
- IEEE Trans. Inf. Theory
- 1973

Nonparametric density estimation using the k-nearest-neighbor approach is discussed and a functional form for the optimum k in terms of the sample size, the dimensionality of the observation space, and the underlying probability distribution is obtained.

Competitive Learning for Binary Valued Data

- Computer Science
- 1998

We propose a new approach for using online competitive learning on binary data. The usual Euclidean distance is replaced by binary distance measures, which take possible asymmetries of binary data…

The estimation of the gradient of a density function, with applications in pattern recognition

- Computer Science
- IEEE Trans. Inf. Theory
- 1975

Applications of gradient estimation to pattern recognition are presented using clustering and intrinsic dimensionality problems, with the ultimate goal of providing further understanding of these problems in terms of density gradients.

Some methods for classification and analysis of multivariate observations

- Mathematics
- 1967

The main purpose of this paper is to describe a process for partitioning an N-dimensional population into k sets on the basis of a sample. The process, which is called 'k-means,' appears to give…

Multivariate binary discrimination by the kernel method

- Computer Science
- 1976

An extension of the kernel method of density estimation from continuous to multivariate binary spaces is described. Its simple nonparametric nature together with its consistency properties…

On comparing partitions

- Mathematics
- 2015

Rand (1971) proposed the Rand Index to measure the stability of two partitions of one set of units. Hubert and Arabie (1985) corrected the Rand Index for chance (Adjusted Rand Index). In this paper,…