Nearest Neighbor Median Shift Clustering for Binary Data

@inproceedings{Beck2021NearestNM,
  title={Nearest Neighbor Median Shift Clustering for Binary Data},
  author={Ga{\"e}l Beck and Tarn Duong and Mustapha Lebbah and Hanene Azzag},
  booktitle={ICANN},
  year={2021}
}
We describe in this paper the theory and practice behind a new modal clustering method for binary data. Our approach (BinNNMS) is based on the nearest neighbor median shift. The median shift extends the well-known mean shift, which was designed for continuous data, to binary data. Through theoretical and experimental analyses, we demonstrate that BinNNMS accurately discovers the locations of clusters in binary data.
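The idea summarized in the abstract can be illustrated with a minimal sketch. It assumes Hamming distance between binary vectors. It moves each point to the coordinate-wise median (a per-coordinate majority vote) of its k nearest data points and repeats until the estimate stops changing. Points that end at the same mode are then grouped into one cluster. The function names, the tie-breaking rule toward 0, and the `max_iter` cap are hypothetical choices made for illustration. This is not the authors' BinNNMS implementation.

```python
from typing import Dict, List, Tuple

Point = Tuple[int, ...]  # a binary vector, e.g. (1, 0, 1)


def hamming(a: Point, b: Point) -> int:
    """Number of coordinates in which two binary vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def binary_median(points: List[Point]) -> Point:
    """Coordinate-wise median of binary vectors, i.e. a per-coordinate majority vote.

    Assumption: a tie (only possible for an even number of points) is resolved to 0.
    """
    n = len(points)
    return tuple(int(2 * sum(column) > n) for column in zip(*points))


def k_nearest(data: List[Point], query: Point, k: int) -> List[Point]:
    """The k data points closest to `query` in Hamming distance (stable sort keeps data order on ties)."""
    return sorted(data, key=lambda p: hamming(p, query))[:k]


def nn_median_shift(data: List[Point], k: int, max_iter: int = 100) -> Tuple[List[int], List[Point]]:
    """Hypothetical sketch: shift every point to the median of its k nearest neighbours until it stops moving."""
    modes: List[Point] = []
    for x in data:
        y = x
        for _ in range(max_iter):  # cap guards against cycling between binary vectors
            y_next = binary_median(k_nearest(data, y, k))
            if y_next == y:  # fixed point reached: y is treated as a mode
                break
            y = y_next
        modes.append(y)
    # Points that reached the same mode form one cluster; labels follow first appearance.
    label_of: Dict[Point, int] = {}
    labels = [label_of.setdefault(m, len(label_of)) for m in modes]
    return labels, list(label_of)
```

For instance, with k = 3, the six vectors (1,1,1,0,0,0), (1,1,0,0,0,0), (1,1,1,1,0,0), (0,0,0,1,1,1), (0,0,0,0,1,1) and (0,0,1,1,1,1) converge to the two modes (1,1,1,0,0,0) and (0,0,0,1,1,1). This gives the labels [0, 0, 0, 1, 1, 1]. The median is used instead of the mean because the coordinate-wise median of 0/1 vectors is itself a 0/1 vector, so every iterate stays in the binary space.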
