Semblance: A Rank-Based Kernel on Probability Spaces for Niche Detection
@article{Agarwal2018SemblanceAR,
  title={Semblance: A Rank-Based Kernel on Probability Spaces for Niche Detection},
  author={Divyansh Agarwal and Nancy Zhang},
  journal={ArXiv},
  year={2018},
  volume={abs/1808.02061}
}
In data science, determining proximity between observations is critical to many downstream analyses such as clustering, information retrieval and classification. However, when the underlying structure of the data probability space is unclear, the function used to compute similarity between data points is often arbitrarily chosen. Here, we present a novel concept of proximity, Semblance, that uses the empirical distribution across all observations to inform the similarity between each pair. The…
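
To make the idea of letting the empirical distribution inform pairwise similarity concrete, here is a minimal single-feature sketch. It is an illustration under stated assumptions, not the paper's definition: the abstract shown here is truncated before the kernel is specified, so the function name `rank_based_similarity` and the specific rule (the fraction of all observations that fall strictly outside the interval spanned by the pair) are hypothetical choices made for this example.

```python
from bisect import bisect_left, bisect_right


def rank_based_similarity(values, i, j):
    """Hypothetical rank-based similarity for one feature.

    Returns the fraction of all observations that lie strictly outside
    the closed interval spanned by values[i] and values[j]. Two points
    count as similar when few observations fall between them, whatever
    the numeric gap. This is an assumed rule for illustration only.
    """
    ordered = sorted(values)
    lo = min(values[i], values[j])
    hi = max(values[i], values[j])
    below = bisect_left(ordered, lo)                  # observations < lo
    above = len(ordered) - bisect_right(ordered, hi)  # observations > hi
    return (below + above) / len(ordered)


# Nine points in a dense region, two in a sparse one.
values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 50, 59]

dense_pair = rank_based_similarity(values, 0, 8)    # the points 1 and 9
sparse_pair = rank_based_similarity(values, 9, 10)  # the points 50 and 59
print(dense_pair, sparse_pair)
```

In this toy data the pair (50, 59) has the larger numeric gap yet receives the higher similarity, because no other observation lies between its members, whereas the pair (1, 9) spans most of the sample. A distance chosen without reference to the data, such as the absolute difference, would rank the two pairs the other way round.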