Corpus ID: 51933878

Semblance: A Rank-Based Kernel on Probability Spaces for Niche Detection

  • Divyansh Agarwal, Nancy Zhang
  • Published 2018
  • Computer Science, Mathematics
  • ArXiv
  • In data science, determining proximity between observations is critical to many downstream analyses such as clustering, information retrieval and classification. However, when the underlying structure of the data probability space is unclear, the function used to compute similarity between data points is often arbitrarily chosen. Here, we present a novel concept of proximity, Semblance, that uses the empirical distribution across all observations to inform the similarity between each pair. The…
