Information-Theory Interpretation of the Skip-Gram Negative-Sampling Objective Function


In this paper, we define a measure of dependency between two random variables, based on the Jensen-Shannon (JS) divergence between their joint distribution and the product of their marginal distributions. Then, we show that word2vec’s skip-gram with negative sampling embedding algorithm finds the optimal low-dimensional approximation of this JS dependency…
DOI: 10.18653/v1/P17-2026
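
As an illustration of the dependency measure described in the abstract, the following minimal sketch computes a JS divergence between a small joint distribution and the product of its marginals. The function names (`kl`, `js_dependency`), the mixing weight `alpha` (set to 0.5 here, the symmetric case), and the toy 2x2 distributions are assumptions made for this example and are not taken from the paper; the exact weighting the paper uses may differ.

```python
import math


def kl(p, q):
    """KL divergence sum(p * log(p / q)) in nats; terms with p == 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_dependency(joint, alpha=0.5):
    """JS divergence between a joint p(x, y) and the product p(x) * p(y).

    joint: 2-D list of probabilities summing to 1 (rows index x, columns index y).
    alpha: hypothetical mixing weight in (0, 1); 0.5 gives the symmetric JS divergence.
    """
    px = [sum(row) for row in joint]        # marginal over x
    py = [sum(col) for col in zip(*joint)]  # marginal over y
    p = [v for row in joint for v in row]   # joint, flattened row-major
    q = [a * b for a in px for b in py]     # product of marginals, same order
    m = [alpha * pi + (1 - alpha) * qi for pi, qi in zip(p, q)]  # mixture
    return alpha * kl(p, m) + (1 - alpha) * kl(q, m)


# Toy examples (made up for illustration).
independent = [[0.25, 0.25], [0.25, 0.25]]  # joint equals product of marginals
dependent = [[0.5, 0.0], [0.0, 0.5]]        # x fully determines y

print(js_dependency(independent))  # zero: no dependency
print(js_dependency(dependent))    # positive: strong dependency
```

The measure is zero exactly when the joint factorizes into its marginals and grows as the two variables become more dependent, which is the property that makes it usable as a dependency score.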

1 Figure or Table


