Stolen Probability: A Structural Weakness of Neural Language Models

Authors: David Demeter, Gregory J. Kimmel, Doug Downey
Neural Network Language Models (NNLMs) generate probability distributions by applying a softmax function to a distance metric formed by taking the dot product of a prediction vector with all word vectors in a high-dimensional embedding space. The dot-product distance metric forms part of the inductive bias of NNLMs. Although NNLMs optimize well with this inductive bias, we show that this results in a sub-optimal ordering of the embedding space that structurally impoverishes some words at the… 
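
The scoring step described in the abstract (dot product of a prediction vector with every word vector, followed by a softmax) can be sketched in a few lines. This is a minimal illustration, not the authors' code: the toy 2-D vocabulary, the names `next_word_distribution`, `outer`, and `interior`, and the random prediction vectors are all assumptions made for the example. The last word vector is deliberately placed at the average of the others. Because a dot product is linear, that word's score is always the average of the other scores, so it can never be strictly the most probable word, which is one simple way a word can be disadvantaged by geometry alone.

```python
import math
import random


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def softmax(logits):
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def next_word_distribution(prediction, embeddings):
    """Score every word by its dot product with the prediction vector,
    then normalize the scores with a softmax."""
    logits = [dot(prediction, w) for w in embeddings]
    return softmax(logits)


# Hypothetical toy vocabulary (not from the paper): three "outer" word
# vectors plus one vector at their average, i.e. inside their convex hull.
outer = [[1.0, 0.0], [-0.5, 0.8], [-0.5, -0.8]]
interior = [sum(col) / len(outer) for col in zip(*outer)]
embeddings = outer + [interior]

# Try many random prediction vectors and count how often the interior
# word ends up strictly more probable than every outer word.
rng = random.Random(0)
interior_wins = 0
for _ in range(1000):
    h = [rng.uniform(-5, 5), rng.uniform(-5, 5)]
    probs = next_word_distribution(h, embeddings)
    if probs[-1] > max(probs[:-1]):
        interior_wins += 1

print(interior_wins)
```

In this toy setup the counter stays at zero no matter how the prediction vector is chosen, since an average of scores cannot exceed their maximum. Real models use far higher-dimensional embeddings and typically add bias terms, so this should be read only as an intuition for how the dot-product scoring interacts with the layout of the embedding space.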

