Efficient softmax approximation for GPUs

Edouard Grave, Armand Joulin, Moustapha Cissé, David Grangier, Hervé Jégou
We propose an approximate strategy to efficiently train neural network based language models over very large vocabularies. Our approach, called adaptive softmax, circumvents the linear dependency on the vocabulary size by exploiting the unbalanced word distribution to form clusters that explicitly minimize the expectation of computation time. Our approach…
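The core idea in the abstract — grouping words into clusters so that frequent words are cheap to score and rare words are pushed into less-visited clusters — can be sketched with a simplified two-cluster cost model. This is an illustration of the principle, not the paper's exact algorithm; the function names, the single-tail simplification, and the cost model are assumptions for clarity.

```python
# Illustrative sketch (not the paper's exact formulation): adaptive softmax
# partitions a frequency-sorted vocabulary into a small "head" and a "tail"
# cluster, so most tokens only pay for a softmax over the head.

def expected_cost(probs, cutoff):
    """Expected number of logits computed per token.

    probs:  word probabilities, sorted in decreasing frequency order.
    cutoff: the first `cutoff` words form the head; the rest form one tail
            cluster, reached through a single extra slot in the head.
    """
    head_size = cutoff + 1                 # head words + one tail pointer
    p_tail = sum(probs[cutoff:])           # probability of landing in the tail
    tail_size = len(probs) - cutoff
    return head_size + p_tail * tail_size  # tail softmax is paid only sometimes

def best_cutoff(probs):
    """Brute-force search for the cutoff minimizing expected cost."""
    return min(range(1, len(probs)), key=lambda c: expected_cost(probs, c))

# Zipfian toy vocabulary: a few words dominate, as in natural language.
V = 1000
weights = [1.0 / rank for rank in range(1, V + 1)]
total = sum(weights)
probs = [w / total for w in weights]

c = best_cutoff(probs)
# The optimal split is far cheaper in expectation than a full softmax over V words.
print(c, expected_cost(probs, c), V)
```

For a ready-made implementation of the paper's method, PyTorch ships `torch.nn.AdaptiveLogSoftmaxWithLoss`, which takes user-specified cluster cutoffs and supports multiple tail clusters with reduced-dimension projections.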

5 Figures & Tables


