Efficient softmax approximation for GPUs

@inproceedings{Grave2017EfficientSA,
  title={Efficient softmax approximation for GPUs},
  author={Edouard Grave and Armand Joulin and Moustapha Ciss{\'e} and David Grangier and Herv{\'e} J{\'e}gou},
  booktitle={ICML},
  year={2017}
}
We propose an approximate strategy to efficiently train neural network based language models over very large vocabularies. Our approach, called adaptive softmax, circumvents the linear dependency on the vocabulary size by exploiting the unbalanced word distribution to form clusters that explicitly minimize the expectation of computation time. Our approach…
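
PyTorch ships an implementation of this paper's method as torch.nn.AdaptiveLogSoftmaxWithLoss. The sketch below shows how it would slot into a language-model training loop; the vocabulary size, hidden size, and cluster cutoffs are illustrative choices, not the paper's settings.

```python
# A minimal sketch of an adaptive softmax output layer, using PyTorch's
# nn.AdaptiveLogSoftmaxWithLoss (the stock implementation of this paper's
# method). Sizes and cutoffs below are illustrative, not the paper's.
import torch
import torch.nn as nn

vocab_size = 50_000   # illustrative vocabulary size
hidden_size = 512     # illustrative hidden dimension

# Cutoffs split the vocabulary by frequency rank: the small head cluster
# holds the 2,000 most frequent words (plus one slot per tail cluster);
# the tail clusters hold rarer words, projected into smaller spaces.
adaptive = nn.AdaptiveLogSoftmaxWithLoss(
    in_features=hidden_size,
    n_classes=vocab_size,
    cutoffs=[2_000, 10_000],  # illustrative cluster boundaries
    div_value=4.0,            # each tail projection is 4x smaller than the last
)

# Hidden states from some language model, and their target word indices.
hidden = torch.randn(32, hidden_size)            # batch of 32 positions
targets = torch.randint(0, vocab_size, (32,))

out = adaptive(hidden, targets)
print(out.output.shape)  # (32,): log-probability assigned to each target word
print(out.loss)          # mean negative log-likelihood, ready for .backward()

# For evaluation, full log-probabilities over the whole vocabulary:
log_probs = adaptive.log_prob(hidden)            # shape (32, vocab_size)
```

Because most tokens fall in the small head cluster, they pay only the cost of the head softmax; rare words incur one additional, smaller cluster softmax, so the expected computation per token tracks the unbalanced word distribution rather than the full vocabulary size.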

5 Figures & Tables

Topics

Statistics

0204060201620172018
Citations per Year

Citation Velocity: 23

Averaging 23 citations per year over the last 3 years.

Learn more about how we calculate this metric in our FAQ.