Efficient softmax approximation for GPUs

@article{Grave2017EfficientSA,
  title={Efficient softmax approximation for GPUs},
  author={Edouard Grave and Armand Joulin and Moustapha Ciss{\'e} and David Grangier and H. J{\'e}gou},
  journal={ArXiv},
  year={2017},
  volume={abs/1609.04309}
}
We propose an approximate strategy to efficiently train neural network based language models over very large vocabularies. Our approach, called adaptive softmax, circumvents the linear dependency on the vocabulary size by exploiting the unbalanced word distribution to form clusters that explicitly minimize the expectation of computational complexity. Our approach further reduces the computational cost by exploiting the specificities of modern architectures and matrix-matrix vector operations…
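
The idea sketched in the abstract is to sort the vocabulary by frequency and factor the softmax into a small head (the most frequent words plus one representative per cluster of rare words) and a few tail clusters with reduced projection dimension, so the full vocabulary-sized matrix product is rarely computed. PyTorch ships an implementation of this paper's method as torch.nn.AdaptiveLogSoftmaxWithLoss; the snippet below is a minimal usage sketch, with the vocabulary size, hidden size, batch size, and cluster cutoffs chosen purely for illustration rather than taken from the paper.

# Minimal usage sketch of adaptive softmax via PyTorch's
# torch.nn.AdaptiveLogSoftmaxWithLoss (based on this paper).
# All sizes and cutoffs below are illustrative assumptions.
import torch
import torch.nn as nn

vocab_size  = 50_000   # words, assumed sorted by decreasing frequency
hidden_size = 512      # dimension of the language model's hidden state
batch_size  = 32

# Cutoffs split the frequency-sorted vocabulary into a small head
# (ids 0..1999) and two tail clusters (2000..9999 and 10000..49999).
# Each tail cluster projects the hidden state to a smaller dimension
# (divided by div_value), which is where the extra savings come from.
criterion = nn.AdaptiveLogSoftmaxWithLoss(
    in_features=hidden_size,
    n_classes=vocab_size,
    cutoffs=[2_000, 10_000],
    div_value=4.0,
)

hidden  = torch.randn(batch_size, hidden_size)         # e.g. RNN hidden states
targets = torch.randint(0, vocab_size, (batch_size,))  # next-word indices

out = criterion(hidden, targets)
print(out.output.shape)  # (batch_size,) log-probabilities of the target words
print(out.loss)          # mean negative log-likelihood, ready for backward()

Because frequent words dominate training batches, most targets are scored by the small head projection alone, and only the minority of rare targets pay for an additional, smaller tail projection; this is how the approach circumvents the linear dependency on the vocabulary size in the common case.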
