Pointer Sentinel Mixture Models


Recent neural network sequence models with softmax classifiers have achieved their best language modeling performance only with very large hidden states and large vocabularies. Even then, they struggle to predict rare or unseen words, even when the context makes the prediction unambiguous. We introduce the pointer sentinel mixture architecture for neural…



Semantic Scholar estimates that this publication has 89 citations based on the available data.

Cite this paper

@article{Merity2016PointerSM, title={Pointer Sentinel Mixture Models}, author={Stephen Merity and Caiming Xiong and James Bradbury and Richard Socher}, journal={CoRR}, year={2016}, volume={abs/1609.07843} }