• Computer Science
  • Published in ArXiv 2016

Exploring the Limits of Language Modeling

@article{Jozefowicz2016ExploringTL,
  title={Exploring the Limits of Language Modeling},
  author={Rafal J{\'o}zefowicz and Oriol Vinyals and Mike Schuster and Noam Shazeer and Yonghui Wu},
  journal={ArXiv},
  year={2016},
  volume={abs/1602.02410}
}
In this work we explore recent advances in Recurrent Neural Networks for large-scale Language Modeling, a task central to language understanding. We extend current models to deal with two key challenges present in this task: corpora and vocabulary sizes, and complex, long-term structure of language. We perform an exhaustive study on techniques such as character Convolutional Neural Networks or Long Short-Term Memory, on the One Billion Word Benchmark. Our best single model significantly…
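The abstract points at two concrete techniques: character-level Convolutional Neural Networks for building word representations, and large LSTM networks for capturing long-term structure. As a rough illustration only, and not the authors' released code, the following PyTorch sketch shows how a character-CNN word encoder can feed a multi-layer LSTM that predicts the next word; every layer size, name, and hyperparameter below is an illustrative assumption rather than the paper's setting.

import torch
import torch.nn as nn

class CharCNNLSTMLM(nn.Module):
    # Illustrative character-CNN + LSTM language model; all sizes are
    # assumptions for this sketch, not the paper's hyperparameters.
    def __init__(self, n_chars=256, char_dim=16, n_filters=128,
                 hidden=512, vocab_size=10000):
        super().__init__()
        self.char_emb = nn.Embedding(n_chars, char_dim)
        # Convolve over each word's characters, then max-pool over time,
        # to build a fixed-size word embedding from its characters.
        self.conv = nn.Conv1d(char_dim, n_filters, kernel_size=5, padding=2)
        self.lstm = nn.LSTM(n_filters, hidden, num_layers=2, batch_first=True)
        # A full softmax like this is prohibitive at the benchmark's roughly
        # 800K-word vocabulary; the paper studies approximations such as
        # importance sampling. A plain linear layer keeps the sketch
        # self-contained.
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, chars):
        # chars: (batch, seq_len, word_len) tensor of character ids
        b, t, w = chars.shape
        x = self.char_emb(chars.view(b * t, w))   # (b*t, w, char_dim)
        x = self.conv(x.transpose(1, 2))          # (b*t, n_filters, w)
        x = x.max(dim=2).values                   # max-over-time pooling
        h, _ = self.lstm(x.view(b, t, -1))        # (b, t, hidden)
        return self.out(h)                        # next-word logits

# Example: a batch of 2 sequences, 5 words each, 12 characters per word.
logits = CharCNNLSTMLM()(torch.randint(0, 256, (2, 5, 12)))
print(logits.shape)  # torch.Size([2, 5, 10000])

The character-CNN input side lets the model handle an open input vocabulary without a parameter per word; the output softmax over the full vocabulary remains the expensive part, which is why the paper devotes much of its analysis to softmax approximations.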

Citations

Publications citing this paper.
Showing 4 of 500 citations.

A Simple Language Model based on PMI Matrix Approximations
  • Cites background & methods; highly influenced

Character and Subword-Based Word Representation for Neural Language Modeling Prediction
  • Cites methods, background & results; highly influenced

Device Placement Optimization with Reinforcement Learning
  • Cites methods & background; highly influenced

Using Deep Neural Networks to Learn Syntactic Agreement
  • Cites background & results; highly influenced

Citation Statistics

  • 79 highly influenced citations
  • Averaged 149 citations per year from 2017 through 2019

References

Publications referenced by this paper.
Showing 2 of 49 references.

On the difficulty of training recurrent neural networks

Razvan Pascanu, Tomas Mikolov, Yoshua Bengio
  • arXiv preprint arXiv:1211.5063
  • 2012
  • Highly influential

Sparse Non-negative Matrix Language Modeling

Joris Pelemans, Noam Shazeer, Ciprian Chelba
  • Transactions of the Association for Computational Linguistics
  • 2016