Corpus ID: 14136307

One billion word benchmark for measuring progress in statistical language modeling

@inproceedings{Chelba2014OneBW,
  title={One billion word benchmark for measuring progress in statistical language modeling},
  author={Ciprian Chelba and Tomas Mikolov and Michael Schuster and Qi Ge and Thorsten Brants and Phillipp Koehn and Tony Robinson},
  booktitle={INTERSPEECH},
  year={2014}
}
  • Ciprian Chelba, Tomas Mikolov, Michael Schuster, Qi Ge, Thorsten Brants, Phillipp Koehn, Tony Robinson
  • Published in INTERSPEECH 2014
  • Computer Science
  • We propose a new benchmark corpus to be used for measuring progress in statistical language modeling. With almost one billion words of training data, we hope this benchmark will be useful to quickly evaluate novel language modeling techniques, and to compare their contribution when combined with other advanced techniques. We show performance of several well-known types of language models, with the best results achieved with a recurrent neural network based language model. The baseline unpruned…

    Citations

    Publications citing this paper (660 citations in total). Selected entries:

    Improved Language Modeling by Decoding the Past

    CITES METHODS
    HIGHLY INFLUENCED

    Exploring the Limits of Language Modeling

    CITES BACKGROUND & METHODS
    HIGHLY INFLUENCED

    Character-level language modeling with hierarchical recurrent neural networks

    • Kyuyeon Hwang, Wonyong Sung
    • Computer Science, Mathematics
    • 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
    • 2017
    CITES METHODS & BACKGROUND
    HIGHLY INFLUENCED

    Efficient Transfer Learning for Neural Network Language Models

    CITES BACKGROUND

    Transformer-XL: Language Modeling

    • 2018
    CITES BACKGROUND
    HIGHLY INFLUENCED

    Citation statistics

    • 97 highly influential citations

    • An average of 153 citations per year from 2018 through 2020

    References

    Publications referenced by this paper (51 references in total).