One billion word benchmark for measuring progress in statistical language modeling

@inproceedings{Chelba2013OneBW,
  title={One billion word benchmark for measuring progress in statistical language modeling},
  author={Ciprian Chelba and Tomas Mikolov and Michael Schuster and Qi Ge and Thorsten Brants and Philipp Koehn and Tony Robinson},
  booktitle={INTERSPEECH},
  year={2013}
}
We propose a new benchmark corpus to be used for measuring progress in statistical language modeling. With almost one billion words of training data, we hope this benchmark will be useful to quickly evaluate novel language modeling techniques, and to compare their contribution when combined with other advanced techniques. We show performance of several well-known types of language models, with the best results achieved with a recurrent neural network based language model. The baseline unpruned…

Citations

Publications citing this paper (450 citations in total; a selection is listed below).

Faster Neural Network Training with Data Echoing

CITES BACKGROUND & METHODS
HIGHLY INFLUENCED

Preventing Posterior Collapse with δ-VAEs

CITES METHODS
HIGHLY INFLUENCED

Multi-Perspective Context Aggregation for Semi-supervised Cloze-style Reading Comprehension

  • COLING
  • 2018
CITES METHODS & BACKGROUND
HIGHLY INFLUENCED

Transformer-XL: Language Modeling

  • 2018
CITES BACKGROUND
HIGHLY INFLUENCED

Deep Learning Scaling is Predictable, Empirically

  • ArXiv
  • 2017
CITES METHODS & BACKGROUND
HIGHLY INFLUENCED

End-to-End Online Speech Recognition with Recurrent Neural Networks

CITES METHODS & BACKGROUND
HIGHLY INFLUENCED

Global context-dependent recurrent neural network language model with sparse feature learning

  • Neural Computing and Applications
  • 2017
CITES BACKGROUND
HIGHLY INFLUENCED

CITATION STATISTICS

  • 92 Highly Influenced Citations

  • Averaged 116 Citations per year from 2017 through 2019

  • 7% Increase in citations per year in 2019 over 2018