emLam - a Hungarian Language Modeling baseline
@article{Nemeskey2017emLamA,
  title={emLam - a Hungarian Language Modeling baseline},
  author={D. Nemeskey},
  journal={ArXiv},
  year={2017},
  volume={abs/1701.07880}
}
This paper aims to make up for the lack of documented baselines for Hungarian language modeling. Various approaches are evaluated on three publicly available Hungarian corpora. Perplexity values comparable to models of similar-sized English corpora are reported. A new, freely downloadable Hungarian benchmark corpus is introduced.