Corpus ID: 6090657

emLam - a Hungarian Language Modeling baseline

@article{Nemeskey2017emLamA,
  title={emLam - a Hungarian Language Modeling baseline},
  author={D. Nemeskey},
  journal={ArXiv},
  year={2017},
  volume={abs/1701.07880}
}
This paper aims to make up for the lack of documented baselines for Hungarian language modeling. Various approaches are evaluated on three publicly available Hungarian corpora. Perplexity values comparable to models of similar-sized English corpora are reported. A new, freely downloadable Hungarian benchmark corpus is introduced.
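The evaluation metric reported above is perplexity. As a minimal illustrative sketch (not the paper's own implementation), perplexity can be computed from a model's per-token log-probabilities as the exponential of the average negative log-likelihood:

```python
import math

def perplexity(log_probs):
    """Perplexity from a list of per-token natural-log probabilities."""
    n = len(log_probs)
    avg_nll = -sum(log_probs) / n  # average negative log-likelihood
    return math.exp(avg_nll)

# Sanity check: a uniform model over a 10-word vocabulary assigns
# probability 0.1 to every token, so its perplexity is the vocabulary size.
print(perplexity([math.log(0.1)] * 4))
```

Lower perplexity means the model assigns higher probability to the held-out text, which is what makes values comparable across similar-sized corpora.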
Citations (1)

  • One format to rule them all - The emtsv pipeline for Hungarian
