Corpus ID: 46392264

Building Large Monolingual Dictionaries at the Leipzig Corpora Collection: From 100 to 200 Languages

  • Dirk Goldhahn, T. Eckart, U. Quasthoff
  • Published in LREC 2012
  • Computer Science
  • The Leipzig Corpora Collection offers free online access to 136 monolingual dictionaries enriched with statistical information. [...] The mainly language-independent framework for preprocessing, cleaning and creating the corpora and computing the necessary statistics will also be depicted.
    215 Citations


    High Quality Word Lists as a Resource for Multiple Purposes
    Web Corpus Construction
    • 49
    • Highly Influenced
    Text Corpora and the Challenge of Newly Written Languages
    • 1
    A Manual for Web Corpus Crawling of Low Resource Languages
    • 1
    • Highly Influenced
    iNLPSuite: Monolingual Corpora, Evaluation Benchmarks and Pre-trained Multilingual Language Models for Indian Languages
    • 8
    Automatic Creation of Text Corpora for Low-Resource Languages from the Internet: The Case of Swiss German
    • 6
    • Highly Influenced
    Mapping languages: the Corpus of Global Language Use
    • 5


    Creating General-Purpose Corpora Using Automated Search Engine Queries
    • 180
    BootCaT: Bootstrapping Corpora and Terms from the Web
    • 369
    Web as Corpus
    • 132
    The Crúbadán Project: Corpus building for under-resourced languages
    • 138
    Calculating Communities by Link Analysis of URLs
    • 10
    Projekt Der Deutsche Wortschatz
    • 31
    Separierung mit FindLinks gecrawlter Texte nach Sprachen
    • Bachelor Thesis, 2011
    Language Statistics-Based Quality Assurance for Large Corpora
    • Proceedings of Asia Pacific Corpus Linguistics Conference, 2012