Anke Lüdeling

Tools for linguistic annotation employ different data models and accompanying visualization metaphors, depending on the particular type of annotation envisaged. When a corpus is to be annotated on multiple layers, and the annotations are to be related to one another, the output formats of the annotation tools need to be unified. We describe an implemented…
Learner corpora consist of texts produced by non-native speakers. In addition to these texts, some learner corpora also contain error annotations, which can reveal common errors made by language learners, and provide training material for automatic error correction. We present a novel type of error-annotated learner corpus containing sequences of revised…
The articles in this issue make two complementary assertions: first, language and linguistic sources are a key element of human cultural heritage and, second, we need to integrate the ancient goals of philology with rapidly emerging methods from fields such as Corpus and Computational Linguistics. The first 15,000,000 volumes digitized by Google contained…
This article describes the results of a case study to apply Optical Character Recognition (OCR) to scanned images of books printed between 1487 and 1870 by training the OCR engine OCRopus (Breuel et al. 2013) on the RIDGES herbal text corpus (Odebrecht et al. submitted). The resulting machine-readable text has character accuracies (percentage of correctly…
Our article explores the possibilities of using deeply annotated, incrementally evolving comparable corpora for the study of language change, in this case for different stages from Old High German to New High German. Using the example of the evolution of German past tenses, we show how a variety of categories ranging from low to high complexity interact…