The TIGER Treebank
TLDR
Reports on the TIGER Treebank, a corpus of currently 35,000 syntactically annotated German newspaper sentences, describes what kind of information is encoded in the treebank, and introduces the different representation formats.
TIGER: Linguistic Interpretation of a German Corpus
TLDR
Describes the TIGER Treebank, a corpus of currently 40,000 syntactically annotated German newspaper sentences, together with the query language designed to facilitate a simple formulation of complex queries and a graphical user interface for query input.
XML-based Stand-off Representation and Exploitation of Multi-Level Linguistic Annotation
TLDR
An XML-based, generic stand-off architecture for multi-level linguistic annotations is proposed, an example instantiation of this architecture is presented, and application scenarios that profit from it are sketched out.
A Flexible Framework for Integrating Annotations from Different Tools and Tag Sets
TLDR
Introduces OLiA, an ontology of linguistic annotations that mediates between alternative tag sets covering the same class of linguistic phenomena and is tied to a machine learning component for semiautomatic annotation.
POS-Tagging of Historical Language Data: First Experiments
TLDR
As expected, tagging with “normalized”, quasi-standardized tokens performs best (accuracy > 91%).
Implementing and documenting large scale grammars: German LFG
Ground Truth for training OCR engines on historical documents in German Fraktur and Early Modern Latin
TLDR
The special form of the ground truth (GT) as line image/transcription pairs makes it directly usable for training state-of-the-art recognition models for OCR software that employs recurrent neural networks with an LSTM architecture, such as Tesseract 4 or OCRopus.
Rule-Based Normalization of Historical Texts
TLDR
Presents an unsupervised, rule-based approach that maps historical wordforms to modern wordforms through context-aware rewrite rules that apply to sequences of characters derived from two aligned versions of the Luther Bible.
Annotation of Information Structure: an Evaluation across different Types of Texts
TLDR
This work focuses on German texts of different types, both written texts and transcriptions of spoken language, analyzes the annotation quantitatively and qualitatively, and reports on the evaluation of information-structural annotation according to the LISA.