Adapting SimpleNLG to German
Describes SimpleNLG for German, a surface realisation engine for German based on SimpleNLG (Gatt and Reiter, 2009), with a special focus on word order phenomena.
(Semi-)Automatic Normalization of Historical Texts using Distance Measures and the Norma tool
This paper compares several approaches to normalization with a focus on methods based on string distance measures and evaluates them on two different types of historical texts, showing that a combination of normalization methods produces the best results.
Rule-Based Normalization of Historical Texts
An unsupervised, rule-based approach that maps historical wordforms to modern wordforms through context-aware rewrite rules; the rules apply to sequences of characters and are derived from two aligned versions of the Luther Bible.
CorA: A web-based annotation tool for historical and other non-standard language data
We present CorA, a web-based annotation tool for manual annotation of historical and other non-standard language data. It allows for editing the primary data and modifying token boundaries during the
Improving historical spelling normalization with bi-directional LSTMs and multi-task learning
This work explores the suitability of a deep neural network architecture for historical document processing, specifically a deep bi-LSTM network applied at the character level, and shows that multi-task learning with additional normalization data can further improve the model’s performance.
Learning attention for historical text normalization by learning to pronounce
Interestingly, it is observed that, as previously conjectured, multi-task learning can learn to focus attention during decoding, in ways remarkably similar to recently proposed attention mechanisms, which is an important step toward understanding how MTL works.
The CLIN27 Shared Task: Translating Historical Text to Contemporary Language for Improving Automatic Linguistic Annotation
The CLIN27 shared task evaluates the effect of translating historical text to modern text with the goal of improving the quality of the output of contemporary natural language processing tools appl
A Large-Scale Comparison of Historical Text Normalization Systems
This paper presents the largest study of historical text normalization done so far, comparing systems spanning all categories of proposed normalization techniques, analysing the effect of training data quantity, and using different evaluation methods.
POS Tagging for Historical Texts with Sparse Training Data
This paper presents a method for part-of-speech tagging of historical data and evaluates it on texts from different corpora of historical German (15th–18th century). Spelling normalization is used to
Applying Rule-Based Normalization to Different Types of Historical Texts - An Evaluation
An unsupervised, rule-based approach that maps historical wordforms to modern wordforms using context-aware rewrite rules; the rules apply to sequences of characters and are derived from two aligned versions of the Luther Bible.