Adapting SimpleNLG to German
- Marcel Bollmann
- Linguistics, European Workshop on Natural Language Generation
- 28 September 2011
SimpleNLG for German, a surface realisation engine for German based on SimpleNLG (Gatt and Reiter, 2009), is described, with a special focus on word order phenomena.
(Semi-)Automatic Normalization of Historical Texts using Distance Measures and the Norma tool
- Marcel Bollmann
- Computer Science
- 2012
This paper compares several approaches to normalization with a focus on methods based on string distance measures and evaluates them on two different types of historical texts, showing that a combination of normalization methods produces the best results.
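Distance-based normalization can be illustrated with a minimal sketch. It assumes plain Levenshtein distance with unit costs and a small hypothetical modern lexicon. The paper compares several distance measures, and this sketch does not reproduce any of them or the Norma tool. `levenshtein`, `normalize` and the `max_distance` threshold are illustrative names only.

```python
from typing import Iterable


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit cost for insertion, deletion and substitution."""
    # previous[j] holds the distance between a[:i-1] and b[:j]
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def normalize(word: str, lexicon: Iterable[str], max_distance: int = 2) -> str:
    """Map a historical word form to the closest modern lexicon entry (hypothetical helper).

    Ties are broken alphabetically; the word is returned unchanged if no
    entry lies within max_distance. The lexicon must be non-empty.
    """
    best = min(lexicon, key=lambda w: (levenshtein(word, w), w))
    return best if levenshtein(word, best) <= max_distance else word


lexicon = ["und", "jahr", "frau", "mit"]  # toy modern lexicon (assumption)
print(normalize("vnnd", lexicon))  # closest entry: "und"
print(normalize("jar", lexicon))   # closest entry: "jahr"
```

A fixed threshold keeps unknown words from being forced onto an unrelated lexicon entry. Weighted or learned edit costs would be a natural refinement, but they are not shown here.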
Rule-Based Normalization of Historical Texts
- Marcel Bollmann, Florian Petran, Stefanie Dipper
- Computer Science
- 1 September 2011
An unsupervised, rule-based approach that maps historical word forms to modern word forms through context-aware rewrite rules, which operate on character sequences and are derived from two aligned versions of the Luther Bible.
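Context-aware character rewrite rules of this kind can be sketched as follows. This is a minimal illustration under three assumptions: rules take the form source → target / left _ right, `#` marks a word boundary, and rules are applied in a fixed order. The `Rule` class, `apply_rule`, `normalize_with_rules` and the two example rules are hypothetical. They are not rules learned from the Luther Bible alignment, and the rule-derivation step is not shown.

```python
from dataclasses import dataclass
from typing import Iterable

BOUNDARY = "#"  # marks the word edge in rule contexts


@dataclass(frozen=True)
class Rule:
    """Hypothetical context-aware rewrite rule: source -> target / left _ right."""
    source: str
    target: str
    left: str = ""   # required left context ("" = any, "#" = word start)
    right: str = ""  # required right context ("" = any, "#" = word end)


def apply_rule(word: str, rule: Rule) -> str:
    """Rewrite every non-overlapping match of rule.source whose context matches.

    Contexts are checked against the input word, not the partially rewritten output.
    """
    if not rule.source:
        raise ValueError("rule.source must be non-empty")
    padded = BOUNDARY + word + BOUNDARY
    out = []
    i = 1  # skip the leading boundary symbol
    while i < len(padded) - 1:
        end = i + len(rule.source)
        if (padded.startswith(rule.source, i)
                and padded[:i].endswith(rule.left)
                and padded.startswith(rule.right, end)):
            out.append(rule.target)
            i = end
        else:
            out.append(padded[i])
            i += 1
    return "".join(out)


def normalize_with_rules(word: str, rules: Iterable[Rule]) -> str:
    """Apply each rule in turn; later rules see the output of earlier ones."""
    for rule in rules:
        word = apply_rule(word, rule)
    return word


# Two illustrative rules (assumptions): word-initial "v" -> "u", and "nn" -> "n" before "d".
rules = [Rule("v", "u", left="#"), Rule("nn", "n", right="d")]
print(normalize_with_rules("vnnd", rules))  # "und"
```

Checking contexts against each rule's input, rather than its own output, keeps a single rule from cascading on text it has just rewritten. Rule ordering still matters across rules.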
Improving historical spelling normalization with bi-directional LSTMs and multi-task learning
- Marcel Bollmann, Anders Søgaard
- Computer Science, International Conference on Computational…
- 1 October 2016
This work explores the suitability of a deep neural network architecture, specifically a deep bi-LSTM network applied at the character level, for processing historical documents, and shows that multi-task learning with additional normalization data can further improve the model’s performance.
A Large-Scale Comparison of Historical Text Normalization Systems
- Marcel Bollmann
- Computer Science, North American Chapter of the Association for…
- 3 April 2019
This paper presents the largest study of historical text normalization done so far, comparing systems spanning all categories of proposed normalization techniques, analysing the effect of training data quantity, and using different evaluation methods.
Learning attention for historical text normalization by learning to pronounce
- Marcel Bollmann, Joachim Bingel, Anders Søgaard
- Computer Science, Annual Meeting of the Association for…
- 1 July 2017
The authors observe that, as previously conjectured, multi-task learning (MTL) can learn to focus attention during decoding in ways remarkably similar to recently proposed attention mechanisms, an important step toward understanding how MTL works.
CorA: A web-based annotation tool for historical and other non-standard language data
- Marcel Bollmann, Florian Petran, Stefanie Dipper, J. Krasselt
- Computer Science, LaTeCH@EACL
- 1 April 2014
We present CorA, a web-based annotation tool for manual annotation of historical and other non-standard language data. It allows for editing the primary data and modifying token boundaries during the…
POS Tagging for Historical Texts with Sparse Training Data
- Marcel Bollmann
- Computer Science, LAW@ACL
- 1 August 2013
This paper presents a method for part-of-speech tagging of historical data and evaluates it on texts from different corpora of historical German (15th–18th century). Spelling normalization is used to…
The CLIN27 Shared Task: Translating Historical Text to Contemporary Language for Improving Automatic Linguistic Annotation
- E. T. K. Sang, Marcel Bollmann, Kalliopi Zervanou
- Computer Science
- 1 December 2017
The CLIN27 shared task evaluates the effect of translating historical text to modern text with the goal of improving the quality of the output of contemporary natural language processing tools appl…
Manual and semi-automatic normalization of historical spelling - case studies from Early New High German
- Marcel Bollmann, Stefanie Dipper, J. Krasselt, Florian Petran
- Computer Science, Conference on Natural Language Processing
- 19 September 2012
This paper presents Norma, a semi-automatic normalization tool that integrates different modules (lexicon lookup, rewrite rules) for normalizing words interactively and dynamically updates its set of rule entries as new input is processed.