Rule-Based Normalization of Historical Texts
An unsupervised, rule-based approach that maps historical wordforms to modern wordforms through context-aware rewrite rules, which apply to sequences of characters and are derived from two aligned versions of the Luther Bible.
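As a rough illustration of what a context-aware rewrite rule over character sequences can look like, here is a minimal Python sketch. It is not the authors' implementation: the `Rule` type, the `normalize` function, and the three rules are hypothetical, written by hand for this example rather than derived from aligned texts.

```python
from typing import NamedTuple


class Rule(NamedTuple):
    left: str    # required left context ("#" is a word boundary, "" matches anything)
    source: str  # character sequence to rewrite
    right: str   # required right context
    target: str  # replacement character sequence


# Hypothetical rules, invented for illustration only.
RULES = [
    Rule("#", "v", "n", "u"),  # word-initial "v" before "n": vnd -> und
    Rule("#", "j", "n", "i"),  # word-initial "j" before "n": jn -> in
    Rule("", "th", "", "t"),   # "th" in any context: thun -> tun
]


def normalize(word: str, rules: list[Rule] = RULES) -> str:
    """Scan the word left to right and apply the first rule that matches.

    Contexts are checked against the original (historical) spelling,
    not against the output produced so far.
    """
    padded = "#" + word + "#"
    end = len(padded) - 1  # index of the closing boundary marker
    out = []
    i = 1
    while i < end:
        for rule in rules:
            j = i + len(rule.source)
            if (
                j <= end
                and padded.startswith(rule.source, i)
                and padded[:i].endswith(rule.left)
                and padded.startswith(rule.right, j)
            ):
                out.append(rule.target)
                i = j
                break
        else:
            # No rule matched at this position: copy the character unchanged.
            out.append(padded[i])
            i += 1
    return "".join(out)


print(normalize("vnd"), normalize("jn"), normalize("thun"), normalize("von"))
```

Because the first rule requires a following "n", a word such as "von" passes through unchanged, which is the point of making the rules context-aware.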
CorA: A web-based annotation tool for historical and other non-standard language data
- Marcel Bollmann, Florian Petran, Stefanie Dipper, J. Krasselt
- Computer Science
- LaTeCH@EACL
- 1 April 2014
We present CorA, a web-based annotation tool for manual annotation of historical and other non-standard language data. It allows for editing the primary data and modifying token boundaries during the…
The CLIN27 Shared Task: Translating Historical Text to Contemporary Language for Improving Automatic Linguistic Annotation
The CLIN27 shared task evaluates the effect of translating historical text to modern text with the goal of improving the quality of the output of contemporary natural language processing tools appl…
Applying Rule-Based Normalization to Different Types of Historical Texts - An Evaluation
An unsupervised, rule-based approach that maps historical wordforms to modern wordforms using context-aware rewrite rules, which apply to sequences of characters and are derived from two aligned versions of the Luther Bible.
Manual and semi-automatic normalization of historical spelling - case studies from Early New High German
- Marcel Bollmann, Stefanie Dipper, J. Krasselt, Florian Petran
- Computer Science
- KONVENS
- 19 September 2012
Presents Norma, a semi-automatic normalization tool that integrates different modules (lexicon lookup, rewrite rules) for normalizing words interactively and dynamically updates its set of rule entries as new input arrives.
Studies for Segmentation of Historical Texts: Sentences or Chunks?
- Florian Petran
- Computer Science
This work uses a machine learning approach based on Conditional Random Fields to label tokens with their relative positions in text segments, and finds that the task gets easier the finer-grained the target segments are.
ReM: A reference corpus of Middle High German - corpus compilation, annotation, and access
- Florian Petran, Marcel Bollmann, Stefanie Dipper, Thomas Klein
- Linguistics
- J. Lang. Technol. Comput. Linguistics
The ReM project builds on several earlier annotation efforts to produce a reference corpus for Middle High German, which consists of around two million tokens and provides a mostly complete collection of written records from Early Middle High German as well as a selection of Middle High German texts from 1200 to 1350.
Aligning the Un-Alignable - A Pilot Study Using a Noisy Corpus of Nonstandardized, Semi-parallel Texts
- Florian Petran
- Computer Science
- CICLing
- 11 March 2012
Presents a robust, precision-oriented alignment method for a corpus of comparable texts without standardized spelling or sentence boundary marking; the method is found to outperform the competing one by a great margin.
Geographical Visualization of Search Results in Historical Corpora
- Florian Petran
- Computer Science
- LT4DH@COLING
- 1 December 2016
ANNISVis is a webapp for comparative visualization of geographical distribution of linguistic data, as well as a sample deployment for a corpus of Middle High German texts, which allows the user to formulate multiple ad-hoc queries and visualizes them on a map.
Evaluating Inter-Annotator Agreement on Historical Spelling Normalization
A new method to measure inter-annotator agreement for the normalization task that integrates common chance-corrected agreement measures, such as Fleiss's κ or Krippendorff's α; the novelty of the proposed method lies in the way the annotated word forms are treated.
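For readers unfamiliar with chance-corrected agreement, the following Python sketch computes the standard Fleiss's κ from a table of rating counts. It shows only the textbook measure, not the paper's treatment of annotated word forms; the function name `fleiss_kappa` and the toy counts are assumptions made for this example.

```python
def fleiss_kappa(counts: list[list[int]]) -> float:
    """Standard Fleiss's kappa.

    counts[i][j] is the number of annotators who assigned item i to
    category j. Every item must be rated by the same number of annotators.
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("each item needs the same number (>= 2) of ratings")

    # Observed agreement: mean proportion of agreeing annotator pairs per item.
    p_bar = sum(
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ) / n_items

    # Expected agreement by chance, from the overall category proportions.
    n_categories = len(counts[0])
    proportions = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(n_categories)
    ]
    p_e = sum(p * p for p in proportions)

    if p_e == 1.0:
        raise ValueError("kappa is undefined when only one category is used")
    return (p_bar - p_e) / (1.0 - p_e)


# Toy data (made up): 2 items, 3 annotators, 2 categories.
print(fleiss_kappa([[3, 0], [0, 3]]))  # full agreement on every item
print(fleiss_kappa([[2, 1], [1, 2]]))  # agreement below chance level
```

A value of 1 indicates perfect agreement, 0 indicates agreement at chance level, and negative values indicate agreement below chance.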