Normalisation of Historical Text Using Context-Sensitive Weighted Levenshtein Distance and Compound Splitting
This paper presents a Levenshtein-based approach to normalising historical text to a modern spelling, and shows that the method is successful both in terms of normalisation accuracy and in terms of the performance of a standard modern tagger applied to the historical text.
Spelling Normalisation and Linguistic Analysis of Historical Text for Information Extraction
A large number of historical texts are not available in an electronic format, and even if they are, they are unlikely to be suitable for use in an e-book format.
An SMT Approach to Automatic Annotation of Historical Text
This paper proposes an approach to tagging and parsing of historical text, using character-based SMT methods to translate the historical spelling to a modern spelling before applying the NLP tools, and shows that this approach to spelling normalisation is successful even with small amounts of training data and is generalisable to several languages.
A Multilingual Evaluation of Three Spelling Normalisation Methods for Historical Text
The evaluation of approaches for spelling normalisation of historical text based on data from five languages shows that the machine translation approach often gives the best results, but also that all approaches improve over the baseline and that no single method works best for all languages.
An Evaluation of Neural Machine Translation Models on Historical Spelling Normalization
The results show that NMT models are much better than SMT models in terms of character error rate, and that vanilla RNNs are competitive with GRUs/LSTMs in historical spelling normalization.
A Swedish Grammar for Word Prediction
A Swedish grammar for the FASTY word predictor is defined and implemented. It functions as a grammar-checking filter, reranking the suggestions proposed by a statistical n-gram model on the basis of both confirming and rejecting rules.
The CLIN27 Shared Task: Translating Historical Text to Contemporary Language for Improving Automatic Linguistic Annotation
The CLIN27 shared task evaluates the effect of translating historical text to modern text with the goal of improving the quality of the output of contemporary natural language processing tools appl
The DECODE Database Collection of Historical Ciphers and Keys
We present an on-line database DECODE consisting of encrypted historical manuscripts, aiming at the systematic collection of ciphers and keys to create infrastructural support for historical resear
Rule-based normalisation of historical text - A diachronic study
The impact of a set of hand-crafted normalisation rules on Swedish texts ranging from 1527 to 1812 is explored, showing that spelling correction is a useful strategy for applying contemporary NLP tools to historical text.
The machine translation system MATS: past, present & future
This is a status report of the rule-based machine translation system MATS (Sågvall Hein et al., 2002), which has recently been focused on extending the linguistic resources to new domains and improving robustness.