Learning attention for historical text normalization by learning to pronounce

Marcel Bollmann, Joachim Bingel, Anders Søgaard
Annual Meeting of the Association for Computational Linguistics
Automated processing of historical texts often relies on pre-normalization to modern word forms. Training encoder-decoder architectures to solve such problems typically requires a lot of training data, which is not available for the named task. We address this problem by using several novel encoder-decoder architectures, including a multi-task learning (MTL) architecture using a grapheme-to-phoneme dictionary as auxiliary data, pushing the state-of-the-art by an absolute 2% increase in… 
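
The multi-task setup described above can be pictured as a training loop that draws examples from two datasets: historical/modern word pairs for the main task and grapheme-to-phoneme pairs for the auxiliary task. The following is a minimal sketch of that sampling step only, not the authors' implementation. The function name `mixed_task_stream`, the `aux_prob` parameter, the task labels, and the toy word pairs are all hypothetical and introduced here for illustration.

```python
import random
from typing import Iterator, List, Tuple

Pair = Tuple[str, str]


def mixed_task_stream(main: List[Pair], aux: List[Pair], steps: int,
                      aux_prob: float = 0.5,
                      seed: int = 0) -> Iterator[Tuple[str, Pair]]:
    """Yield (task_name, example) tuples, picking the task at random per step.

    In a multi-task encoder-decoder the tasks would share some or all
    model parameters; this sketch covers only the choice of which task
    supplies the next training example (an assumed, simple schedule).
    """
    rng = random.Random(seed)  # local generator keeps the stream reproducible
    for _ in range(steps):
        if rng.random() < aux_prob:
            yield "g2p", rng.choice(aux)
        else:
            yield "normalization", rng.choice(main)


# Toy data (hypothetical): historical -> modern spelling, and word -> phonemes.
main_pairs = [("vnd", "und"), ("jn", "in")]
aux_pairs = [("und", "U n t"), ("in", "I n")]

stream = list(mixed_task_stream(main_pairs, aux_pairs, steps=6, seed=1))
```

Each element of `stream` names the task it belongs to, so a training loop could route it to the matching loss. Setting `aux_prob` to 0.0 reduces this to ordinary single-task training.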

Few-Shot and Zero-Shot Learning for Historical Text Normalization

This paper evaluates 63 multi-task learning configurations for sequence-to-sequence-based historical text normalization across ten datasets from eight languages, using autoencoding, grapheme-to-phoneme mapping, and lemmatization as auxiliary tasks, and shows that zero-shot learning outperforms the simple but relatively strong identity baseline.

Multilevel Text Normalization with Sequence-to-Sequence Networks and Multisource Learning

This paper defines multilevel text normalization as sequence-to-sequence processing that transforms naturally noisy text into a sequence of normalized units of meaning (morphemes) in three steps and proposes a systematic solution for all of them using neural encoder-decoder technology.

Semi-supervised Contextual Historical Text Normalization

By utilizing a simple generative normalization model and obtaining powerful contextualization from the target-side language model, this work trains accurate models with unlabeled historical data at the same accuracy levels.

Training Data Augmentation for Low-Resource Morphological Inflection

It is found that autoencoding random strings works surprisingly well, outperformed only slightly by autoencoding words from an unlabelled corpus, and that the random string method also works well in the 10,000-example setting despite not being tuned for it.

Multi-task learning for historical text normalization: Size matters

The main finding—contrary to what has been observed for other NLP tasks—is that multi-task learning mainly works when target task data is very scarce.

An Evaluation of Neural Machine Translation Models on Historical Spelling Normalization

The results show that NMT models are much better than SMT models in terms of character error rate, and that vanilla RNNs are competitive with GRUs/LSTMs in historical spelling normalization.

Transfer Learning for a Letter-Ngrams to Word Decoder in the Context of Historical Handwriting Recognition with Scarce Resources

The results show that an efficient system can be obtained by carefully selecting the datasets used for transfer learning, despite a lexical coverage of 67% between the Italian Comedy data and the training data.

Automatic Normalisation of Historical Text

This thesis evaluates three models: a Hidden Markov Model, which has not been previously used for historical text normalisation; a soft attention Neural Network model, which achieves state-of-the-art normalisation accuracy in all datasets, even when the volume of training data is restricted.

Neural Transductive Learning and Beyond: Morphological Generation in the Minimal-Resource Setting

This work addresses paradigm completion, the morphological task of, given a partial paradigm, generating all missing forms, and proposes two new methods for the minimal-resource setting.

Historical Text Normalization with Delayed Rewards

Policy gradient training leads to more accurate normalizations for long or unseen words, and while the small datasets in historical text normalization are prohibitive of from-scratch reinforcement learning, it is shown that policy gradient fine-tuning leads to significant improvements across the board.

Improving historical spelling normalization with bi-directional LSTMs and multi-task learning

This work explores the suitability of a deep neural network architecture for historical document processing, particularly a deep bi-LSTM network applied at the character level, and shows that multi-task learning with additional normalization data can improve the model’s performance further.

Normalizing historical orthography for OCR historical documents using LSTM

This paper proposes a new technique to model the target modern language by means of a recurrent neural network with long short-term memory architecture and shows that the proposed LSTM model outperforms on normalizing the modern wordform to historical wordform.

Sequence to Sequence Learning with Neural Networks

This paper presents a general end-to-end approach to sequence learning that makes minimal assumptions on the sequence structure, and finds that reversing the order of the words in all source sentences improved the LSTM's performance markedly, because doing so introduced many short term dependencies between the source and the target sentence which made the optimization problem easier.

Natural Language Processing (Almost) from Scratch

We propose a unified neural network architecture and learning algorithm that can be applied to various natural language processing tasks including part-of-speech tagging, chunking, named entity

Multi-task Sequence to Sequence Learning

The results show that training on a small amount of parsing and image caption data can improve the translation quality between English and German by up to 1.5 BLEU points over strong single-task baselines on the WMT benchmarks, and reveal interesting properties of the two unsupervised learning objectives, autoencoder and skip-thought, in the MTL context.

Automatic Normalization for Linguistic Annotation of Historical Language Data

Different methods for spelling normalization of historical texts with regard to further processing with modern part-of-speech taggers are presented and evaluated, and a chain combination using word-based and character-based techniques is shown to be best for normalization.

Visualizing and Understanding Neural Models in NLP

Four strategies for visualizing compositionality in neural models for NLP, inspired by similar work in computer vision, including LSTM-style gates that measure information flow and gradient back-propagation, are described.

Multi-Task Learning for Multiple Language Translation

The recently proposed neural machine translation model is extended to a multi-task learning framework which shares source language representation and separates the modeling of different target language translation.

An SMT Approach to Automatic Annotation of Historical Text

This paper proposes an approach to tagging and parsing of historical text, using character-based SMT methods for translating the historical spelling to a modern spelling before applying the NLP tools, and shows that this approach to spelling normalisation is successful even with small amounts of training data and is generalisable to several languages.

Automatic normalisation of the Swiss German ArchiMob corpus using character-level machine translation

It is shown that a character-level machine translation system trained on pairs of segments (not pairs of words) and including multiple language models is able to achieve a word normalisation accuracy of up to 90.46%.