Learning attention for historical text normalization by learning to pronounce

@inproceedings{Bollmann2017LearningAF,
  title={Learning attention for historical text normalization by learning to pronounce},
  author={Marcel Bollmann and Joachim Bingel and Anders S{\o}gaard},
  booktitle={ACL},
  year={2017}
}
Automated processing of historical texts often relies on pre-normalization to modern word forms. Training encoder-decoder architectures to solve such problems typically requires a lot of training data, which is not available for the named task. We address this problem by using several novel encoder-decoder architectures, including a multi-task learning (MTL) architecture using a grapheme-to-phoneme dictionary as auxiliary data, pushing the state-of-the-art by an absolute 2% increase in… 
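A compact sketch of the multi-task learning idea described in the abstract: a shared character-level encoder feeds two attentional decoders, one producing the modern spelling and one producing the phoneme sequence for the auxiliary grapheme-to-phoneme task, and the two losses are summed. The PyTorch code below is an illustrative sketch, not the authors' implementation; class names, layer sizes, and the mtl_step training function are assumptions, and teacher-forcing details (shifted decoder inputs, padding masks) are omitted for brevity.

import torch
import torch.nn as nn

class SharedEncoder(nn.Module):
    """Character-level BiLSTM encoder shared between the normalization and g2p tasks."""
    def __init__(self, vocab_size, emb_dim=64, hid_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.LSTM(emb_dim, hid_dim, batch_first=True, bidirectional=True)

    def forward(self, chars):                        # chars: (batch, src_len)
        states, _ = self.rnn(self.embed(chars))      # (batch, src_len, 2 * hid_dim)
        return states

class AttentionDecoder(nn.Module):
    """Task-specific decoder with soft attention over the shared encoder states."""
    def __init__(self, vocab_size, emb_dim=64, hid_dim=128, enc_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.cell = nn.LSTMCell(emb_dim + enc_dim, hid_dim)
        self.query = nn.Linear(hid_dim, enc_dim)
        self.out = nn.Linear(hid_dim, vocab_size)
        self.hid_dim = hid_dim

    def forward(self, enc_states, targets):          # targets: (batch, tgt_len)
        batch, tgt_len = targets.shape
        h = enc_states.new_zeros(batch, self.hid_dim)
        c = enc_states.new_zeros(batch, self.hid_dim)
        emb = self.embed(targets)
        logits = []
        for t in range(tgt_len):
            # attention weights over source characters, queried with the decoder state
            scores = torch.bmm(enc_states, self.query(h).unsqueeze(2))   # (batch, src_len, 1)
            context = (torch.softmax(scores, dim=1) * enc_states).sum(dim=1)
            h, c = self.cell(torch.cat([emb[:, t], context], dim=1), (h, c))
            logits.append(self.out(h))
        return torch.stack(logits, dim=1)             # (batch, tgt_len, vocab_size)

# Shared encoder, one decoder per task; vocabulary sizes are placeholders.
encoder = SharedEncoder(vocab_size=60)
norm_decoder = AttentionDecoder(vocab_size=60)        # historical spelling -> modern spelling
g2p_decoder = AttentionDecoder(vocab_size=50)         # auxiliary task: graphemes -> phonemes

optimizer = torch.optim.Adam(
    list(encoder.parameters()) + list(norm_decoder.parameters()) + list(g2p_decoder.parameters()),
    lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

def mtl_step(src, norm_tgt, g2p_src, g2p_tgt):
    """One multi-task update: the encoder receives gradients from both task losses."""
    # (proper teacher forcing would feed targets shifted by one position)
    optimizer.zero_grad()
    norm_logits = norm_decoder(encoder(src), norm_tgt)
    g2p_logits = g2p_decoder(encoder(g2p_src), g2p_tgt)
    loss = (loss_fn(norm_logits.flatten(0, 1), norm_tgt.flatten())
            + loss_fn(g2p_logits.flatten(0, 1), g2p_tgt.flatten()))
    loss.backward()
    optimizer.step()
    return loss.item()

In this sketch only the encoder is shared between the two tasks; which layers are shared and how the auxiliary loss is weighted are tuning decisions rather than fixed choices of the paper.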

Citations

Few-Shot and Zero-Shot Learning for Historical Text Normalization
TLDR
This paper evaluates 63 multi-task learning configurations for sequence-to-sequence-based historical text normalization across ten datasets from eight languages, using autoencoding, grapheme-to-phoneme mapping, and lemmatization as auxiliary tasks and shows that zero-shot learning outperforms the simple, but relatively strong, identity baseline.
Multilevel Text Normalization with Sequence-to-Sequence Networks and Multisource Learning
TLDR
This paper defines multilevel text normalization as sequence-to-sequence processing that transforms naturally noisy text into a sequence of normalized units of meaning (morphemes) in three steps and proposes a systematic solution for all of them using neural encoder-decoder technology.
Semi-supervised Contextual Historical Text Normalization
TLDR
By utilizing a simple generative normalization model and obtaining powerful contextualization from the target-side language model, this work trains accurate models from unlabeled historical data while matching the accuracy levels of supervised training.
Multi-task learning for historical text normalization: Size matters
TLDR
The main finding—contrary to what has been observed for other NLP tasks—is that multi-task learning mainly works when target task data is very scarce.
An Evaluation of Neural Machine Translation Models on Historical Spelling Normalization
TLDR
The results show that NMT models are much better than SMT models in terms of character error rate, and that vanilla RNNs are competitive with GRUs/LSTMs in historical spelling normalization.
Training Data Augmentation for Low-Resource Morphological Inflection
TLDR
This work describes the UoE-LMU submission for the CoNLL-SIGMORPHON 2017 Shared Task on Universal Morphological Reinflection, Subtask 1: given a lemma and target morphological tags, generate the target inflected form; it finds that autoencoding random strings works surprisingly well and remains effective in the 10,000-example setting despite not being tuned for it.
Transfer Learning for a Letter-Ngrams to Word Decoder in the Context of Historical Handwriting Recognition with Scarce Resources
TLDR
The results show that an efficient system can be obtained by carefully selecting the datasets used for transfer learning, despite a lexical coverage of 67% between the Italian Comedy data and the training data.
Automatic Normalisation of Historical Text
TLDR
This thesis evaluates three models, including a Hidden Markov Model, which has not previously been used for historical text normalisation, and a soft attention Neural Network model, which achieves state-of-the-art normalisation accuracy on all datasets, even when the volume of training data is restricted.
Neural Transductive Learning and Beyond: Morphological Generation in the Minimal-Resource Setting
TLDR
This work addresses paradigm completion, the morphological task of, given a partial paradigm, generating all missing forms, and proposes two new methods for the minimal-resource setting.
Historical Text Normalization with Delayed Rewards
TLDR
Policy gradient training leads to more accurate normalizations for long or unseen words, and while the small datasets in historical text normalization preclude reinforcement learning from scratch, policy gradient fine-tuning is shown to yield significant improvements across the board.
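As a rough illustration of what policy-gradient fine-tuning with a delayed reward can look like for normalization, the snippet below is a generic REINFORCE sketch under assumed tensor shapes, not the cited paper's method: the model samples a complete output string, only then receives a sequence-level reward, and the per-sequence log-probability is scaled by the advantage.

import torch

def reinforce_loss(log_probs, sampled, gold, pad_id=0):
    """log_probs: (batch, tgt_len) log-probabilities of the sampled characters.
    sampled, gold: (batch, tgt_len) character ids, padded with pad_id.
    The reward is delayed: it is computed only after the whole normalization has
    been generated (here, per-word character accuracy against the gold form)."""
    mask = (gold != pad_id).float()
    correct = ((sampled == gold).float() * mask).sum(dim=1)
    reward = correct / mask.sum(dim=1).clamp(min=1)           # one scalar reward per word
    advantage = reward - reward.mean()                        # mean baseline reduces variance
    seq_log_prob = (log_probs * mask).sum(dim=1)
    return -(advantage.detach() * seq_log_prob).mean()        # minimizing this maximizes expected reward

In a fine-tuning regime, a model pre-trained with cross-entropy would sample its own outputs and be updated with this loss, either replacing or mixed with the cross-entropy objective.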

References

SHOWING 1-10 OF 30 REFERENCES
Improving historical spelling normalization with bi-directional LSTMs and multi-task learning
TLDR
This work explores the suitability of a deep neural network architecture for historical document processing, particularly a deep bi-LSTM network applied at the character level, and shows that multi-task learning with additional normalization data can improve the model's performance further.
Normalizing historical orthography for OCR historical documents using LSTM
TLDR
This paper proposes a new technique for modelling the target modern language by means of a recurrent neural network with a long short-term memory architecture, and shows that the proposed LSTM model outperforms previous approaches at normalizing modern word forms to historical word forms.
Sequence to Sequence Learning with Neural Networks
TLDR
This paper presents a general end-to-end approach to sequence learning that makes minimal assumptions on the sequence structure, and finds that reversing the order of the words in all source sentences improved the LSTM's performance markedly, because doing so introduced many short term dependencies between the source and the target sentence which made the optimization problem easier.
Natural Language Processing (Almost) from Scratch
We propose a unified neural network architecture and learning algorithm that can be applied to various natural language processing tasks including part-of-speech tagging, chunking, named entity recognition, and semantic role labeling.
Multi-task Sequence to Sequence Learning
TLDR
The results show that training on a small amount of parsing and image caption data can improve the translation quality between English and German by up to 1.5 BLEU points over strong single-task baselines on the WMT benchmarks, and reveal interesting properties of the two unsupervised learning objectives, autoencoder and skip-thought, in the MTL context.
Automatic Normalization for Linguistic Annotation of Historical Language Data
TLDR
Different methods for spelling normalization of historical texts, with a view to further processing with modern part-of-speech taggers, are presented and evaluated, and a chain combination of word-based and character-based techniques is shown to perform best for normalization.
Visualizing and Understanding Neural Models in NLP
TLDR
Four strategies for visualizing compositionality in neural models for NLP, inspired by similar work in computer vision, are described, including LSTM-style gates that measure information flow and gradient back-propagation.
Multi-Task Learning for Multiple Language Translation
TLDR
The recently proposed neural machine translation model is extended to a multi-task learning framework which shares the source language representation and separates the modeling of different target language translations.
An SMT Approach to Automatic Annotation of Historical Text
TLDR
This paper proposes an approach to tagging and parsing of historical text, using character-based SMT methods for translating the historical spelling to a modern spelling before applying the NLP tools, and shows that this approach to spelling normalisation is successful even with small amounts of training data and is generalisable to several languages.
Automatic normalisation of the Swiss German ArchiMob corpus using character-level machine translation
TLDR
It is shown that a character-level machine translation system trained on pairs of segments (not pairs of words) and including multiple language models is able to achieve up to 90.46% word normalisation accuracy.