Comparative Error Analysis in Neural and Finite-state Models for Unsupervised Character-level Transduction

Maria Ryskina, Eduard H. Hovy, Taylor Berg-Kirkpatrick, Matthew R. Gormley
Traditionally, character-level transduction problems have been solved with finite-state models designed to encode structural and linguistic knowledge of the underlying process, whereas recent approaches rely on the power and flexibility of sequence-to-sequence models with attention. Focusing on the less explored unsupervised learning scenario, we compare the two model classes side by side and find that they tend to make different types of errors even when achieving comparable performance. We… 

Lexically Aware Semi-Supervised Learning for OCR Post-Correction

This paper introduces a lexically aware decoding method that augments the neural post-correction model with a count-based language model constructed from the recognized texts, implemented using weighted finite-state automata (WFSA) for efficient and effective decoding.

Criteria for Useful Automatic Romanization in South Asian Languages

This paper presents a number of possible criteria for systems that transliterate South Asian languages from their native scripts into the Latin script, a process known as romanization. These criteria



Applying the Transformer to Character-level Transduction

This paper finds that, in contrast to recurrent sequence-to-sequence models, batch size plays a crucial role in the transformer's performance on character-level tasks, and shows that with a large enough batch size the transformer does outperform recurrent models.

Exact Hard Monotonic Attention for Character-Level Transduction

This work develops a hard attention sequence-to-sequence model that enforces strict monotonicity, learns alignment jointly, and achieves state-of-the-art performance on morphological inflection.

Hard Non-Monotonic Attention for Character-Level Transduction

An exact, polynomial-time algorithm for marginalizing over the exponential number of non-monotonic alignments between two strings is introduced, showing that hard attention models can be viewed as neural reparameterizations of the classical IBM Model 1.
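The polynomial-time claim can be illustrated with the classical IBM Model 1 identity, given here as a sketch rather than the paper's neural parameterization. The notation is assumed for illustration: x is the source string, y the target string, and a_j the source position aligned to target position j. Because each a_j is chosen independently, the sum over all |x|^{|y|} alignments factorizes into a product of per-position sums:

```latex
\sum_{a \in \{1,\dots,|x|\}^{|y|}} \prod_{j=1}^{|y|} p(a_j)\, p(y_j \mid x_{a_j})
  \;=\; \prod_{j=1}^{|y|} \sum_{i=1}^{|x|} p(a_j = i)\, p(y_j \mid x_i)
```

The right-hand side has only |x| · |y| terms, so the marginal likelihood can be computed exactly without enumerating alignments.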

Neural Finite-State Transducers: Beyond Rational Relations

Neural finite state transducers are introduced, a family of string transduction models defining joint and conditional probability distributions over pairs of strings that compete favorably against seq2seq models while offering interpretable paths that correspond to hard monotonic alignments.

Weighting Finite-State Transductions With Neural Context

This work proposes to keep the traditional architecture, which uses a finite-state transducer to score all possible output strings, but to augment the scoring function with the help of recurrent networks, and defines a probability distribution over aligned output strings in the form of a weighted finite-state automaton.

Morphology Matters: A Multilingual Language Modeling Analysis

This work fills in missing typological data for several languages and considers corpus-based measures of morphological complexity in addition to expert-produced typological features, and finds that several morphological measures are significantly associated with higher surprisal when LSTM models are trained with BPE-segmented data.

What Kind of Language Is Hard to Language-Model?

A new paired-sample multiplicative mixed-effects model is introduced to obtain language difficulty coefficients from at-least-pairwise parallel corpora and it is shown that “translationese” is not any easier to model than natively written language in a fair comparison.

The Curious Case of Neural Text Degeneration

By sampling text from the dynamic nucleus of the probability distribution, which allows for diversity while effectively truncating the less reliable tail of the distribution, the resulting text better demonstrates the quality of human text, yielding enhanced diversity without sacrificing fluency and coherence.
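A minimal sketch of nucleus (top-p) sampling over a single next-token distribution may make the truncation step concrete. The function names `nucleus_filter` and `nucleus_sample`, the dictionary representation of the distribution, and the default `top_p=0.9` are assumptions for illustration, not taken from the paper.

```python
import random


def nucleus_filter(probs, top_p=0.9):
    """Keep the smallest set of most likely tokens whose cumulative
    probability reaches top_p, then renormalize over that set."""
    if not 0.0 < top_p <= 1.0:
        raise ValueError("top_p must be in (0, 1]")
    # Rank tokens from most to least probable.
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    nucleus, cumulative = [], 0.0
    for token, p in ranked:
        nucleus.append((token, p))
        cumulative += p
        if cumulative >= top_p:
            break  # the low-probability tail is discarded
    total = sum(p for _, p in nucleus)
    return {token: p / total for token, p in nucleus}


def nucleus_sample(probs, top_p=0.9, rng=random):
    """Draw one token from the renormalized nucleus."""
    filtered = nucleus_filter(probs, top_p)
    tokens = list(filtered)
    weights = [filtered[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]


if __name__ == "__main__":
    # Hypothetical next-token distribution.
    dist = {"the": 0.5, "a": 0.3, "cat": 0.15, "zebra": 0.05}
    print(nucleus_filter(dist, top_p=0.9))
    print(nucleus_sample(dist, top_p=0.9))
```

The nucleus is "dynamic" in the sense that its size depends on the shape of the distribution: a peaked distribution keeps few tokens, a flat one keeps many.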

A Probabilistic Formulation of Unsupervised Text Style Transfer

A deep generative model for unsupervised text style transfer is proposed that unifies previously proposed non-generative techniques; the method is shown to be effective on a wide range of unsupervised style transfer tasks, including sentiment transfer, formality transfer, word decipherment, author imitation, and related language translation.

Unsupervised Text Style Transfer using Language Models as Discriminators

This paper proposes a new technique that uses a target domain language model as the discriminator, providing richer and more stable token-level feedback during the learning process, and shows that this approach leads to improved performance on three tasks: word substitution decipherment, sentiment modification, and related language translation.