UniMorph 3.0: Universal Morphology
TLDR
Advances made to the schema, tooling, and dissemination of project resources since the UniMorph 2.0 release described at LREC 2018 are detailed.
Counterfactual Data Augmentation for Mitigating Gender Stereotypes in Languages with Rich Morphology
TLDR
This work presents a novel approach for converting between masculine-inflected and feminine-inflected sentences in morphologically rich languages and shows that it reduces gender stereotyping by a factor of 2.5 without any sacrifice to grammaticality.
The SIGMORPHON 2019 Shared Task: Morphological Analysis in Context and Cross-Lingual Transfer for Inflection
The SIGMORPHON 2019 shared task on cross-lingual transfer and contextual analysis in morphology examined transfer learning of inflection between 100 language pairs, as well as contextual…
Are All Languages Equally Hard to Language-Model?
TLDR
This work develops an evaluation framework for fair cross-linguistic comparison of language models, using translated text so that all models are asked to predict approximately the same information.
The CoNLL–SIGMORPHON 2018 Shared Task: Universal Morphological Reinflection
TLDR
The CoNLL–SIGMORPHON 2018 shared task on supervised learning of morphological generation featured data sets from 103 typologically diverse languages and a new second task that asked participants to inflect words in sentential context, similar to a cloze task.
What Kind of Language Is Hard to Language-Model?
TLDR
A new paired-sample multiplicative mixed-effects model is introduced to obtain language difficulty coefficients from at-least-pairwise parallel corpora, and “translationese” is shown to be no easier to model than natively written language in a fair comparison.
Processing South Asian Languages Written in the Latin Script: the Dakshina Dataset
TLDR
This paper describes the Dakshina dataset, a new resource consisting of text in both the Latin and native scripts for 12 South Asian languages, and provides baseline results on several tasks made possible by the dataset, including single-word transliteration, full-sentence transliteration, and language modeling of native-script and romanized text.
UniMorph 2.0: Universal Morphology
TLDR
Advances made to the schema, tooling, and dissemination of project resources since the UniMorph 2.0 release described at LREC 2018 are detailed.
A Structured Variational Autoencoder for Contextual Morphological Inflection
TLDR
This work introduces a novel generative latent-variable model for the semi-supervised learning of inflection generation, and derives an efficient variational inference procedure based on the wake-sleep algorithm to enable posterior inference over the latent variables.