UniMorph 2.0: Universal Morphology
TLDR: The Universal Morphology (UniMorph) project is a collaborative effort to improve how NLP handles complex morphology across the world's languages.
Counterfactual Data Augmentation for Mitigating Gender Stereotypes in Languages with Rich Morphology
TLDR: We present a novel approach to counterfactual data augmentation that reduces gender stereotyping in neural language models by a factor of 2.5 without sacrificing grammaticality.
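The core mechanism is easy to illustrate. Below is a minimal Python sketch of agreement-aware counterfactual swapping, assuming a toy hand-written lexicon and pre-annotated agreement links; the paper itself uses a learned reinflection model over dependency-parsed sentences, so every name and data structure here is an illustrative assumption, not the paper's system.

```python
# Minimal sketch of counterfactual gender augmentation with agreement-aware
# reinflection, on a toy Spanish example. Swapping one noun's gender must
# propagate through agreement, which is what makes morphologically rich
# languages hard. The lexicon and field names below are invented for the sketch.

# Hypothetical morphological lexicon: (lemma, gender) -> surface form.
LEXICON = {
    ("el", "MASC"): "el",               ("el", "FEM"): "la",
    ("ingeniero", "MASC"): "ingeniero", ("ingeniero", "FEM"): "ingeniera",
    ("viejo", "MASC"): "viejo",         ("viejo", "FEM"): "vieja",
}

def flip(gender):
    return "FEM" if gender == "MASC" else "MASC"

def counterfactual(sentence, target_idx):
    """Swap the gender of the token at target_idx and reinflect every
    token whose `agrees_with` field points at it."""
    new_gender = flip(sentence[target_idx]["gender"])
    out = []
    for i, tok in enumerate(sentence):
        if i == target_idx or tok.get("agrees_with") == target_idx:
            out.append(LEXICON[(tok["lemma"], new_gender)])
        else:
            out.append(tok["form"])
    return " ".join(out)

# "el ingeniero viejo" -> "la ingeniera vieja"
sent = [
    {"form": "el",        "lemma": "el",        "gender": "MASC", "agrees_with": 1},
    {"form": "ingeniero", "lemma": "ingeniero", "gender": "MASC"},
    {"form": "viejo",     "lemma": "viejo",     "gender": "MASC", "agrees_with": 1},
]
print(counterfactual(sent, target_idx=1))  # la ingeniera vieja
```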
Are All Languages Equally Hard to Language-Model?
TLDR: We show that in some languages the textual expression of the same information is harder to predict, for both n-gram and LSTM language models.
The SIGMORPHON 2019 Shared Task: Morphological Analysis in Context and Cross-Lingual Transfer for Inflection
TLDR: The SIGMORPHON 2019 shared task on cross-lingual transfer and contextual analysis in morphology examined transfer learning of inflection between 100 language pairs, as well as contextual lemmatization and morphosyntactic description in 66 languages.
The CoNLL-SIGMORPHON 2018 Shared Task: Universal Morphological Reinflection
TLDR: The CoNLL-SIGMORPHON 2018 shared task on supervised learning of morphological generation featured datasets from 103 typologically diverse languages.
A Structured Variational Autoencoder for Contextual Morphological Inflection
TLDR: We introduce a novel generative latent-variable model for semi-supervised learning of morphological inflection generation.
Incident-Driven Machine Translation and Name Tagging for Low-resource Languages
TLDR: We describe novel approaches to natural language processing for low-resource languages.
Are All Languages Equally Hard to Language-Model?
TLDR: We develop an evaluation framework for fair cross-linguistic comparison of language models, using translated text so that all models are asked to predict approximately the same information.
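The framework's key move is to compare languages in total bits over translated text, rather than per-token perplexity over unrelated corpora. A minimal sketch follows, with a toy character-bigram model and an invented two-line "parallel corpus" standing in for the paper's n-gram and LSTM models and real multi-parallel data:

```python
import math
from collections import Counter

# Sketch of the evaluation idea: score each language's side of a parallel
# (translated) corpus in TOTAL bits, so every model encodes roughly the
# same underlying information. A character-bigram model with add-one
# smoothing is a stand-in; the parallel "corpus" is invented. Scoring the
# training text itself gives a description-length proxy, fine for a sketch.

def total_bits(train, test):
    """Total surprisal (bits) of `test` under a char-bigram model of `train`."""
    bigrams = Counter(zip(train, train[1:]))
    unigrams = Counter(train)
    vocab = len(set(train)) + 1  # +1 for unseen symbols
    bits = 0.0
    for a, b in zip(test, test[1:]):
        p = (bigrams[(a, b)] + 1) / (unigrams[a] + vocab)  # Laplace smoothing
        bits -= math.log2(p)
    return bits

parallel = {  # same content, two languages (toy data)
    "en": "the old engineer fixed the machine .",
    "fi": "vanha insinööri korjasi koneen .",
}
for lang, text in parallel.items():
    print(lang, round(total_bits(text, text), 1), "total bits")
```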
Tired of Topic Models? Clusters of Pretrained Word Embeddings Make for Fast and Good Topics too!
TLDR: We propose an alternative to topic models based on clustering readily available pre-trained word embeddings, incorporating document information for weighted clustering and for reranking each topic's top words.
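The pipeline is simple enough to sketch in a few lines: cluster pre-trained word vectors with k-means, then rerank each cluster's members by corpus frequency to pick the topic's top words. The random vectors, tiny vocabulary, and frequency table below are placeholders, not the paper's setup:

```python
import numpy as np
from sklearn.cluster import KMeans

# Sketch of clustering-as-topics: k-means over pre-trained word vectors,
# then a frequency-based rerank so common, representative words surface as
# each topic's top words. Random vectors stand in for GloVe/word2vec, and
# the invented frequencies stand in for real document statistics.

rng = np.random.default_rng(0)
vocab = ["game", "team", "score", "court", "law", "judge", "vote", "senate"]
vectors = rng.normal(size=(len(vocab), 50))  # stand-in for real embeddings
corpus_freq = dict(zip(vocab, [90, 70, 60, 20, 80, 50, 40, 30]))

k = 2
labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(vectors)

for topic in range(k):
    words = [w for w, l in zip(vocab, labels) if l == topic]
    top = sorted(words, key=lambda w: -corpus_freq[w])[:5]  # frequency rerank
    print(f"topic {topic}: {top}")
```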
Spell Once, Summon Anywhere: A Two-Level Open-Vocabulary Language Model
TLDR: We show how the spellings of known words can help us deal with unknown words in open-vocabulary NLP tasks.
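The underlying idea, computing embeddings from spellings so that unseen words are handled exactly like known ones, can be sketched with a small character-level encoder. This PyTorch module only illustrates a spelling-based embedder; the paper's full model couples such a component with a word-level language model and a character-level speller for generation, and all sizes and the byte-level "alphabet" here are arbitrary choices for the sketch:

```python
import torch
import torch.nn as nn

# "Spell once" sketch: one character-level encoder defines an embedding for
# ANY word from its spelling, so out-of-vocabulary words get vectors the
# same way in-vocabulary words do. Illustrative only, not the paper's model.

class SpellingEmbedder(nn.Module):
    def __init__(self, n_chars=256, char_dim=32, word_dim=128):
        super().__init__()
        self.char_emb = nn.Embedding(n_chars, char_dim)
        self.encoder = nn.LSTM(char_dim, word_dim, batch_first=True)

    def forward(self, word: str) -> torch.Tensor:
        ids = torch.tensor([[min(ord(c), 255) for c in word]])
        _, (h, _) = self.encoder(self.char_emb(ids))
        return h[-1, 0]  # final hidden state serves as the word's embedding

embed = SpellingEmbedder()
for w in ["cat", "cats", "blorfl"]:  # "blorfl" was never seen anywhere
    print(w, embed(w).shape)         # every word gets a 128-dim vector
```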