Will it Unblend?

Yuval Pinter, Cassandra L. Jacobs, Jacob Eisenstein
Natural language processing systems often struggle with out-of-vocabulary (OOV) terms, which do not appear in training data. Blends, such as “innoventor”, are one particularly challenging class of OOV, as they are formed by fusing together two or more bases that relate to the intended meaning in unpredictable manners and degrees. In this work, we run experiments on a novel dataset of English OOV blends to quantify the difficulty of interpreting the meanings of blends by large-scale contextual…

Superbizarre Is Not Superb: Improving BERT's Interpretations of Complex Words with Derivational Morphology
It is shown that PLMs can be interpreted as serial dual-route models, i.e., the meanings of complex words are either stored or else need to be computed from the subwords, which implies that maximally meaningful input tokens should allow for the best generalization on new words.
Integrating Approaches to Word Representation
The problem of representing the atomic elements of language in modern neural learning systems is one of the central challenges of the field of natural language processing. I present a survey of the…
Dynamic Contextualized Word Embeddings
Static word embeddings that represent words by a single vector cannot capture the variability of word meaning in different linguistic and extralinguistic contexts. Building on prior work on…
NYTWIT: A Dataset of Novel Words in the New York Times
A collection of over 2,500 novel English words published in the New York Times between November 2017 and March 2019, manually annotated for their class of novelty (such as lexical derivation, dialectal variation, blending, or compounding), is presented.


What to do about non-standard (or non-canonical) language in NLP
The notion of canonicity is reviewed, along with how it shapes the authors' community's approach to language, with a view toward adaptive language technology capable of addressing natural language variation.
How Robust Are Character-Based Word Embeddings in Tagging and MT Against Wrod Scramlbing or Randdm Nouse?
The robustness of NLP against perturbed word forms is investigated by considering different noise distributions (one type of noise, combination of noise types) and mismatched noise distributions for training and testing.
Joint Semantic Synthesis and Morphological Analysis of the Derived Word
A novel probabilistic model of word formation is proposed that captures both the analysis of a word w into its constituent segments and the synthesis of the meaning of w from the meanings of those segments.
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
A new language representation model, BERT, is designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers; it can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.
Unsupervised Cross-lingual Representation Learning at Scale
It is shown that pretraining multilingual language models at scale leads to significant performance gains for a wide range of cross-lingual transfer tasks, and the possibility of multilingual modeling without sacrificing per-language performance is shown for the first time.
GloVe: Global Vectors for Word Representation
A new global log-bilinear regression model is proposed that combines the advantages of the two major model families in the literature (global matrix factorization and local context window methods) and produces a vector space with meaningful substructure.
Neural Machine Translation of Rare Words with Subword Units
This paper introduces a simpler and more effective approach, making the NMT model capable of open-vocabulary translation by encoding rare and unknown words as sequences of subword units, and empirically shows that subword models improve over a back-off dictionary baseline for the WMT 15 translation tasks English-German and English-Russian by 1.3 BLEU.
What Does BERT Learn about the Structure of Language?
This work provides novel support for the possibility that BERT networks capture structural information about language by performing a series of experiments to unpack the elements of English language structure learned by BERT.
Attention is All you Need
A new simple network architecture, the Transformer, is proposed. It is based solely on attention mechanisms, dispensing with recurrence and convolutions entirely, and it generalizes well to other tasks, as shown by applying it successfully to English constituency parsing with both large and limited training data.
Automatically Identifying the Source Words of Lexical Blends in English
In this first study of novel blends, a statistical model for inferring a blend's source words, drawing on observed linguistic properties of blends, achieves an accuracy of 40%, and preliminary results show that its features for source word identification can be used to distinguish blends from other kinds of novel words.