Deep Contextualized Word Representations

@inproceedings{Peters2018DeepCW,
  title={Deep Contextualized Word Representations},
  author={Matthew E. Peters and Mark Neumann and Mohit Iyyer and Matt Gardner and Christopher Clark and Kenton Lee and Luke Zettlemoyer},
  booktitle={NAACL},
  year={2018}
}
We introduce a new type of deep contextualized word representation that models both (1) complex characteristics of word use (e.g., syntax and semantics), and (2) how these uses vary across linguistic contexts (i.e., to model polysemy). […] Key Result: We also present an analysis showing that exposing the deep internals of the pre-trained network is crucial, allowing downstream models to mix different types of semi-supervision signals.
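The key result above concerns combining the pre-trained biLM's internal layers rather than using only its top layer. Below is a minimal sketch of that layer mixing, assuming NumPy; the names (scalar_mix, layer_activations, layer_logits, gamma) are illustrative placeholders rather than the paper's code, and the biLM itself and the downstream task model are omitted.

# Sketch of an ELMo-style scalar mix: a task-learned, softmax-weighted sum over
# all biLM layers, scaled by a learned gamma. The biLM is not shown; the stacked
# hidden states stand in for its per-layer activations on one sentence.
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def scalar_mix(layer_activations, layer_logits, gamma):
    """Combine biLM layers into one contextual representation per token.

    layer_activations: (num_layers, seq_len, dim) hidden states of the biLM
        (layer 0 being the context-insensitive token/character layer).
    layer_logits: (num_layers,) unnormalized task-learned layer weights.
    gamma: scalar task-learned global scale.
    """
    s = softmax(layer_logits)                      # normalized layer weights
    mixed = np.tensordot(s, layer_activations, 1)  # weighted sum over layers
    return gamma * mixed                           # shape (seq_len, dim)

# Toy usage: 3 biLM layers, 5 tokens, 8-dim states (all values are placeholders).
layers = np.random.randn(3, 5, 8)
elmo_vectors = scalar_mix(layers, layer_logits=np.zeros(3), gamma=1.0)
print(elmo_vectors.shape)  # (5, 8)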

Citations

Deep contextualized word embeddings from character language models for neural sequence labeling
TLDR
This thesis evaluates the performance of different embedding setups (context-sensitive, context-insensitive word, as well as task-specific word, character, lemma, and PoS) on three sequence labeling tasks using a deep learning model (BiLSTM) and Portuguese datasets.
Dissecting Contextual Word Embeddings: Architecture and Representation
TLDR
There is a tradeoff between speed and accuracy, but all architectures learn high quality contextual representations that outperform word embeddings for four challenging NLP tasks, suggesting that unsupervised biLMs, independent of architecture, are learning much more about the structure of language than previously appreciated.
Improved Word Sense Disambiguation Using Pre-Trained Contextualized Word Representations
TLDR
Different strategies of integrating pre-trained contextualized word representations are explored and the best strategy achieves accuracies exceeding the best prior published accuracies by significant margins on multiple benchmark WSD datasets.
Retrofitting Contextualized Word Embeddings with Paraphrases
TLDR
This work proposes a post-processing approach to retrofit contextualized word embeddings with paraphrases, which seeks to minimize the variance of word representations across paraphrased contexts and significantly improves ELMo on various sentence classification and inference tasks.
Quantifying the Contextualization of Word Representations with Semantic Class Probing
TLDR
This work quantifies the amount of contextualization, i.e., how well words are interpreted in context, by studying the extent to which semantic classes of a word can be inferred from its contextualized embedding.
Contextual String Embeddings for Sequence Labeling
TLDR
This paper proposes to leverage the internal states of a trained character language model to produce a novel type of word embedding, referred to as contextual string embeddings, which fundamentally model words as sequences of characters and are contextualized by their surrounding text.
Contextualized Word Representations for Self-Attention Network
TLDR
It is demonstrated that an RNN/CNN-free self-attention model used for sentiment analysis can be improved by 2.53% by using contextualized word representations learned in a language modeling task.
Linguistic Knowledge and Transferability of Contextual Representations
TLDR
It is found that linear models trained on top of frozen contextual representations are competitive with state-of-the-art task-specific models in many cases, but fail on tasks requiring fine-grained linguistic knowledge.
Context Analysis for Pre-trained Masked Language Models
TLDR
A detailed analysis of contextual impact in Transformer- and BiLSTM-based masked language models suggests significant differences in contextual impact between the two model architectures.
Dynamic Contextualized Word Embeddings
TLDR
Based on a pretrained language model (PLM), dynamic contextualized word embeddings model time and social space jointly, which makes them attractive for a range of NLP tasks involving semantic variability.
…

References

Showing 1-10 of 65 references
Learned in Translation: Contextualized Word Vectors
TLDR
Context vectors from the deep LSTM encoder of an attentional sequence-to-sequence model trained for machine translation are added to contextualize word vectors, improving performance over using only unsupervised word and character vectors on a wide variety of common NLP tasks.
Semi-supervised sequence tagging with bidirectional language models
TLDR
A general semi-supervised approach for adding pretrained context embeddings from bidirectional language models to NLP systems is presented and applied to sequence labeling tasks, surpassing previous systems that use other forms of transfer or joint learning with additional labeled data and task-specific gazetteers.
context2vec: Learning Generic Context Embedding with Bidirectional LSTM
TLDR
This work presents a neural model for efficiently learning a generic context embedding function from large corpora using a bidirectional LSTM, and suggests the resulting embeddings could be useful in a wide variety of NLP tasks.
Word Representations: A Simple and General Method for Semi-Supervised Learning
TLDR
This work evaluates Brown clusters, Collobert and Weston (2008) embeddings, and HLBL (Mnih & Hinton, 2009) embeddings of words on both NER and chunking, and finds that each of the three word representations improves the accuracy of these baselines.
Embeddings for Word Sense Disambiguation: An Evaluation Study
TLDR
This work proposes different methods through which word embeddings can be leveraged in a state-of-the-art supervised WSD system architecture, and performs a deep analysis of how different parameters affect performance.
Finding Function in Form: Compositional Character Models for Open Vocabulary Word Representation
TLDR
A model for constructing vector representations of words by composing characters using bidirectional LSTMs, which requires only a single vector per character type and a fixed set of parameters for the compositional model, yields state-of-the-art results in language modeling and part-of-speech tagging.
Character-Aware Neural Language Models
TLDR
A simple neural language model that relies only on character-level inputs is able to encode both semantic and orthographic information from characters alone, suggesting that for many languages character inputs are sufficient for language modeling.
Neural Sequence Learning Models for Word Sense Disambiguation
TLDR
This work proposes and studies in depth a series of end-to-end neural architectures directly tailored to the task, from bidirectional Long Short-Term Memory to encoder-decoder models, and shows that sequence learning enables more versatile all-words models that consistently lead to state-of-the-art results, even against word experts with engineered features.
Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank
TLDR
A Sentiment Treebank with fine-grained sentiment labels for 215,154 phrases in the parse trees of 11,855 sentences is introduced, presenting new challenges for sentiment compositionality, together with the Recursive Neural Tensor Network.
Enriching Word Vectors with Subword Information
TLDR
A new approach based on the skipgram model, where each word is represented as a bag of character n-grams, with words being represented as the sum of these representations, which achieves state-of-the-art performance on word similarity and analogy tasks.
…