Deep Contextualized Word Representations

@inproceedings{Peters2018DeepCW,
  title={Deep Contextualized Word Representations},
  author={Matthew E. Peters and Mark Neumann and Mohit Iyyer and Matt Gardner and Christopher Clark and Kenton Lee and Luke Zettlemoyer},
  booktitle={NAACL},
  year={2018}
}
We introduce a new type of deep contextualized word representation that models both (1) complex characteristics of word use (e.g., syntax and semantics), and (2) how these uses vary across linguistic contexts (i.e., to model polysemy). [...] Key Result: We also present an analysis showing that exposing the deep internals of the pre-trained network is crucial, allowing downstream models to mix different types of semi-supervision signals.
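The mechanism highlighted in the abstract — letting the downstream task mix all layers of the pre-trained biLM rather than consuming only its top layer — can be sketched as a learned scalar mixture. Below is a minimal PyTorch sketch, not the authors' released code; the class name ScalarMix, the dummy layer activations, and all dimensions are illustrative assumptions.

import torch
import torch.nn as nn

class ScalarMix(nn.Module):
    """Task-learned weighted combination of frozen biLM layers (ELMo-style).

    Given L layer outputs, each of shape (batch, seq_len, dim), returns
    gamma * sum_j softmax(w)_j * h_j, where w and gamma are learned.
    """

    def __init__(self, num_layers: int):
        super().__init__()
        self.layer_weights = nn.Parameter(torch.zeros(num_layers))  # one scalar per layer
        self.gamma = nn.Parameter(torch.ones(1))                    # global scale

    def forward(self, layer_outputs: list) -> torch.Tensor:
        weights = torch.softmax(self.layer_weights, dim=0)          # normalize across layers
        stacked = torch.stack(layer_outputs, dim=0)                 # (L, batch, seq, dim)
        return self.gamma * (weights.view(-1, 1, 1, 1) * stacked).sum(dim=0)

# Dummy activations standing in for a pre-trained biLM: 3 layers, 2 sentences, 5 tokens, 8 dims.
layers = [torch.randn(2, 5, 8) for _ in range(3)]
print(ScalarMix(num_layers=3)(layers).shape)  # torch.Size([2, 5, 8])

Because the mixture weights and scale are the only new parameters, the biLM itself can stay frozen while each downstream task learns its own emphasis over layers.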
Deep contextualized word embeddings from character language models for neural sequence labeling
TLDR
This thesis evaluates the performance of different embedding setups (context-sensitive and context-insensitive word embeddings, as well as task-specific word, character, lemma, and PoS embeddings) on three sequence labeling tasks, using a deep learning model (BiLSTM) and Portuguese datasets.
Dissecting Contextual Word Embeddings: Architecture and Representation
TLDR
There is a tradeoff between speed and accuracy, but all architectures learn high-quality contextual representations that outperform word embeddings for four challenging NLP tasks, suggesting that unsupervised biLMs, independent of architecture, are learning much more about the structure of language than previously appreciated.
Improved Word Sense Disambiguation Using Pre-Trained Contextualized Word Representations
TLDR
Different strategies of integrating pre-trained contextualized word representations are explored, and the best strategy achieves accuracies exceeding the best prior published accuracies by significant margins on multiple benchmark WSD datasets.
Retrofitting Contextualized Word Embeddings with Paraphrases
TLDR
This work proposes a post-processing approach to retrofit contextualized word embeddings with paraphrases, which seeks to minimize the variance of word representations on paraphrased contexts and significantly improves ELMo on various sentence classification and inference tasks.
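The variance term mentioned in this summary is simple to write down. A rough sketch, assuming word_vecs holds one contextualized vector for the same word in each of several paraphrased sentences; the function name and toy inputs are illustrative, and the published method is more involved than this single term.

import torch

def paraphrase_variance(word_vecs: list) -> torch.Tensor:
    """Mean squared deviation of one word's contextual vectors across paraphrases."""
    stacked = torch.stack(word_vecs)              # (n_contexts, dim)
    mean = stacked.mean(dim=0, keepdim=True)
    return ((stacked - mean) ** 2).sum(dim=1).mean()

# Toy usage: three contextualized vectors of the same word, e.g. taken from ELMo.
print(paraphrase_variance([torch.randn(8) for _ in range(3)]))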
Quantifying the Contextualization of Word Representations with Semantic Class Probing
TLDR
This work quantifies the amount of contextualization, i.e., how well words are interpreted in context, by studying the extent to which semantic classes of a word can be inferred from its contextualized embedding.
Contextual String Embeddings for Sequence Labeling
TLDR
This paper proposes to leverage the internal states of a trained character language model to produce a novel type of word embedding, referred to as contextual string embeddings, which fundamentally model words as sequences of characters and are contextualized by their surrounding text.
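As a rough illustration of the idea (not the authors' implementation): each word's embedding is read directly off the hidden states of a character-level language model run over the whole sentence, taking a forward state at the word's last character and a backward state at its first character. The sketch below uses untrained toy components, reuses one LSTM for both directions as a simplification, and assumes single-space tokenization; all names are placeholders.

import torch
import torch.nn as nn

def string_embeddings(sentence: str, char_lstm: nn.LSTM, char_emb: nn.Embedding) -> list:
    """Embed each word from character-LM states computed over the full sentence."""
    ids = torch.tensor([[ord(c) % char_emb.num_embeddings for c in sentence]])
    emb = char_emb(ids)                                   # (1, n_chars, emb_dim)
    fwd, _ = char_lstm(emb)                               # left-to-right states
    bwd, _ = char_lstm(torch.flip(emb, dims=[1]))         # same LSTM on reversed input
    bwd = torch.flip(bwd, dims=[1])

    vectors, start = [], 0
    for word in sentence.split(" "):
        end = start + len(word) - 1                       # index of the word's last character
        vectors.append(torch.cat([fwd[0, end], bwd[0, start]], dim=-1))
        start = end + 2                                   # skip the following space
    return vectors

char_emb = nn.Embedding(128, 16)
char_lstm = nn.LSTM(16, 32, batch_first=True)
vecs = string_embeddings("contextual string embeddings", char_lstm, char_emb)
print(len(vecs), vecs[0].shape)  # 3 torch.Size([64])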
Contextualized Word Representations for Self-Attention Network
TLDR
It is demonstrated that an RNN/CNN-free self-attention model for sentiment analysis can be improved by 2.53% by using contextualized word representations learned in a language modeling task.
Linguistic Knowledge and Transferability of Contextual Representations
TLDR
It is found that linear models trained on top of frozen contextual representations are competitive with state-of-the-art task-specific models in many cases, but fail on tasks requiring fine-grained linguistic knowledge.
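The probing setup behind this finding is easy to reproduce in outline: freeze the contextual encoder, treat its per-token vectors as fixed features, and fit a linear classifier on top. A minimal scikit-learn sketch with random placeholder data; in practice the features would be, for example, one frozen ELMo layer per token and the labels the gold POS or NER tags.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Placeholder data standing in for frozen contextual vectors and token labels.
rng = np.random.default_rng(0)
features = rng.normal(size=(1000, 1024))
labels = rng.integers(0, 5, size=1000)

X_train, X_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.2, random_state=0)

# The probe is a plain linear model: it sees only the fixed output vectors,
# never the encoder's parameters.
probe = LogisticRegression(max_iter=1000)
probe.fit(X_train, y_train)
print("probe accuracy:", probe.score(X_test, y_test))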
Context Analysis for Pre-trained Masked Language Models
TLDR
A detailed analysis of contextual impact in Transformer- and BiLSTM-based masked language models suggests significant differences in contextual impact between the two architectures.
A Comparison of Architectures and Pretraining Methods for Contextualized Multilingual Word Embeddings
TLDR
A comprehensive comparison of state-of-the-art multilingual word and sentence encoders on the tasks of named entity recognition (NER) and part-of-speech (POS) tagging is performed, and a new method for creating multilingual contextualized word embeddings is proposed.

References

Showing 1-10 of 65 references
Learned in Translation: Contextualized Word Vectors
TLDR
Adding context vectors from a deep LSTM encoder of an attentional sequence-to-sequence model trained for machine translation improves performance over using only unsupervised word and character vectors on a wide variety of common NLP tasks.
Semi-supervised sequence tagging with bidirectional language models
TLDR
A general semi-supervised approach for adding pre-trained context embeddings from bidirectional language models to NLP systems is presented and applied to sequence labeling tasks, surpassing previous systems that use other forms of transfer or joint learning with additional labeled data and task-specific gazetteers.
context2vec: Learning Generic Context Embedding with Bidirectional LSTM
TLDR
This work presents a neural model for efficiently learning a generic context embedding function from large corpora using a bidirectional LSTM, and suggests the resulting context representations could be useful in a wide variety of NLP tasks.
Word Representations: A Simple and General Method for Semi-Supervised Learning
TLDR
This work evaluates Brown clusters, Collobert and Weston (2008) embeddings, and HLBL (Mnih & Hinton, 2009) embeddings of words on both NER and chunking, and finds that each of the three word representations improves the accuracy of these baselines.
Embeddings for Word Sense Disambiguation: An Evaluation Study
TLDR
This work proposes different methods through which word embeddings can be leveraged in a state-of-the-art supervised WSD system architecture, and performs a deep analysis of how different parameters affect performance.
Finding Function in Form: Compositional Character Models for Open Vocabulary Word Representation
TLDR
A model for constructing vector representations of words by composing characters using bidirectional LSTMs, which requires only a single vector per character type and a fixed set of parameters for the compositional model, yields state-of-the-art results in language modeling and part-of-speech tagging.
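A minimal sketch of the composition step described here, with untrained toy components and illustrative dimensions: a word vector is built by running a bidirectional LSTM over the word's character embeddings and concatenating the two final states, so morphologically related or unseen words still receive vectors.

import torch
import torch.nn as nn

class CharToWord(nn.Module):
    """Compose an open-vocabulary word vector from its characters with a biLSTM."""

    def __init__(self, n_chars: int = 128, char_dim: int = 16, hidden: int = 32):
        super().__init__()
        self.char_emb = nn.Embedding(n_chars, char_dim)
        self.bilstm = nn.LSTM(char_dim, hidden, batch_first=True, bidirectional=True)

    def forward(self, word: str) -> torch.Tensor:
        ids = torch.tensor([[ord(c) % 128 for c in word]])
        _, (h, _) = self.bilstm(self.char_emb(ids))
        # h has shape (2, 1, hidden): final forward and final backward states.
        return torch.cat([h[0, 0], h[1, 0]], dim=-1)

model = CharToWord()
# "catz" is out of vocabulary for a word-lookup table, but still gets a vector here.
print(model("cats").shape, model("catz").shape)  # torch.Size([64]) torch.Size([64])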
Character-Aware Neural Language Models
TLDR
A simple neural language model that relies only on character-level inputs is able to encode, from characters only, both semantic and orthographic information, suggesting that for many languages, character inputs are sufficient for language modeling.
Neural Sequence Learning Models for Word Sense Disambiguation
TLDR
This work proposes and studies in depth a series of end-to-end neural architectures directly tailored to the task, from bidirectional Long Short-Term Memory to encoder-decoder models, and shows that sequence learning enables more versatile all-words models that consistently lead to state-of-the-art results, even against word experts with engineered features.
Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank
TLDR
This work introduces a Sentiment Treebank with fine-grained sentiment labels for 215,154 phrases in the parse trees of 11,855 sentences, which poses new challenges for sentiment compositionality, and proposes the Recursive Neural Tensor Network to address them.
Enriching Word Vectors with Subword Information
TLDR
A new approach based on the skip-gram model, in which each word is represented as a bag of character n-grams and word vectors are the sum of these n-gram representations, achieves state-of-the-art performance on word similarity and analogy tasks.
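The subword scheme summarized here is easy to illustrate: a word's vector is the sum of vectors for its character n-grams (plus the whole marked word), so rare and unseen words inherit information from shared subwords. A toy sketch with random placeholder vectors; the hashing buckets and actual training of the real implementation are omitted.

import numpy as np

def char_ngrams(word: str, n_min: int = 3, n_max: int = 6) -> list:
    """Character n-grams with boundary markers, plus the whole marked word."""
    marked = f"<{word}>"
    grams = [marked[i:i + n]
             for n in range(n_min, n_max + 1)
             for i in range(len(marked) - n + 1)]
    return grams + [marked]

rng = np.random.default_rng(0)

def word_vector(word: str, table: dict, dim: int = 50) -> np.ndarray:
    """Sum the vectors of the word's n-grams; unseen n-grams get toy random vectors."""
    vec = np.zeros(dim)
    for g in char_ngrams(word):
        if g not in table:
            table[g] = rng.normal(size=dim)   # placeholder for trained n-gram vectors
        vec += table[g]
    return vec

table = {}
# "where" and "wherever" share many n-grams, so their toy vectors are correlated.
v1, v2 = word_vector("where", table), word_vector("wherever", table)
print("toy cosine similarity:", np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2)))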