Deep Contextualized Word Representations
@inproceedings{Peters2018DeepCW, title={Deep Contextualized Word Representations}, author={Matthew E. Peters and Mark Neumann and Mohit Iyyer and Matt Gardner and Christopher Clark and Kenton Lee and Luke Zettlemoyer}, booktitle={NAACL}, year={2018} }
We introduce a new type of deep contextualized word representation that models both (1) complex characteristics of word use (e.g., syntax and semantics), and (2) how these uses vary across linguistic contexts (i.e., to model polysemy). [...] We also present an analysis showing that exposing the deep internals of the pre-trained network is crucial, allowing downstream models to mix different types of semi-supervision signals.
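To make the layer-mixing idea concrete, here is a minimal sketch of a softmax-weighted combination of biLM layer activations in the spirit of the ELMo formulation; the function name, toy dimensions, and random inputs are illustrative assumptions, not the paper's released code.

```python
import numpy as np

def elmo_mix(layer_states, scalars, gamma=1.0):
    """Collapse the biLM's per-layer activations for one token into a single
    vector via a softmax-normalized, task-learned weighting (a sketch of the
    layer-mixing idea; names and shapes are illustrative).

    layer_states: array of shape (num_layers, dim), one row per biLM layer
    scalars:      unnormalized task-specific layer weights, shape (num_layers,)
    gamma:        task-specific scale applied to the mixed vector
    """
    weights = np.exp(scalars - scalars.max())
    weights /= weights.sum()                      # softmax over layers
    return gamma * (weights[:, None] * layer_states).sum(axis=0)

# Toy usage: 3 biLM layers (token embedding + 2 LSTM layers), 8-dim states.
states = np.random.randn(3, 8)
mixed = elmo_mix(states, scalars=np.zeros(3))     # uniform mix initially
```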
7,987 Citations
Deep contextualized word embeddings from character language models for neural sequence labeling
- Computer Science
- 2019
This thesis evaluates the performance of different embedding setups (context-sensitive and context-insensitive word embeddings, as well as task-specific word, character, lemma, and PoS embeddings) on three sequence labeling tasks, using a deep learning model (BiLSTM) and Portuguese datasets.
Dissecting Contextual Word Embeddings: Architecture and Representation
- Computer Science, EMNLP
- 2018
There is a tradeoff between speed and accuracy, but all architectures learn high quality contextual representations that outperform word embeddings for four challenging NLP tasks, suggesting that unsupervised biLMs, independent of architecture, are learning much more about the structure of language than previously appreciated.
Improved Word Sense Disambiguation Using Pre-Trained Contextualized Word Representations
- Computer Science, EMNLP
- 2019
Different strategies of integrating pre-trained contextualized word representations are explored and the best strategy achieves accuracies exceeding the best prior published accuracies by significant margins on multiple benchmark WSD datasets.
Retrofitting Contextualized Word Embeddings with Paraphrases
- Computer Science, EMNLP
- 2019
This work proposes a post-processing approach that retrofits contextualized word embeddings with paraphrases, seeking to minimize the variance of word representations across paraphrased contexts, and significantly improves ELMo on various sentence classification and inference tasks.
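The variance-minimization idea can be pictured as a single penalty term: the spread of a word's contextual embeddings across paraphrased sentences. The snippet below is a hypothetical, simplified rendering of that core component, not the paper's full retrofitting procedure.

```python
import numpy as np

def paraphrase_variance(context_vectors):
    """Variance penalty for one word across paraphrased contexts: mean squared
    distance of its contextual embeddings to their centroid. A simplified,
    assumed form of the objective's core term; the training loop and any
    additional loss terms are not shown."""
    centroid = context_vectors.mean(axis=0)
    return np.mean(np.sum((context_vectors - centroid) ** 2, axis=1))

# Toy usage: the same word embedded in 4 paraphrased sentences (16-dim vectors).
vecs = np.random.randn(4, 16)
penalty = paraphrase_variance(vecs)   # smaller is better after retrofitting
```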
Quantifying the Contextualization of Word Representations with Semantic Class Probing
- Computer Science, FINDINGS
- 2020
This work quantifies the amount of contextualization, i.e., how well words are interpreted in context, by studying the extent to which semantic classes of a word can be inferred from its contextualized embedding.
Contextual String Embeddings for Sequence Labeling
- Computer Science, COLING
- 2018
This paper proposes to leverage the internal states of a trained character language model to produce a novel type of word embedding, referred to as contextual string embeddings, which fundamentally model words as sequences of characters and are contextualized by their surrounding text.
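A rough sketch of how such word embeddings can be read off a character language model's hidden states: take the forward state at the word's last character, the backward state at its first character, and concatenate. The char-LM itself is assumed here and stood in for by random state matrices; the indexing convention is illustrative.

```python
import numpy as np

def contextual_string_embedding(fwd_states, bwd_states, start, end):
    """Form a word's embedding from a character language model's hidden
    states (a sketch; the character LM is assumed and represented only by
    its per-character state matrices).

    fwd_states: (num_chars, dim) hidden states of the forward char-LM
    bwd_states: (num_chars, dim) hidden states of the backward char-LM
    start, end: character offsets of the word in the sentence (end exclusive)
    """
    fwd = fwd_states[end - 1]    # forward state at the word's last character
    bwd = bwd_states[start]      # backward state at the word's first character
    return np.concatenate([fwd, bwd])

# Toy usage: a 20-character sentence, 32-dim states, word spanning chars 5..10.
f = np.random.randn(20, 32)
b = np.random.randn(20, 32)
emb = contextual_string_embedding(f, b, start=5, end=10)   # shape (64,)
```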
Contextualized Word Representations for Self-Attention Network
- Computer Science, 2018 13th International Conference on Computer Engineering and Systems (ICCES)
- 2018
It is demonstrated that an RNN/CNN-free self-attention model for sentiment analysis can be improved by 2.53% by using contextualized word representations learned in a language modeling task.
Linguistic Knowledge and Transferability of Contextual Representations
- Computer Science, NAACL
- 2019
It is found that linear models trained on top of frozen contextual representations are competitive with state-of-the-art task-specific models in many cases, but fail on tasks requiring fine-grained linguistic knowledge.
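A linear probe of this kind is straightforward to picture: freeze the representations, fit a linear classifier on top, and report accuracy. The sketch below uses random features as stand-ins for token vectors extracted from a frozen biLM (an assumption made only to keep the example runnable).

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Minimal probing setup: a linear model over frozen contextual features.
# The features are random placeholders; in practice they would be vectors
# extracted from a frozen pretrained model (assumed, not shown).
rng = np.random.default_rng(0)
X_train, y_train = rng.normal(size=(500, 128)), rng.integers(0, 5, 500)
X_test,  y_test  = rng.normal(size=(100, 128)), rng.integers(0, 5, 100)

probe = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("probe accuracy:", probe.score(X_test, y_test))
```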
Context Analysis for Pre-trained Masked Language Models
- Computer Science, FINDINGS
- 2020
A detailed analysis of contextual impact in Transformer- and BiLSTM-based masked language models suggests significant differences in contextual impact between the two model architectures.
Dynamic Contextualized Word Embeddings
- Computer Science, Linguistics, ACL
- 2021
Based on a pretrained language model (PLM), dynamic contextualized word embeddings model time and social space jointly, which makes them attractive for a range of NLP tasks involving semantic variability.
References
SHOWING 1-10 OF 65 REFERENCES
Learned in Translation: Contextualized Word Vectors
- Computer Science, NIPS
- 2017
Adding context vectors from the deep LSTM encoder of an attentional sequence-to-sequence model trained for machine translation, in order to contextualize word vectors, improves performance over using only unsupervised word and character vectors on a wide variety of common NLP tasks.
Semi-supervised sequence tagging with bidirectional language models
- Computer Science, ACL
- 2017
A general semi-supervised approach for adding pretrained context embeddings from bidirectional language models to NLP systems is presented and applied to sequence labeling tasks, surpassing previous systems that use other forms of transfer or joint learning with additional labeled data and task-specific gazetteers.
context2vec: Learning Generic Context Embedding with Bidirectional LSTM
- Computer Science, CoNLL
- 2016
This work presents a neural model for efficiently learning a generic context embedding function from large corpora using a bidirectional LSTM, and suggests the resulting embeddings could be useful in a wide variety of NLP tasks.
Word Representations: A Simple and General Method for Semi-Supervised Learning
- Computer Science, ACL
- 2010
This work evaluates Brown clusters, Collobert and Weston (2008) embeddings, and HLBL (Mnih & Hinton, 2009) embeddings of words on both NER and chunking, and finds that each of the three word representations improves the accuracy of these baselines.
Embeddings for Word Sense Disambiguation: An Evaluation Study
- Computer Science, ACL
- 2016
This work proposes different methods through which word embeddings can be leveraged in a state-of-the-art supervised WSD system architecture, and performs a deep analysis of how different parameters affect performance.
Finding Function in Form: Compositional Character Models for Open Vocabulary Word Representation
- Computer Science, EMNLP
- 2015
A model for constructing vector representations of words by composing characters using bidirectional LSTMs, which requires only a single vector per character type and a fixed set of parameters for the compositional model, yields state-of-the-art results in language modeling and part-of-speech tagging.
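A compact sketch of the compose-from-characters idea, assuming PyTorch: embed each character, run a bidirectional LSTM over the character sequence, and use the concatenated final forward and backward states as the word vector. The class name and sizes are illustrative, not the paper's configuration.

```python
import torch
import torch.nn as nn

class CharToWord(nn.Module):
    """Compose a word vector from its characters with a bidirectional LSTM:
    one embedding per character type plus a fixed compositional model
    (a sketch with illustrative sizes)."""
    def __init__(self, num_chars=128, char_dim=16, hidden=32):
        super().__init__()
        self.emb = nn.Embedding(num_chars, char_dim)
        self.lstm = nn.LSTM(char_dim, hidden, bidirectional=True, batch_first=True)

    def forward(self, char_ids):                 # char_ids: (batch, word_len)
        out, (h_n, _) = self.lstm(self.emb(char_ids))
        # Concatenate the final forward and backward states as the word vector.
        return torch.cat([h_n[0], h_n[1]], dim=-1)

# Toy usage: embed the word "cats" from its ASCII codes.
ids = torch.tensor([[ord(c) for c in "cats"]])
word_vec = CharToWord()(ids)                     # shape (1, 64)
```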
Character-Aware Neural Language Models
- Computer Science, AAAI
- 2016
A simple neural language model that relies only on character-level inputs is shown to encode, from characters alone, both semantic and orthographic information, suggesting that for many languages character inputs are sufficient for language modeling.
Neural Sequence Learning Models for Word Sense Disambiguation
- Computer Science, EMNLP
- 2017
This work proposes and studies in depth a series of end-to-end neural architectures directly tailored to the task, from bidirectional Long Short-Term Memory to encoder-decoder models, and shows that sequence learning enables more versatile all-words models that consistently lead to state-of-the-art results, even against word experts with engineered features.
Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank
- Computer Science, EMNLP
- 2013
A Sentiment Treebank that includes fine-grained sentiment labels for 215,154 phrases in the parse trees of 11,855 sentences is introduced, presenting new challenges for sentiment compositionality, along with the Recursive Neural Tensor Network.
Enriching Word Vectors with Subword Information
- Computer Science, Transactions of the Association for Computational Linguistics
- 2017
A new approach based on the skipgram model, where each word is represented as a bag of character n-grams, with words being represented as the sum of these representations, which achieves state-of-the-art performance on word similarity and analogy tasks.
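The bag-of-character-n-grams construction is simple enough to sketch directly: enumerate boundary-marked n-grams and sum their vectors. The n-gram lookup table below is a hypothetical stand-in for trained fastText embeddings, and the whole-word vector that fastText also adds is omitted in this sketch.

```python
import numpy as np

def char_ngrams(word, n_min=3, n_max=6):
    """Character n-grams of a word with boundary markers, fastText-style."""
    w = f"<{word}>"
    return [w[i:i + n] for n in range(n_min, n_max + 1)
            for i in range(len(w) - n + 1)]

def subword_vector(word, ngram_table, dim=8):
    """Represent a word as the sum of its character n-gram vectors.
    ngram_table is a hypothetical lookup of trained n-gram embeddings;
    unseen n-grams are simply skipped in this sketch."""
    vecs = [ngram_table[g] for g in char_ngrams(word) if g in ngram_table]
    return np.sum(vecs, axis=0) if vecs else np.zeros(dim)

# Toy usage with random n-gram embeddings.
table = {g: np.random.randn(8) for g in char_ngrams("where")}
vec = subword_vector("where", table)
```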