Dissecting Contextual Word Embeddings: Architecture and Representation

@inproceedings{Peters2018DissectingCW,
  title={Dissecting Contextual Word Embeddings: Architecture and Representation},
  author={Matthew E. Peters and Mark Neumann and Luke Zettlemoyer and Wen-tau Yih},
  booktitle={EMNLP},
  year={2018}
}
Contextual word representations derived from pre-trained bidirectional language models (biLMs) have recently been shown to provide significant improvements to the state of the art for a wide range of NLP tasks. [...] Together, these results suggest that unsupervised biLMs, independent of architecture, are learning much more about the structure of language than previously appreciated.
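
The layer-by-layer analysis summarized above starts from per-layer activations of a pretrained encoder. Below is a minimal sketch of extracting them, using BERT via Hugging Face Transformers as a stand-in for the biLMs the paper studies; the model name and library are assumptions for illustration, not the paper's setup.

import torch
from transformers import AutoModel, AutoTokenizer

# BERT stands in here for a pretrained bidirectional encoder whose layers are inspected.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_hidden_states=True).eval()

inputs = tokenizer("The plant bolted after the first warm week.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# hidden_states holds the embedding layer plus one tensor per transformer layer,
# each of shape (batch, seq_len, hidden_dim); these are the per-layer views to probe.
for depth, layer in enumerate(outputs.hidden_states):
    print(f"layer {depth:2d}: {tuple(layer.shape)}")

Probing studies like those listed below typically freeze these activations and train only a small classifier on top of them.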
LMMS Reloaded: Transformer-based Sense Embeddings for Disambiguation and Beyond
TLDR
A more principled approach to leverage information from all layers of NLMs is introduced, informed by a probing analysis on 14 NLM variants, and the versatility of these sense embeddings in contrast to task-specific models is emphasized.
Linguistic Knowledge and Transferability of Contextual Representations
TLDR
It is found that linear models trained on top of frozen contextual representations are competitive with state-of-the-art task-specific models in many cases, but fail on tasks requiring fine-grained linguistic knowledge.
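
A hedged sketch of the probing recipe this summary refers to: a linear classifier trained on frozen contextual vectors. The random feature matrix and the 12-class label space below are stand-ins for real frozen representations and a real annotation layer.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Stand-in data: rows would normally be frozen contextual vectors for tokens,
# and labels a linguistic annotation such as part-of-speech tags.
rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 768))      # frozen contextual representations
y = rng.integers(0, 12, size=5000)    # hypothetical 12-class linguistic labels

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
probe = LogisticRegression(max_iter=1000)   # the encoder itself is never updated
probe.fit(X_train, y_train)
print("probe accuracy:", probe.score(X_test, y_test))

Because the encoder is never updated, the probe's accuracy is read as evidence of what the frozen representations already encode.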
Quantifying the Contextualization of Word Representations with Semantic Class Probing
TLDR
This work quantifies the amount of contextualization, i.e., how well words are interpreted in context, by studying the extent to which semantic classes of a word can be inferred from its contextualized embedding.
Analysing Word Representation from the Input and Output Embeddings in Neural Network Language Models
TLDR
Comparisons are made between the input and output embeddings and other state-of-the-art distributional models to gain a better understanding of the types of information they represent, and locally optimal approximations are created for the intermediate representations from the language model.
Deep Contextualized Word Embeddings for Universal Dependency Parsing
TLDR
Based on ELMo’s advantage on OOV, experiments that simulate low-resource settings are conducted and the results show that deep contextualized word embeddings are effective for data-insufficient tasks where the OOV problem is severe.
What Do You Learn from Context? Probing for Sentence Structure in Contextualized Word Representations
Contextualized representation models such as ELMo (Peters et al., 2018a) and BERT (Devlin et al., 2018) have recently achieved state-of-the-art results on a diverse array of downstream NLP tasks.
On the Hierarchical Information in a Single Contextualised Word Representation (Student Abstract)
TLDR
It is shown that with no fine-tuning, a single contextualised representation encodes enough syntactic and semantic sentence-level information to significantly outperform a non-contextual baseline for classifying 5-class sentiment of its ancestor constituents at multiple levels of the constituency tree.
What do you learn from context? Probing for sentence structure in contextualized word representations
TLDR
A novel edge probing task design is introduced, and a broad suite of sub-sentence tasks derived from the traditional structured NLP pipeline is constructed to investigate how sentence structure is encoded across a range of syntactic, semantic, local, and long-range phenomena.
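
A deliberately simplified sketch in the spirit of edge probing: pool frozen token vectors over a labeled span and classify the pooled representation with a small MLP. The real design uses learned (attention-based) span pooling and span pairs; the shapes and six-way label set below are invented for illustration.

import numpy as np
from sklearn.neural_network import MLPClassifier

# Invented stand-in data: frozen token vectors for 200 sentences of 50 tokens each,
# one 5-token span per sentence, and a six-way span label (e.g. constituent type).
rng = np.random.default_rng(0)
token_vecs = rng.normal(size=(200, 50, 768))
span_starts = rng.integers(0, 45, size=200)
labels = rng.integers(0, 6, size=200)

# Mean-pool each span, then train only the probe; the encoder stays frozen.
span_reprs = np.stack([token_vecs[i, s : s + 5].mean(axis=0)
                       for i, s in enumerate(span_starts)])
probe = MLPClassifier(hidden_layer_sizes=(256,), max_iter=300, random_state=0)
probe.fit(span_reprs, labels)
print("training accuracy:", probe.score(span_reprs, labels))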
On the Linguistic Representational Power of Neural Machine Translation Models
TLDR
It is shown that deep NMT models trained in an end-to-end fashion, without being provided any direct supervision during the training process, learn a non-trivial amount of linguistic information.
Tracking the Progress of Language Models by Extracting Their Underlying Knowledge Graphs
2020
The state of the art of language models, previously dominated by pre-trained word embeddings, is now being pushed forward by large pre-trained contextual representations. This success has driven [...]

References

Showing 1-10 of 56 references.
Deep RNNs Encode Soft Hierarchical Syntax
TLDR
A set of experiments is presented to demonstrate that deep recurrent neural networks learn internal representations that capture soft hierarchical notions of syntax from highly varied supervision, indicating that a soft syntactic hierarchy emerges.
Deep Contextualized Word Representations
TLDR
A new type of deep contextualized word representation is introduced that models both complex characteristics of word use and how these uses vary across linguistic contexts, allowing downstream models to mix different types of semi-supervision signals.
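
ELMo exposes all biLM layers to a downstream task through a learned scalar mix, ELMo_k = gamma * sum_j s_j * h_{k,j} with s = softmax(w). Below is a minimal PyTorch sketch of that mixing step, with random tensors standing in for real biLM layer activations.

import torch
import torch.nn as nn

class ScalarMix(nn.Module):
    """Softmax-normalized, learnable weighted sum of layer activations, ELMo-style."""
    def __init__(self, num_layers: int):
        super().__init__()
        self.layer_logits = nn.Parameter(torch.zeros(num_layers))
        self.gamma = nn.Parameter(torch.ones(1))

    def forward(self, layer_states: torch.Tensor) -> torch.Tensor:
        # layer_states: (num_layers, batch, seq_len, dim)
        weights = torch.softmax(self.layer_logits, dim=0)
        return self.gamma * torch.einsum("l,lbsd->bsd", weights, layer_states)

# Toy usage: three "biLM layers" for a batch of 2 sentences with 7 tokens each.
mix = ScalarMix(num_layers=3)
fake_layers = torch.randn(3, 2, 7, 1024)
contextual = mix(fake_layers)        # shape (2, 7, 1024)
print(contextual.shape)

In ELMo the mixed layers are the character-based token layer and two biLSTM layers, and the mixing weights are trained jointly with the downstream model.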
Semi-supervised sequence tagging with bidirectional language models
TLDR
A general semi-supervised approach is proposed for adding pre-trained context embeddings from bidirectional language models to NLP systems and applied to sequence labeling tasks, surpassing previous systems that use other forms of transfer or joint learning with additional labeled data and task-specific gazetteers.
Learned in Translation: Contextualized Word Vectors
TLDR
A deep LSTM encoder from an attentional sequence-to-sequence model trained for machine translation is used to contextualize word vectors, and adding these context vectors improves performance over using only unsupervised word and character vectors on a wide variety of common NLP tasks.
Language Modeling Teaches You More than Translation Does: Lessons Learned Through Auxiliary Syntactic Task Analysis
TLDR
This work compares four objectives—language modeling, translation, skip-thought, and autoencoding—on their ability to induce syntactic and part-of-speech information, holding constant the quantity and genre of the training data, as well as the LSTM architecture.
Sharp Nearby, Fuzzy Far Away: How Neural Language Models Use Context
TLDR
This paper investigates the role of context in an LSTM LM, through ablation studies, and analyzes the increase in perplexity when prior context words are shuffled, replaced, or dropped to provide a better understanding of how neural LMs use their context.
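
A hedged sketch of the ablation idea: score the same target tokens once with the intact context and once with the distant half of the context shuffled, then compare the losses. GPT-2 via Hugging Face Transformers is used only as a convenient stand-in for the LSTM LMs the paper analyzes, and the passage and split points are invented.

import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def target_nll(context_ids, target_ids):
    """Mean negative log-likelihood of the target tokens given the context."""
    input_ids = torch.cat([context_ids, target_ids]).unsqueeze(0)
    labels = input_ids.clone()
    labels[0, : context_ids.size(0)] = -100   # ignore context positions in the loss
    with torch.no_grad():
        return model(input_ids, labels=labels).loss.item()

text = ("The committee met in the old library to discuss the budget. After hours of "
        "debate over staffing and repairs, the members finally agreed that the roof "
        "should be fixed before winter, and the chair promised to draft the plan.")
ids = tokenizer(text, return_tensors="pt").input_ids[0]
context, target = ids[:-15], ids[-15:]

# Shuffle only the distant half of the context, leaving nearby words intact.
far, near = context[: len(context) // 2], context[len(context) // 2 :]
shuffled = torch.cat([far[torch.randperm(far.size(0))], near])

print("intact context NLL:   %.3f" % target_nll(context, target))
print("shuffled-far NLL:     %.3f" % target_nll(shuffled, target))

A small loss increase from shuffling distant words, next to a large increase from disturbing nearby ones, is the "sharp nearby, fuzzy far away" pattern the title refers to.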
What do Neural Machine Translation Models Learn about Morphology?
TLDR
This work analyzes the representations learned by neural MT models at various levels of granularity and empirically evaluates the quality of the representations for learning morphology through extrinsic part-of-speech and morphological tagging tasks.
Improving Language Understanding by Generative Pre-Training
TLDR
The general task-agnostic model outperforms discriminatively trained models that use architectures specifically crafted for each task, significantly improving upon the state of the art in 9 out of the 12 tasks studied.
Character-Aware Neural Language Models
TLDR
A simple neural language model that relies only on character-level inputs is able to encode, from characters only, both semantic and orthographic information, suggesting that for many languages, character inputs are sufficient for language modeling.
Assessing the Ability of LSTMs to Learn Syntax-Sensitive Dependencies
TLDR
It is concluded that LSTMs can capture a non-trivial amount of grammatical structure given targeted supervision, but stronger architectures may be required to further reduce errors; furthermore, the language modeling signal is insufficient for capturing syntax-sensitive dependencies, and should be supplemented with more direct supervision if such dependencies need to be captured.
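
The agreement test can be illustrated with a single minimal pair scored by a pretrained LM: a grammatical and an ungrammatical verb form separated from the subject by an "attractor" noun. GPT-2 is again only a stand-in for the LSTMs studied, and the hand-written pair below replaces the paper's large automatically harvested test set.

import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def mean_nll(sentence):
    """Mean per-token negative log-likelihood under the LM."""
    ids = tokenizer(sentence, return_tensors="pt").input_ids
    with torch.no_grad():
        return model(ids, labels=ids).loss.item()

grammatical = "The keys to the old wooden cabinet are on the table."
ungrammatical = "The keys to the old wooden cabinet is on the table."

# A syntax-sensitive model should assign lower loss to the grammatical variant
# despite the singular attractor noun "cabinet" sitting next to the verb.
print("prefers grammatical form:", mean_nll(grammatical) < mean_nll(ungrammatical))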