Deep contextualized word representations

@inproceedings{Peters2018DeepCW,
  title={Deep contextualized word representations},
  author={Matthew E. Peters and Mark Neumann and Mohit Iyyer and Matt Gardner and Christopher Clark and Kenton Lee and Luke Zettlemoyer},
  booktitle={NAACL-HLT},
  year={2018}
}
  • Matthew E. Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, Luke Zettlemoyer
  • Published in NAACL-HLT 2018
  • Computer Science
  • We introduce a new type of deep contextualized word representation that models both (1) complex characteristics of word use (e.g., syntax and semantics), and (2) how these uses vary across linguistic contexts (i.e., to model polysemy). [...] We also present an analysis showing that exposing the deep internals of the pre-trained network is crucial, allowing downstream models to mix different types of semi-supervision signals.
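
The "mixing" the abstract refers to is the paper's task-specific weighted combination of the biLM's layers (its Eq. 1): ELMo_k = γ · Σ_j s_j · h_{k,j}, where the s_j are softmax-normalized weights and γ is a scalar, both learned jointly with the downstream task. Below is a minimal PyTorch sketch of that layer mix; the class name ScalarMix and the tensor shapes are illustrative assumptions, not the authors' released code.

import torch
import torch.nn as nn

class ScalarMix(nn.Module):
    # Task-specific mix of biLM layers (Peters et al. 2018, Eq. 1):
    #   ELMo_k = gamma * sum_j softmax(w)_j * h_{k,j}
    def __init__(self, num_layers: int):
        super().__init__()
        self.w = nn.Parameter(torch.zeros(num_layers))  # mixing logits, one per layer
        self.gamma = nn.Parameter(torch.ones(()))       # task-specific scale

    def forward(self, layer_states):
        # layer_states: list of num_layers tensors, each (batch, seq_len, dim),
        # e.g. the char-CNN token layer plus each biLSTM layer of the biLM.
        s = torch.softmax(self.w, dim=0)
        return self.gamma * sum(s_j * h_j for s_j, h_j in zip(s, layer_states))

# Random states standing in for a 2-layer biLM plus its token layer:
states = [torch.randn(2, 7, 1024) for _ in range(3)]
elmo_repr = ScalarMix(num_layers=3)(states)  # -> shape (2, 7, 1024)

Because the mixing weights are learned per task, a downstream model can emphasize lower layers (more syntactic) or higher layers (more semantic), which is the analysis point the abstract highlights.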
    5,046 Citations, including:
      • Deep contextualized word embeddings from character language models for neural sequence labeling
      • Dissecting Contextual Word Embeddings: Architecture and Representation
      • Improved Word Sense Disambiguation Using Pre-Trained Contextualized Word Representations
      • Retrofitting Contextualized Word Embeddings with Paraphrases
      • Quantifying the Contextualization of Word Representations with Semantic Class Probing
      • Contextual String Embeddings for Sequence Labeling
      • Contextualized Word Representations for Self-Attention Network
      • Linguistic Knowledge and Transferability of Contextual Representations
      • Context Analysis for Pre-trained Masked Language Models
