Deep contextualized word representations

@article{Peters2018DeepCW,
  title={Deep contextualized word representations},
  author={Matthew E. Peters and Mark Neumann and Mohit Iyyer and Matt Gardner and Christopher Clark and Kenton Lee and Luke Zettlemoyer},
  journal={ArXiv},
  year={2018},
  volume={abs/1802.05365}
}
Abstract: We introduce a new type of deep contextualized word representation that models both (1) complex characteristics of word use (e.g., syntax and semantics), and (2) how these uses vary across linguistic contexts (i.e., to model polysemy). [...] We also present an analysis showing that exposing the deep internals of the pre-trained network is crucial, allowing downstream models to mix different types of semi-supervision signals.
