Dissecting Contextual Word Embeddings: Architecture and Representation

@article{Peters2018DissectingCW,
  title={Dissecting Contextual Word Embeddings: Architecture and Representation},
  author={Matthew E. Peters and Mark Neumann and Luke Zettlemoyer and Wen-tau Yih},
  journal={ArXiv},
  year={2018},
  volume={abs/1808.08949}
}
  • Matthew E. Peters, Mark Neumann, Luke Zettlemoyer, Wen-tau Yih
  • Published 2018
  • Computer Science
  • ArXiv
  • Contextual word representations derived from pre-trained bidirectional language models (biLMs) have recently been shown to provide significant improvements to the state of the art for a wide range of NLP tasks. [...] Together, these results suggest that unsupervised biLMs, independent of architecture, are learning much more about the structure of language than previously appreciated.
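The paper's analysis rests on probing classifiers: lightweight models trained on frozen biLM activations to test how much linguistic information (for example, part of speech) each layer already encodes. The following is a minimal, hypothetical PyTorch sketch of such a linear probe, not the authors' code; the random tensors stand in for pre-computed contextual embeddings and gold POS tags, and the dimensions, class name, and training loop are illustrative assumptions.

    # Hypothetical linear-probe sketch: only the probe trains, the embeddings stay frozen.
    import torch
    import torch.nn as nn

    torch.manual_seed(0)
    NUM_TOKENS, EMB_DIM, NUM_TAGS = 1000, 1024, 17   # illustrative sizes (ELMo-like 1024-d layer)

    # Stand-ins for pre-computed contextual embeddings from one biLM layer and gold POS labels.
    layer_activations = torch.randn(NUM_TOKENS, EMB_DIM)
    pos_labels = torch.randint(0, NUM_TAGS, (NUM_TOKENS,))

    class LinearProbe(nn.Module):
        """A single linear layer: its accuracy reflects what the frozen embeddings encode."""
        def __init__(self, emb_dim: int, num_tags: int):
            super().__init__()
            self.classifier = nn.Linear(emb_dim, num_tags)

        def forward(self, embeddings: torch.Tensor) -> torch.Tensor:
            return self.classifier(embeddings)

    probe = LinearProbe(EMB_DIM, NUM_TAGS)
    optimizer = torch.optim.Adam(probe.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()

    for epoch in range(5):
        optimizer.zero_grad()
        logits = probe(layer_activations)        # embeddings are inputs only; no gradient reaches them
        loss = loss_fn(logits, pos_labels)
        loss.backward()
        optimizer.step()
        acc = (logits.argmax(dim=-1) == pos_labels).float().mean().item()
        print(f"epoch {epoch}: loss={loss.item():.3f} acc={acc:.3f}")

Repeating such a probe over each biLM layer and over different tasks is the style of comparison the paper uses to argue that lower layers capture more local, syntactic information while upper layers capture longer-range semantic relationships.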
    162 Citations
    • Quantifying the Contextualization of Word Representations with Semantic Class Probing
    • Linguistic Knowledge and Transferability of Contextual Representations
    • Analysing Word Representation from the Input and Output Embeddings in Neural Network Language Models
    • Deep Contextualized Word Embeddings for Universal Dependency Parsing
    • On the Hierarchical Information in a Single Contextualised Word Representation (Student Abstract)
    • What do you learn from context? Probing for sentence structure in contextualized word representations
    • Alternative Weighting Schemes for ELMo Embeddings
    • Efficient Contextual Representation Learning With Continuous Outputs
