• Publications
Deep contextualized word representations
We introduce a new type of deep contextualized word representation that models both (1) complex characteristics of word use (e.g., syntax and semantics), and (2) how these uses vary across linguistic contexts (i.e., to model polysemy).
  • Citations: 3,907 • Influence: 837
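The contrast with static embeddings that this paper draws can be shown with a toy sketch. This is not the paper's bidirectional language model; the vocabulary, vector sizes, and neighbor-averaging "contextualizer" below are invented purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = ["the", "bank", "of", "river", "a", "loan"]
static = {w: rng.normal(size=8) for w in vocab}  # one fixed vector per word type

def contextualize(tokens, alpha=0.5):
    """Toy 'contextual' embedding: mix each word's static vector with
    the mean of the other words in the sentence. This stands in for the
    deep biLM in the paper; the mixing rule is invented for illustration."""
    vecs = np.stack([static[t] for t in tokens])
    out = []
    for i in range(len(tokens)):
        neighbors = np.delete(vecs, i, axis=0).mean(axis=0)
        out.append((1 - alpha) * vecs[i] + alpha * neighbors)
    return np.stack(out)

s1 = ["the", "bank", "of", "the", "river"]
s2 = ["a", "bank", "loan"]
bank_1 = contextualize(s1)[1]  # "bank" near "river"
bank_2 = contextualize(s2)[1]  # "bank" near "loan"

# A static table gives "bank" one vector everywhere; the contextual
# sketch gives it a different vector in each sentence, which is what
# lets a representation model polysemy.
print(np.allclose(bank_1, bank_2))  # False
```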
AllenNLP: A Deep Semantic Natural Language Processing Platform
This paper describes AllenNLP, a platform for research on deep learning methods in natural language understanding.
  • Citations: 434 • Influence: 63
Relationships between Water Vapor Path and Precipitation over the Tropical Oceans
The relationship between water vapor path W and surface precipitation rate P over tropical oceanic regions is analyzed using 4 yr of gridded daily SSM/I satellite microwave radiometer data. A tight…
  • Citations: 413 • Influence: 53
Dissecting Contextual Word Embeddings: Architecture and Representation
A detailed empirical study of how the choice of neural architecture (e.g., LSTM, CNN, or self-attention) influences both end-task accuracy and qualitative properties of the representations that are learned.
  • Citations: 137 • Influence: 47
Linguistic Knowledge and Transferability of Contextual Representations
Contextual word representations derived from large-scale neural language models are successful across a diverse set of NLP tasks, suggesting that they encode useful and transferable features of language.
  • Citations: 149 • Influence: 39
Semi-supervised sequence tagging with bidirectional language models
We propose a general semi-supervised approach for adding pre-trained context embeddings from bidirectional language models to NLP systems and apply it to sequence labeling tasks.
  • Citations: 310 • Influence: 34
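The augmentation this paper describes can be sketched in a few lines: concatenate a pre-trained language-model embedding onto the task-trained embedding for each token before the sequence-labeling layers. The random vectors below are toy stand-ins, not the paper's trained models:

```python
import numpy as np

rng = np.random.default_rng(1)
tokens = ["Obama", "visited", "Paris"]

# Task-trained word embeddings (toy random stand-ins).
task_emb = {t: rng.normal(size=4) for t in tokens}
# Pre-trained bidirectional-LM context embeddings (also stand-ins).
lm_emb = {t: rng.normal(size=6) for t in tokens}

# Concatenate the LM embedding onto the task embedding for each token,
# then feed the augmented sequence to the tagger's recurrent layers.
augmented = np.stack(
    [np.concatenate([task_emb[t], lm_emb[t]]) for t in tokens]
)
print(augmented.shape)  # (3, 10)
```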
Understanding the origin and analysis of sediment-charcoal records with a simulation model
Interpreting sediment-charcoal records is challenging because there is little information linking charcoal production from fires to charcoal accumulation in lakes. We present a numerical model…
  • Citations: 246 • Influence: 25
Knowledge Enhanced Contextual Word Representations
We propose a general method to embed multiple knowledge bases (KBs) into large-scale models, and thereby enhance their representations with structured, human-curated knowledge.
  • Citations: 65 • Influence: 14
To Tune or Not to Tune? Adapting Pretrained Representations to Diverse Tasks
We show that the relative performance of fine-tuning vs. feature extraction depends on the similarity of the pretraining and target tasks.
  • Citations: 110 • Influence: 11
Longformer: The Long-Document Transformer
Transformer-based models are unable to process long sequences due to their self-attention operation, which scales quadratically with the sequence length, making it infeasible to process documents of thousands of tokens or longer.
  • Citations: 35 • Influence: 10
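The quadratic cost the abstract refers to comes from scoring every query position against every key position. A minimal counting sketch in plain numpy (not the Longformer implementation; the window size is illustrative) contrasts that with a sliding-window pattern whose cost grows linearly in sequence length:

```python
import numpy as np

def full_mask(n):
    """Full self-attention: every query attends to every key,
    so n * n pairs are scored."""
    return np.ones((n, n), dtype=bool)

def window_mask(n, w):
    """Sliding-window attention: each query attends only to keys
    within +/- w positions, roughly n * (2w + 1) pairs, linear in n."""
    idx = np.arange(n)
    return np.abs(idx[:, None] - idx[None, :]) <= w

# Doubling n quadruples the full-attention pair count but only
# roughly doubles the windowed count.
for n in (1024, 2048, 4096):
    quad = int(full_mask(n).sum())
    lin = int(window_mask(n, 256).sum())
    print(n, quad, lin)
```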