Implicit Representations of Meaning in Neural Language Models
Belinda Z. Li, Maxwell Nye, Jacob Andreas

Does the effectiveness of neural language models derive entirely from accurate modeling of surface word co-occurrence statistics, or do these models represent and reason about the world they describe? In BART and T5 transformer language models, we identify contextual word representations that function as models of entities and situations as they evolve throughout a discourse. These neural representations have functional similarities to linguistic models of dynamic semantics: they support a…
Improving Coherence and Consistency in Neural Sequence Models with Dual-System, Neuro-Symbolic Reasoning
This work seeks a lightweight, training-free means of improving existing System 1-like sequence models by adding System 2-inspired logical reasoning, and shows that this approach can increase the coherence and accuracy of neurally-based generations.
Program Synthesis with Large Language Models
This paper explores the limits of the current generation of large language models for program synthesis in general-purpose programming languages. We evaluate a collection of such models (with between…


Deep Contextualized Word Representations
A new type of deep contextualized word representation is introduced that models both complex characteristics of word use and how these uses vary across linguistic contexts, allowing downstream models to mix different types of semi-supervision signals.
Infusing Finetuning with Semantic Dependencies
This approach applies novel probes to recent language models and finds that, unlike syntax, semantics is not brought to the surface by today’s pretrained models, and uses convolutional graph encoders to explicitly incorporate semantic parses into task-specific finetuning, yielding benefits to natural language understanding tasks in the GLUE benchmark.
A Structural Probe for Finding Syntax in Word Representations
A structural probe is proposed, which evaluates whether syntax trees are embedded in a linear transformation of a neural network’s word representation space, and shows that such transformations exist for both ELMo and BERT but not in baselines, providing evidence that entire syntax trees are embedded implicitly in deep models’ vector geometry.
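The core computation behind a structural probe can be sketched as follows: a linear map B projects contextual word vectors, and squared L2 distances in the projected space are compared against parse-tree distances. Here B and the word vectors are random stand-ins for illustration only; in the actual method, B is trained so that these distances match tree distances.

```python
import numpy as np

rng = np.random.default_rng(0)
hidden = rng.normal(size=(5, 16))   # 5 toy "word" vectors, dim 16 (placeholder for model states)
B = rng.normal(size=(8, 16))        # probe projection matrix, rank 8 (untrained stand-in)

def probe_distance(h_i, h_j, B):
    """Squared L2 distance between two word vectors after the probe projection."""
    diff = B @ (h_i - h_j)
    return float(diff @ diff)

# A trained probe would make probe_distance(h_i, h_j, B) approximate the
# number of edges between words i and j in the sentence's parse tree.
d = probe_distance(hidden[0], hidden[1], B)
```

The key design point is that the probe itself is only a linear transformation, so any syntactic structure it recovers must already be present in the model's vector geometry.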
Linguistic Regularities in Continuous Space Word Representations
The vector-space word representations that are implicitly learned by the input-layer weights are found to be surprisingly good at capturing syntactic and semantic regularities in language, and each relationship is characterized by a relation-specific vector offset.
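The vector-offset idea above can be illustrated with the classic a : b :: c : ? test. The 3-d vectors below are made up purely for illustration; real experiments use embeddings learned by a language model.

```python
import numpy as np

# Toy "word vectors" chosen so the offset king - man matches queen - woman.
vectors = {
    "king":  np.array([0.8, 0.9, 0.1]),
    "man":   np.array([0.7, 0.1, 0.1]),
    "woman": np.array([0.6, 0.1, 0.9]),
    "queen": np.array([0.7, 0.9, 0.9]),
}

def analogy(a, b, c, vocab):
    """Return the word whose vector is closest (by cosine) to b - a + c."""
    target = vocab[b] - vocab[a] + vocab[c]
    best, best_sim = None, -np.inf
    for word, vec in vocab.items():
        if word in (a, b, c):       # exclude the query words themselves
            continue
        sim = vec @ target / (np.linalg.norm(vec) * np.linalg.norm(target))
        if sim > best_sim:
            best, best_sim = word, sim
    return best

print(analogy("man", "king", "woman", vectors))  # queen, for these toy vectors
```

The relation-specific offset (here, roughly a "gender" direction) is what makes the arithmetic work: the same offset is expected to map many male terms to their female counterparts.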
Climbing towards NLU: On Meaning, Form, and Understanding in the Age of Data
It is argued that a system trained only on form has a priori no way to learn meaning, and that a clear understanding of the distinction between form and meaning will help guide the field towards better science around natural language understanding.
Bringing Machine Learning and Compositional Semantics Together
This review presents a simple discriminative learning framework for defining statistical models and relating them to logical theories, and considers models that use distributed representations rather than logical ones, showing that these can be considered part of the same overall framework for understanding meaning and structural complexity.
Analysis Methods in Neural Language Processing: A Survey
This survey reviews analysis methods in neural language processing, categorizes them according to prominent research trends, highlights existing limitations, and points to potential directions for future work.
Simpler Context-Dependent Logical Forms via Model Projections
This work considers the task of learning a context-dependent mapping from utterances to denotations, and performs successive projections of the full model onto simpler models that operate over equivalence classes of logical forms.
Does BERT Make Any Sense? Interpretable Word Sense Disambiguation with Contextualized Embeddings
A simple but effective approach to word sense disambiguation using nearest neighbor classification on contextualized word embeddings (CWEs); it is shown that the pre-trained BERT model is able to place polysemic words into distinct 'sense' regions of the embedding space, while ELMo and Flair NLP do not seem to possess this ability.
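The nearest-neighbour disambiguation step can be sketched as below. The 2-d "embeddings", the example sentences, and the sense labels are all invented stand-ins for BERT outputs and a real sense inventory (e.g. WordNet), used purely for illustration.

```python
import numpy as np

# Labelled contextual embeddings of the ambiguous word "bank" from a toy
# training corpus (vectors are hand-made, not model outputs).
train = [
    (np.array([0.9, 0.1]), "bank%finance"),   # "deposit money at the bank"
    (np.array([0.8, 0.2]), "bank%finance"),   # "the bank approved the loan"
    (np.array([0.1, 0.9]), "bank%river"),     # "fished from the river bank"
]

def nn_sense(query, train):
    """1-nearest-neighbour sense prediction by Euclidean distance."""
    return min(train, key=lambda ex: np.linalg.norm(query - ex[0]))[1]

print(nn_sense(np.array([0.85, 0.15]), train))  # bank%finance
print(nn_sense(np.array([0.2, 0.8]), train))    # bank%river
```

If the pretrained model really places senses in distinct regions of embedding space, even this training-free classifier separates them; that separation is the paper's central finding for BERT.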
Language Models as Knowledge Bases?
An in-depth analysis of the relational knowledge already present (without fine-tuning) in a wide range of state-of-the-art pretrained language models finds that BERT contains relational knowledge competitive with traditional NLP methods that have some access to oracle knowledge.