Exploring BERT’s sensitivity to lexical cues using tests from semantic priming
Kanishka Misra, Allyson Ettinger, Julia Taylor Rayz
Models trained to estimate word probabilities in context have become ubiquitous in natural language processing. How do these models use lexical cues in context to inform their word probabilities? To answer this question, we present a case study analyzing the pre-trained BERT model with tests informed by semantic priming. Using English lexical stimuli that show priming in humans, we find that BERT too shows “priming”, predicting a word with greater probability when the context includes a related… 
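The priming test described above can be sketched as a masked-token probability comparison: does BERT assign the target word a higher probability when the context contains a related prime? The following is a minimal illustrative sketch using Hugging Face `transformers`, not the authors’ exact stimuli or protocol; the model choice and word pair ("doctor"/"nurse") are assumptions for illustration.

```python
# Hedged sketch of a priming-style probe: compare BERT's probability for a
# target word under a related vs. an unrelated context word. Illustrative
# only; not the stimuli or scoring protocol from the paper.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

def target_prob(context: str, target: str) -> float:
    """P(target | context) at the [MASK] position."""
    inputs = tok(context, return_tensors="pt")
    # Locate the [MASK] token in the input sequence.
    mask_idx = (inputs.input_ids[0] == tok.mask_token_id).nonzero()[0].item()
    with torch.no_grad():
        logits = model(**inputs).logits
    probs = torch.softmax(logits[0, mask_idx], dim=-1)
    return probs[tok.convert_tokens_to_ids(target)].item()

# Related prime ("doctor") vs. unrelated prime ("lawyer") for target "nurse".
related = target_prob("The doctor talked to the [MASK].", "nurse")
unrelated = target_prob("The lawyer talked to the [MASK].", "nurse")
print(related, unrelated)
```

A priming-like effect would show up as a higher probability in the related condition, aggregated over many human-validated prime–target pairs rather than a single example.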


Findings of the Association for Computational Linguistics: EMNLP 2020, Online Event, 16-20 November 2020
Unsupervised Few-Bits Semantic Hashing with Implicit Topics Modeling
Grid Tagging Scheme for End-to-End Fine-grained Opinion Extraction
John praised Mary because _he_? Implicit Causality Bias and Its Interaction with Explicit Cues in LMs
This work investigates whether pre-trained language models encode IC bias and use it at inference time, and finds that to be the case, albeit to different degrees, for three distinct PLM architectures.
Different kinds of cognitive plausibility: why are transformers better than RNNs at predicting N400 amplitude?
This work provides evidence for one possible explanation: transformer language models’ predictions are affected by the preceding context in a way analogous to the effect of semantic facilitation in humans.
Does BERT Know that the IS-A Relation Is Transitive?
This investigation aims to quantify how much BERT agrees with the transitive property of IS-A relations via a minimalist probing setting, and reveals that BERT’s predictions do not fully obey the transitivity of the IS-A relation.
Logically Consistent Adversarial Attacks for Soft Theorem Provers
This work proposes a novel generative adversarial framework, LAVA, to select, apply, and verify adversarial attacks on STPs, and demonstrates that the vanilla version outperforms standard methods and can be further improved via a simple best-of-k decoding enhancement.
minicons: Enabling Flexible Behavioral and Representational Analyses of Transformer Language Models
The minicons library is described and applied to two motivating case studies: One focusing on the learning dynamics of the BERT architecture on relative grammatical judgments, and the other on benchmarking 23 different LMs on zero-shot abductive reasoning.
Neural reality of argument structure constructions
In lexicalist linguistic theories, argument structure is assumed to be predictable from the meaning of verbs. As a result, the verb is the primary determinant of the meaning of a clause. In contrast, …
Artefact Retrieval: Overview of NLP Models with Knowledge Base Access
This paper systematically describes the typology of artefacts, retrieval mechanisms and the way these artefacts are fused into the model to uncover combinations of design decisions that had not yet been tried in NLP systems.
Meaning in brains and machines: Internal activation update in large-scale language model partially reflects the N400 brain potential
Modelling of the N400 to naturalistic sentences using a large-scale, state-of-the-art deep learning language model (GPT-2) suggests that activation updates in the model correspond to several N400 effects, but cannot account for all of them.
Sorting through the noise: Testing robustness of information processing in pre-trained language models
This paper examines robustness of models’ ability to deploy relevant context information in the face of distracting content, and presents models with cloze tasks requiring use of critical context information, and introduces distracting content to test how robustly the models retain and use that critical information for prediction.


References

Semantic Priming: Perspectives from Memory and Word Recognition
Part 1: Introduction. What Is Semantic Priming and Why Should Anyone Care? Part 2: Models. Spreading Activation Models. Becker's Verification Model. Compound-cue Models. Distributed Network Models.
The semantic priming project
These data represent the largest behavioral database on semantic priming and are available to researchers to aid in selecting stimuli, testing theories, and reducing potential confounds in their studies.
What BERT Is Not: Lessons from a New Suite of Psycholinguistic Diagnostics for Language Models
A suite of diagnostics drawn from human language experiments are introduced, which allow us to ask targeted questions about information used by language models for generating predictions in context, and the popular BERT model is applied.
Evaluation of word embeddings against cognitive processes: primed reaction times in lexical decision and naming tasks
Results show that GloVe embeddings lead to significantly higher correlation with experimental measurements than other controlled and off-the-shelf embeddings, and that the choice of a training corpus is less important than that of the algorithm.
Modeling garden path effects without explicit hierarchical syntax
Both classes of models correctly predicted increased difficulty in ambiguous sentences compared to controls, suggesting that the syntactic representations induced by RNNs are sufficient for this purpose and suggesting that it may not be possible to reduce garden path effects to predictability.
The effect of word predictability on reading time is logarithmic
The influence of contextual constraints on recall for words within sentences.
A hybrid model including aspects of the featural restriction model of sentence constraint and the relational-distinctive processing view is proposed to account for the influence of sentence constraints on memory of target words.
A Probabilistic Earley Parser as a Psycholinguistic Model
Under grammatical assumptions supported by corpus-frequency data, the operation of Stolcke's probabilistic Earley parser correctly predicts processing phenomena associated with garden path structural ambiguity and with the subject/object relative asymmetry.
Using Priming to Uncover the Organization of Syntactic Representations in Neural Language Models
This work uses a gradient similarity metric to demonstrate that LSTM LMs' representations of different types of sentences with relative clauses are organized hierarchically in a linguistically interpretable manner, suggesting that the LMs track abstract properties of the sentence.
Expectation-based syntactic comprehension