Exploring BERT’s sensitivity to lexical cues using tests from semantic priming

@article{Misra2020ExploringBS,
  title={Exploring BERT’s sensitivity to lexical cues using tests from semantic priming},
  author={Kanishka Misra and Allyson Ettinger and Julia Taylor Rayz},
  journal={ArXiv},
  year={2020},
  volume={abs/2010.03010}
}
Models trained to estimate word probabilities in context have become ubiquitous in natural language processing. How do these models use lexical cues in context to inform their word probabilities? To answer this question, we present a case study analyzing the pre-trained BERT model with tests informed by semantic priming. Using English lexical stimuli that show priming in humans, we find that BERT too shows “priming”, predicting a word with greater probability when the context includes a related word than when it includes an unrelated one.
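The kind of test the abstract describes can be sketched in a few lines: hold a sentence frame constant, swap only the prime word between a related and an unrelated one, and compare the probability BERT assigns to the target word at a masked position. The sketch below is illustrative only, not the authors' released code; it assumes the HuggingFace transformers library, and the doctor/nurse prime-target pair and sentence frames are made up for the example.

```python
import torch
from transformers import BertForMaskedLM, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

def target_probability(context: str, target: str) -> float:
    """Return BERT's probability for `target` at the [MASK] position in `context`.
    Assumes `target` is a single wordpiece in BERT's vocabulary."""
    inputs = tokenizer(context, return_tensors="pt")
    mask_index = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero().item()
    with torch.no_grad():
        logits = model(**inputs).logits
    probs = torch.softmax(logits[0, mask_index], dim=-1)
    return probs[tokenizer.convert_tokens_to_ids(target)].item()

# Hold the sentence frame constant and swap only the prime word:
# "doctor" is semantically related to the target "nurse"; "table" is not.
related = target_probability("She waited beside the doctor and the [MASK].", "nurse")
unrelated = target_probability("She waited beside the table and the [MASK].", "nurse")
print(f"P(nurse | related prime)   = {related:.4f}")
print(f"P(nurse | unrelated prime) = {unrelated:.4f}")
```

In the paper's setup the stimuli are drawn from human semantic priming studies and the comparison is aggregated over many such pairs and contexts; the sketch shows only the core probability comparison.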

Published in Findings of the Association for Computational Linguistics: EMNLP 2020, Online Event, 16-20 November 2020.

Citations
John praised Mary because _he_? Implicit Causality Bias and Its Interaction with Explicit Cues in LMs
TLDR
This work investigates whether pre-trained language models encode IC bias and use it at inference time, and finds that to be the case, albeit to different degrees, for three distinct PLM architectures.
Different kinds of cognitive plausibility: why are transformers better than RNNs at predicting N400 amplitude?
TLDR
Evidence is provided for one possible explanation of why transformer language models outperform RNNs at predicting N400 amplitude: their predictions are affected by the preceding context in a way analogous to the effect of semantic facilitation in humans.
Artefact Retrieval: Overview of NLP Models with Knowledge Base Access
TLDR
This paper systematically describes the typology of artefacts, retrieval mechanisms and the way these artefacts are fused into the model to uncover combinations of design decisions that had not yet been tried in NLP systems.
Does BERT Know that the IS-A Relation Is Transitive?
TLDR
This investigation aims to quantify, via a minimalist probing setting, how much BERT agrees with the transitivity property of the IS-A relation, and reveals that BERT’s predictions do not fully obey it.
Logically Consistent Adversarial Attacks for Soft Theorem Provers
TLDR
This work proposes LAVA, a novel generative adversarial framework to select, apply, and verify adversarial attacks on STPs, and demonstrates that the vanilla version outperforms standard methods and can be further improved via a simple best-of-k decoding enhancement.
Neural reality of argument structure constructions
In lexicalist linguistic theories, argument structure is assumed to be predictable from the meaning of verbs. As a result, the verb is the primary determinant of the meaning of a clause.
minicons: Enabling Flexible Behavioral and Representational Analyses of Transformer Language Models
TLDR
The minicons library is described and applied to two motivating case studies: One focusing on the learning dynamics of the BERT architecture on relative grammatical judgments, and the other on benchmarking 23 different LMs on zero-shot abductive reasoning.
Do language models learn typicality judgments from text?
TLDR
It is suggested that text-based exposure alone is insufficient to acquire typicality knowledge, and two tests for LMs are proposed, showing modest—but not completely absent—correspondence between LMs and humans.
Exploring Multi-hop Reasoning Process in NLU from the View of Bayesian Probability
TLDR
This work focuses on the multi-hop reasoning processes of PTLMs and performs an analysis on a logical reasoning dataset, Soft Reasoner, showing that a model more in line with the Bayesian process tends to have better generalization ability.
