Can Transformer Language Models Predict Psychometric Properties?

@inproceedings{Laverghetta2021CanTL,
  title={Can Transformer Language Models Predict Psychometric Properties?},
  author={Antonio Laverghetta and Animesh Nighojkar and Jamshidbek Mirzakhalov and John Licato},
  booktitle={Proceedings of *SEM 2021: The Tenth Joint Conference on Lexical and Computational Semantics},
  year={2021}
}
Transformer-based language models (LMs) continue to advance state-of-the-art performance on NLP benchmark tasks, including tasks designed to mimic human-inspired "commonsense" competencies. To better understand the degree to which LMs can be said to have certain linguistic reasoning skills, researchers are beginning to adapt the tools and concepts of the field of psychometrics. But to what extent can the benefits flow in the other direction? That is, can LMs be of use in predicting what the…

Citations

Curriculum: A Broad-Coverage Benchmark for Linguistic Phenomena in Natural Language Understanding
Introduces Curriculum, a new format of NLI benchmark for evaluating broad-coverage linguistic phenomena, and shows that this linguistic-phenomena-driven benchmark can serve as an effective tool for diagnosing model behavior and verifying model learning quality.
Developmental Negation Processing in Transformer Language Models
Reasoning with negation is known to be difficult for transformer-based language models. While previous studies have used the tools of psycholinguistics to probe a transformer's ability to reason…
Language Models Can Generate Human-Like Self-Reports of Emotion
Generates responses to the PANAS questionnaire with four different variants of the GPT-3 model, using modern neural language models to produce synthetic self-report data and evaluating the human-likeness of the results.
Cognitive Modeling of Semantic Fluency Using Transformers
Reports preliminary evidence suggesting that, despite obvious implementational differences in how people and TLMs learn and use language, TLMs can identify individual differences in human fluency-task behavior better than existing computational models, and may offer insights into human memory-retrieval strategies.

References

Showing 1-10 of 90 references
Predicting Human Psychometric Properties Using Computational Language Models
Transformer-based language models (LMs) continue to achieve state-of-the-art performance on natural language processing (NLP) benchmarks, including tasks designed to mimic human-inspired…
Modeling Age of Acquisition Norms Using Transformer Networks
Explores using several transformer models to predict age-of-acquisition norms on several datasets, with promising results overall: the transformers outperform the baselines in most cases.
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Presents a benchmark of nine diverse NLU tasks, an auxiliary dataset for probing models' understanding of specific linguistic phenomena, and an online platform for evaluating and comparing models; the benchmark favors models that represent linguistic knowledge in a way that facilitates sample-efficient learning and effective knowledge transfer across tasks.
On the Predictive Power of Neural Language Models for Human Real-Time Comprehension Behavior
Tests over two dozen models on how well their next-word expectations predict human reading-time behavior on naturalistic text corpora, finding that across model architectures and training-dataset sizes the relationship between word log-probability and reading time is (near-)linear.
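The (near-)linear surprisal–reading-time relationship described above can be illustrated with a simple least-squares fit. This is a minimal sketch on synthetic data; the slope, intercept, and noise level are invented for illustration, not values from the study.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic example: word surprisal (-log2 probability) vs. reading time (ms).
surprisal = rng.uniform(1.0, 15.0, size=200)
# Assume reading time rises roughly linearly with surprisal, plus noise
# (coefficients here are placeholders, not empirical estimates).
reading_time = 200.0 + 12.0 * surprisal + rng.normal(0.0, 20.0, size=200)

# Fit reading_time = slope * surprisal + intercept by least squares.
slope, intercept = np.polyfit(surprisal, reading_time, deg=1)
print(f"slope = {slope:.1f} ms/bit, intercept = {intercept:.1f} ms")
```

A fit like this recovering the generating slope is what a "(near-)linear" relationship amounts to operationally: per-word reading time increases by a roughly constant number of milliseconds per bit of surprisal.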
A Systematic Assessment of Syntactic Generalization in Neural Language Models
Presents a systematic evaluation of the syntactic knowledge of neural language models, testing 20 combinations of model types and data sizes on a set of 34 English-language syntactic test suites, and finds substantial differences in syntactic generalization performance across model architectures.
A Deep Learning Architecture for Psychometric Natural Language Processing
Presents PyNDA, a deep learning architecture for extracting psychometric dimensions from user-generated text, which markedly outperforms traditional feature-based classifiers as well as state-of-the-art deep learning architectures.
Artificial Neural Networks Accurately Predict Language Processing in the Brain
Supports the hypothesis that a drive to predict future inputs may shape human language processing, and perhaps the way knowledge of language is learned and organized in the brain.
Transformer Networks of Human Conceptual Knowledge
Presents a computational model capable of simulating aspects of human knowledge for thousands of real-world concepts by fine-tuning a transformer network for natural language understanding on participant-generated feature norms, and shows how combining natural language data with psychological data can be used to build cognitive models with rich world knowledge.
Probabilistic Predictions of People Perusing: Evaluating Metrics of Language Model Performance for Psycholinguistic Modeling
Re-evaluates the claim of Goodkind and Bicknell (2018) that a language model's ability to model reading times is a linear function of its perplexity, and introduces an alternative measure of language-modeling performance, predictability norm correlation, based on Cloze probabilities measured from human subjects.
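A predictability norm correlation of the kind described above can be sketched as the correlation between a model's assigned probabilities and human Cloze probabilities for the same target words. The probabilities below are invented for illustration and do not come from the cited paper.

```python
import numpy as np

# Hypothetical per-word probabilities for the same sentence-final targets:
# human Cloze probabilities (fraction of subjects producing the word) and
# the language model's assigned probabilities. Values are illustrative only.
cloze = np.array([0.90, 0.75, 0.40, 0.20, 0.05, 0.60])
model = np.array([0.80, 0.70, 0.35, 0.25, 0.10, 0.50])

# One simple instantiation of a predictability norm correlation:
# Pearson r between the two probability vectors.
r = np.corrcoef(cloze, model)[0, 1]
print(f"predictability norm correlation r = {r:.3f}")
```

A high r under a measure like this indicates the model's expectations track human word predictability, which is a different question from how low the model's perplexity is on a corpus.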
Language Models as Knowledge Bases?
Presents an in-depth analysis of the relational knowledge already present (without fine-tuning) in a wide range of state-of-the-art pretrained language models, finding that BERT contains relational knowledge competitive with traditional NLP methods that have some access to oracle knowledge.