# oLMpics-On What Language Model Pre-training Captures

```bibtex
@article{Talmor2019oLMpicsOnWL,
  title={oLMpics-On What Language Model Pre-training Captures},
  author={Alon Talmor and Yanai Elazar and Yoav Goldberg and Jonathan Berant},
  journal={Transactions of the Association for Computational Linguistics},
  year={2019},
  volume={8},
  pages={743--758}
}
```
• Published 31 December 2019
• Computer Science
• Transactions of the Association for Computational Linguistics
**Abstract:** Recent success of pre-trained language models (LMs) has spurred widespread interest in the language capabilities that they possess. However, efforts to understand whether LM representations are useful for symbolic reasoning tasks have been limited and scattered. In this work, we propose eight reasoning tasks, which conceptually require operations such as comparison, conjunction, and composition. A fundamental challenge is to understand whether the performance of an LM on a task should…
198 Citations

### Explaining Question Answering Models through Text Generation

• Computer Science
ArXiv
• 2020
A model for multi-choice question answering in which an LM-based generator produces a textual hypothesis that a classifier then uses to answer the question; the generated hypotheses elucidate the knowledge the LM uses for answering.

### Teaching Pre-Trained Models to Systematically Reason Over Implicit Knowledge

• Computer Science
NeurIPS
• 2020
This work provides a first demonstration that LMs can be trained to reliably perform systematic reasoning combining both implicit, pre-trained knowledge and explicit natural language statements, and demonstrates that models learn to effectively perform inference which involves implicit taxonomic and world knowledge, chaining and counting.

### Birds Have Four Legs?! NumerSense: Probing Numerical Commonsense Knowledge of Pre-trained Language Models

• Computer Science
EMNLP
• 2020
Investigating whether, and to what extent, numerical commonsense knowledge can be induced from PTLMs, as well as the robustness of this process, finds that this may not work well for numerical commonsense knowledge.

### Rethinking embedding coupling in pre-trained language models

• Computer Science
ICLR
• 2021
The analysis shows that larger output embeddings prevent the model's last layers from overspecializing to the pre-training task and encourage Transformer representations to be more general and more transferable to other tasks and languages.

### A Mathematical Exploration of Why Language Models Help Solve Downstream Tasks

• Computer Science
ICLR
• 2021
This paper hypothesizes, and verifies empirically, that classification tasks of interest can be reformulated as next-word prediction tasks, making language modeling a meaningful pre-training task, and analyzes properties of the cross-entropy objective to show that $\epsilon$-optimal language models in cross-entropy (log-perplexity) learn features that are $\mathcal{O}(\sqrt{\epsilon})$-good on natural linear classification tasks.

### Neural Language Generation: Formulation, Methods, and Evaluation

• Computer Science
ArXiv
• 2020
There is no standard way to assess the quality of text produced by these generative models, which constitutes a serious bottleneck to the progress of the field; this survey provides an informative overview of formulations, methods, and assessments of neural natural language generation.

### How Context Affects Language Models' Factual Predictions

• Computer Science
AKBC
• 2020
This paper reports that augmenting pre-trained language models with additional context dramatically improves performance, and that the resulting system, despite being unsupervised, is competitive with a supervised machine reading baseline.

### Can BERT Reason? Logically Equivalent Probes for Evaluating the Inference Capabilities of Language Models

• Computer Science
ArXiv
• 2020
It is found that, despite the recent success of large PTLMs on commonsense benchmarks, their performance on these probes is no better than random guessing (even with fine-tuning) and is heavily dependent on biases; the poor overall performance inhibits studying robustness.

### On the Interplay Between Fine-tuning and Sentence-Level Probing for Linguistic Knowledge in Pre-Trained Transformers

• Computer Science
BLACKBOXNLP
• 2020
It is argued that both positive and negative effects of fine-tuning on probing require careful interpretation; for some probing tasks, fine-tuning leads to substantial changes in accuracy, possibly suggesting that fine-tuning introduces or even removes linguistic knowledge from a pre-trained model.

### Can RoBERTa Reason? A Systematic Approach to Probe Logical Reasoning in Language Models

• Computer Science
• 2020
It is found that, despite the current success of large LMs on commonsense benchmarks, their performance on these tasks is no better than random guessing, heavily dependent on biases, and not robust to linguistic perturbations.

## References

Showing 1-10 of 61 references

### Learning and Evaluating General Linguistic Intelligence

• Computer Science
ArXiv
• 2019
This work analyzes state-of-the-art natural language understanding models and conducts an extensive empirical investigation to evaluate them against general linguistic intelligence criteria, and proposes a new evaluation metric based on an online encoding of the test data that quantifies how quickly an existing agent (model) learns a new task.

### Language Models are Unsupervised Multitask Learners

• Computer Science
• 2019
It is demonstrated that language models begin to learn these tasks without any explicit supervision when trained on a new dataset of millions of webpages called WebText, suggesting a promising path towards building language processing systems which learn to perform tasks from their naturally occurring demonstrations.

### BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

• Computer Science
NAACL
• 2019
A new language representation model, BERT, designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers, which can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.

### Dissecting Contextual Word Embeddings: Architecture and Representation

• Computer Science
EMNLP
• 2018
There is a tradeoff between speed and accuracy, but all architectures learn high quality contextual representations that outperform word embeddings for four challenging NLP tasks, suggesting that unsupervised biLMs, independent of architecture, are learning much more about the structure of language than previously appreciated.

### Assessing the Ability of LSTMs to Learn Syntax-Sensitive Dependencies

• Computer Science
TACL
• 2016
It is concluded that LSTMs can capture a non-trivial amount of grammatical structure given targeted supervision, but stronger architectures may be required to further reduce errors; furthermore, the language modeling signal is insufficient for capturing syntax-sensitive dependencies, and should be supplemented with more direct supervision if such dependencies need to be captured.

### What BERT Is Not: Lessons from a New Suite of Psycholinguistic Diagnostics for Language Models

A suite of diagnostics drawn from human language experiments is introduced, which allows targeted questions about the information used by language models when generating predictions in context; these diagnostics are applied to the popular BERT model.

### Investigating BERT’s Knowledge of Language: Five Analysis Methods with NPIs

• Computer Science
EMNLP
• 2019
It is concluded that a variety of methods is necessary to reveal all relevant aspects of a model’s grammatical knowledge in a given domain.

### Show Your Work: Improved Reporting of Experimental Results

• Computer Science
EMNLP
• 2019
It is demonstrated that test-set performance scores alone are insufficient for drawing accurate conclusions about which model performs best, and a novel technique is presented: expected validation performance of the best-found model as a function of computation budget.

### XLNet: Generalized Autoregressive Pretraining for Language Understanding

• Computer Science
NeurIPS
• 2019
XLNet is proposed, a generalized autoregressive pretraining method that enables learning bidirectional contexts by maximizing the expected likelihood over all permutations of the factorization order and overcomes the limitations of BERT thanks to its autoregressive formulation.

### Probing Natural Language Inference Models through Semantic Fragments

• Computer Science
AAAI
• 2020
This work proposes the use of semantic fragments, systematically generated datasets that each target a different semantic phenomenon, for probing, and efficiently improving, such capabilities of language models.