oLMpics-On What Language Model Pre-training Captures

@article{Talmor2019oLMpicsOnWL,
  title={oLMpics-On What Language Model Pre-training Captures},
  author={Alon Talmor and Yanai Elazar and Yoav Goldberg and Jonathan Berant},
  journal={Transactions of the Association for Computational Linguistics},
  year={2020},
  volume={8},
  pages={743-758}
}
Abstract: Recent success of pre-trained language models (LMs) has spurred widespread interest in the language capabilities that they possess. However, efforts to understand whether LM representations are useful for symbolic reasoning tasks have been limited and scattered. In this work, we propose eight reasoning tasks, which conceptually require operations such as comparison, conjunction, and composition. A fundamental challenge is to understand whether the performance of a LM on a task should… 

Explaining Question Answering Models through Text Generation

A model for multi-choice question answering is presented in which an LM-based generator produces a textual hypothesis that a classifier then uses to answer the question; the generated hypotheses elucidate the knowledge the LM relies on when answering.

Teaching Pre-Trained Models to Systematically Reason Over Implicit Knowledge

This work provides a first demonstration that LMs can be trained to reliably perform systematic reasoning that combines implicit, pre-trained knowledge with explicit natural language statements, and shows that models learn to effectively perform inference involving implicit taxonomic and world knowledge, chaining, and counting.

Birds Have Four Legs?! NumerSense: Probing Numerical Commonsense Knowledge of Pre-trained Language Models

This work investigates whether, and to what extent, numerical commonsense knowledge can be induced from pre-trained LMs, as well as the robustness of this process, and finds that pre-training may not capture numerical commonsense knowledge reliably.

Rethinking embedding coupling in pre-trained language models

The analysis shows that larger output embeddings prevent the model's last layers from overspecializing to the pre-training task and encourage Transformer representations to be more general and more transferable to other tasks and languages.

A Mathematical Exploration of Why Language Models Help Solve Downstream Tasks

This paper hypothesizes, and verifies empirically, that classification tasks of interest can be reformulated as next-word prediction tasks, thus making language modeling a meaningful pre-training task, and analyzes properties of the cross-entropy objective to show that $\epsilon$-optimal language models in cross-entropy (log-perplexity) learn features that are $\mathcal{O}(\sqrt{\epsilon})$-good on natural linear classification tasks.
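
As a rough sketch of the quantitative claim above (the symbols $L_{\mathrm{xent}}$, $L_{\mathrm{clf}}$, and $f_p$ are illustrative placeholders, not the paper's notation): a language model $p$ whose cross-entropy is within $\epsilon$ of optimal is argued to yield features $f_p$ whose excess loss on natural linear classification tasks is bounded by $\mathcal{O}(\sqrt{\epsilon})$,

\[
  L_{\mathrm{xent}}(p) \;\le\; \min_{q} L_{\mathrm{xent}}(q) + \epsilon
  \quad\Longrightarrow\quad
  L_{\mathrm{clf}}(f_p) \;\le\; L_{\mathrm{clf}}^{*} + \mathcal{O}\!\left(\sqrt{\epsilon}\right).
\]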

Neural Language Generation: Formulation, Methods, and Evaluation

There is no standard way to assess the quality of text produced by neural generative models, which constitutes a serious bottleneck for progress in the field; this survey provides an informative overview of the formulations, methods, and evaluation of neural natural language generation.

How Context Affects Language Models' Factual Predictions

This paper reports that augmenting pre-trained language models with relevant retrieved context dramatically improves the accuracy of their factual predictions, and that the resulting system, despite being unsupervised, is competitive with a supervised machine reading baseline.

Can BERT Reason? Logically Equivalent Probes for Evaluating the Inference Capabilities of Language Models

It is found that, despite the recent success of large PTLMs on commonsense benchmarks, their performance on these probes is no better than random guessing (even with fine-tuning) and is heavily dependent on biases; the poor overall performance precludes studying their robustness.

On the Interplay Between Fine-tuning and Sentence-Level Probing for Linguistic Knowledge in Pre-Trained Transformers

It is argued that both positive and negative effects of fine-tuning on probing require careful interpretation, and it is found that for some probing tasks fine-tuning leads to substantial changes in accuracy, possibly suggesting that fine-tuning introduces, or even removes, linguistic knowledge from a pre-trained model.

Can RoBERTa Reason? A Systematic Approach to Probe Logical Reasoning in Language Models

It is found that despite the current success of large LMs on commonsense benchmarks, their performance on these tasks is no better than random guessing, heavily dependent on biases, and not robust to linguistic perturbations.
...

References

Showing 1-10 of 61 references

Learning and Evaluating General Linguistic Intelligence

This work analyzes state-of-the-art natural language understanding models and conducts an extensive empirical investigation to evaluate them against general linguistic intelligence criteria, and proposes a new evaluation metric based on an online encoding of the test data that quantifies how quickly an existing agent (model) learns a new task.

Language Models are Unsupervised Multitask Learners

It is demonstrated that language models begin to learn these tasks without any explicit supervision when trained on a new dataset of millions of webpages called WebText, suggesting a promising path towards building language processing systems which learn to perform tasks from their naturally occurring demonstrations.

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

A new language representation model, BERT, designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers, which can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.

Dissecting Contextual Word Embeddings: Architecture and Representation

There is a tradeoff between speed and accuracy, but all architectures learn high quality contextual representations that outperform word embeddings for four challenging NLP tasks, suggesting that unsupervised biLMs, independent of architecture, are learning much more about the structure of language than previously appreciated.

Assessing the Ability of LSTMs to Learn Syntax-Sensitive Dependencies

It is concluded that LSTMs can capture a non-trivial amount of grammatical structure given targeted supervision, but stronger architectures may be required to further reduce errors; furthermore, the language modeling signal is insufficient for capturing syntax-sensitive dependencies, and should be supplemented with more direct supervision if such dependencies need to be captured.

What BERT Is Not: Lessons from a New Suite of Psycholinguistic Diagnostics for Language Models

A suite of diagnostics drawn from human language experiments is introduced, allowing targeted questions about the information language models use when generating predictions in context, and the suite is applied to the popular BERT model.

Investigating BERT’s Knowledge of Language: Five Analysis Methods with NPIs

It is concluded that a variety of methods is necessary to reveal all relevant aspects of a model’s grammatical knowledge in a given domain.

Show Your Work: Improved Reporting of Experimental Results

It is demonstrated that test-set performance scores alone are insufficient for drawing accurate conclusions about which model performs best, and a novel technique is presented: expected validation performance of the best-found model as a function of computation budget.

XLNet: Generalized Autoregressive Pretraining for Language Understanding

XLNet is proposed, a generalized autoregressive pretraining method that enables learning bidirectional contexts by maximizing the expected likelihood over all permutations of the factorization order, and that overcomes the limitations of BERT thanks to its autoregressive formulation.

Probing Natural Language Inference Models through Semantic Fragments

This work proposes the use of semantic fragments, systematically generated datasets that each target a different semantic phenomenon, for probing, and efficiently improving, such capabilities of natural language inference models.
...