oLMpics-On What Language Model Pre-training Captures

@article{Talmor2020oLMpicsOnWL,
  title={oLMpics-On What Language Model Pre-training Captures},
  author={Alon Talmor and Yanai Elazar and Yoav Goldberg and Jonathan Berant},
  journal={Transactions of the Association for Computational Linguistics},
  year={2020},
  volume={8},
  pages={743-758}
}
Abstract
Recent success of pre-trained language models (LMs) has spurred widespread interest in the language capabilities that they possess. However, efforts to understand whether LM representations are useful for symbolic reasoning tasks have been limited and scattered. In this work, we propose eight reasoning tasks, which conceptually require operations such as comparison, conjunction, and composition. A fundamental challenge is to understand whether the performance of an LM on a task should…
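As a rough illustration of the zero-shot probing setup the abstract alludes to, the following is a minimal sketch of scoring multi-choice candidates at a masked position with an off-the-shelf masked LM. It assumes the HuggingFace transformers library; the score_candidates helper and the example question are hypothetical illustrations, not taken from the oLMpics tasks or code.

# Minimal sketch: zero-shot multi-choice probing with a masked LM.
# Assumes HuggingFace `transformers`; the question and candidates below
# are hypothetical illustrations, not the oLMpics data itself.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

def score_candidates(template, candidates):
    """Log-probability of each single-token candidate at the [MASK] slot."""
    text = template.replace("[MASK]", tokenizer.mask_token)
    inputs = tokenizer(text, return_tensors="pt")
    mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]
    with torch.no_grad():
        logits = model(**inputs).logits[0, mask_pos.item()]
    log_probs = torch.log_softmax(logits, dim=-1)
    ids = tokenizer.convert_tokens_to_ids(candidates)
    return {cand: log_probs[idx].item() for cand, idx in zip(candidates, ids)}

# Hypothetical age-comparison query; a model that can compare should prefer "older".
scores = score_candidates(
    "A 41 year old person is [MASK] than a 24 year old person.",
    ["older", "younger"],
)
print(max(scores, key=scores.get), scores)

The sketch assumes each candidate answer is a single token in the model's vocabulary; multi-token answers would require a different scoring scheme.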
Explaining Question Answering Models through Text Generation
TLDR: A model for multi-choice question answering in which an LM-based generator produces a textual hypothesis that is then used by a classifier to answer the question; the generated hypotheses elucidate the knowledge the LM uses for answering.
Rethinking embedding coupling in pre-trained language models
TLDR: The analysis shows that larger output embeddings prevent the model's last layers from overspecializing to the pre-training task and encourage Transformer representations to be more general and more transferable to other tasks and languages.
A Mathematical Exploration of Why Language Models Help Solve Downstream Tasks
Autoregressive language models pretrained on large corpora have been successful at solving downstream tasks, even with zero-shot usage. However, there is little theoretical justification for their…
Neural Language Generation: Formulation, Methods, and Evaluation
TLDR: There is no standard way to assess the quality of text produced by neural generative models, which constitutes a serious bottleneck for progress in the field; this survey provides an informative overview of formulations, methods, and assessments of neural natural language generation.
How Context Affects Language Models' Factual Predictions
TLDR: This paper reports that augmenting pre-trained language models with retrieved relevant context dramatically improves performance and that the resulting system, despite being unsupervised, is competitive with a supervised machine reading baseline.
Can BERT Reason? Logically Equivalent Probes for Evaluating the Inference Capabilities of Language Models
TLDR: It is found that, despite the recent success of large PTLMs on commonsense benchmarks, their performance on the probes is no better than random guessing (even with fine-tuning) and is heavily dependent on biases; the poor overall performance inhibits the study of robustness.
On the Interplay Between Fine-tuning and Sentence-Level Probing for Linguistic Knowledge in Pre-Trained Transformers
TLDR: It is argued that both positive and negative effects of fine-tuning on probing require careful interpretation, and it is found that for some probing tasks fine-tuning leads to substantial changes in accuracy, possibly suggesting that fine-tuning introduces or even removes linguistic knowledge from a pre-trained model.
Can RoBERTa Reason? A Systematic Approach to Probe Logical Reasoning in Language Models (2020)
Humans can map natural language into a logical representation that is robust to linguistic variations and useful for reasoning. While pre-trained language models (LMs) have dramatically improved…
Behind the Scene: Revealing the Secrets of Pre-trained Vision-and-Language Models
TLDR: VALUE (Vision-And-Language Understanding Evaluation) is proposed: a set of meticulously designed probing tasks, generalizable to standard pre-trained V+L models, that aims to decipher the inner workings of multimodal pre-training.

References

Showing 1-10 of 101 references
Learning and Evaluating General Linguistic Intelligence
TLDR: This work analyzes state-of-the-art natural language understanding models, conducts an extensive empirical investigation to evaluate them against general linguistic intelligence criteria, and proposes a new evaluation metric, based on an online encoding of the test data, that quantifies how quickly an existing agent (model) learns a new task.
Language Models are Unsupervised Multitask Learners
TLDR: It is demonstrated that language models begin to learn these tasks without any explicit supervision when trained on a new dataset of millions of webpages called WebText, suggesting a promising path towards building language processing systems which learn to perform tasks from their naturally occurring demonstrations.
Dissecting Contextual Word Embeddings: Architecture and Representation
TLDR: There is a tradeoff between speed and accuracy, but all architectures learn high-quality contextual representations that outperform word embeddings on four challenging NLP tasks, suggesting that unsupervised biLMs, independent of architecture, are learning much more about the structure of language than previously appreciated.
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
TLDR: A new language representation model, BERT, designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers, which can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.
Assessing the Ability of LSTMs to Learn Syntax-Sensitive Dependencies
TLDR: It is concluded that LSTMs can capture a non-trivial amount of grammatical structure given targeted supervision, but stronger architectures may be required to further reduce errors; furthermore, the language modeling signal is insufficient for capturing syntax-sensitive dependencies, and should be supplemented with more direct supervision if such dependencies need to be captured.
Investigating BERT’s Knowledge of Language: Five Analysis Methods with NPIs
TLDR: It is concluded that a variety of methods is necessary to reveal all relevant aspects of a model's grammatical knowledge in a given domain.
Show Your Work: Improved Reporting of Experimental Results
TLDR: It is demonstrated that test-set performance scores alone are insufficient for drawing accurate conclusions about which model performs best, and a novel technique is presented: expected validation performance of the best-found model as a function of computation budget.
XLNet: Generalized Autoregressive Pretraining for Language Understanding
TLDR: XLNet is proposed, a generalized autoregressive pretraining method that enables learning bidirectional contexts by maximizing the expected likelihood over all permutations of the factorization order and overcomes the limitations of BERT thanks to its autoregressive formulation.
Probing Natural Language Inference Models through Semantic Fragments
TLDR: This work proposes the use of semantic fragments (systematically generated datasets that each target a different semantic phenomenon) for probing, and efficiently improving, the inference capabilities of language models.
RoBERTa: A Robustly Optimized BERT Pretraining Approach
TLDR: It is found that BERT was significantly undertrained and can match or exceed the performance of every model published after it; the best model achieves state-of-the-art results on GLUE, RACE, and SQuAD.