WinoDict: Probing language models for in-context word acquisition

  title={WinoDict: Probing language models for in-context word acquisition},
  author={Julian Martin Eisenschlos and Jeremy R. Cole and Fangyu Liu and William W. Cohen},
We introduce a new in-context learning paradigm to measure Large Language Models’ (LLMs) ability to learn novel words during in-ference. In particular, we rewrite Winograd-style co-reference resolution problems by replacing the key concept word with a synthetic but plausible word that the model must understand to complete the task. Solving this task requires the model to make use of the dictionary definition of the new word given in the prompt. This benchmark addresses word acquisition, one… 

Figures and Tables from this paper



Language Models are Unsupervised Multitask Learners

It is demonstrated that language models begin to learn these tasks without any explicit supervision when trained on a new dataset of millions of webpages called WebText, suggesting a promising path towards building language processing systems which learn to perform tasks from their naturally occurring demonstrations.

A Systematic Investigation of Commonsense Understanding in Large Language Models

It is found that the impressive zeroshot performance of large language models is mostly due to existence of dataset bias in the authors' benchmarks, and that leveraging explicit commonsense knowledge does not yield substantial improvement.

Learning to Understand Phrases by Embedding the Dictionary

This work proposes using the definitions found in everyday dictionaries as a means of bridging the gap between lexical and phrasal semantics, and presents two applications of these architectures: reverse dictionaries that return the name of a concept given a definition or description and general-knowledge crossword question answerers.

A Simple Method for Commonsense Reasoning

Key to this method is the use of language models, trained on a massive amount of unlabled data, to score multiple choice questions posed by commonsense reasoning tests, which outperform previous state-of-the-art methods by a large margin.

Mind the Gap: Assessing Temporal Generalization in Neural Language Models

It is argued that now is the right time to rethink the static way in which the authors currently train and evaluate their language models, and develop adaptive language models that can remain up-to-date with respect to their ever-changing and non-stationary world.

Rethinking the Role of Demonstrations: What Makes In-Context Learning Work?

This paper shows that ground truth demonstrations are in fact not required and that other aspects of the demonstrations are the key drivers of end task performance, including the fact that they provide a few examples of the label space, the distribution of the input text, and the overall format of the sequence.

Fantastically Ordered Prompts and Where to Find Them: Overcoming Few-Shot Prompt Order Sensitivity

This work uses the generative nature of language models to construct an artificial development set and based on entropy statistics of the candidate permutations on this set, it identifies performant prompts and yields a 13% relative improvement for GPT-family models across eleven different established text classification tasks.

PaLM: Scaling Language Modeling with Pathways

A 540-billion parameter, densely activated, Transformer language model, which is called PaLM achieves breakthrough performance, outperforming the state-of-the-art on a suite of multi-step reasoning tasks, and outperforming average human performance on the recently released BIG-bench benchmark.

Time-Aware Language Models as Temporal Knowledge Bases

This work proposes a simple technique for jointly modeling text with its timestamp that improves memorization of seen facts from the training time period, as well as calibration on predictions about unseen facts from future time periods and shows that models trained with temporal context can be efficiently "refreshed" as new data arrives.

Language Models are Few-Shot Learners

GPT-3 achieves strong performance on many NLP datasets, including translation, question-answering, and cloze tasks, as well as several tasks that require on-the-fly reasoning or domain adaptation, such as unscrambling words, using a novel word in a sentence, or performing 3-digit arithmetic.