Learning from Task Descriptions

Orion Weller, Nicholas Lourie, Matt Gardner, Matthew E. Peters

Typically, machine learning systems solve new tasks by training on thousands of examples. In contrast, humans can solve new tasks by reading some instructions, with perhaps an example or two. To take a step toward closing this gap, we introduce a framework for developing NLP systems that solve new tasks after reading their descriptions, synthesizing prior work in this area. We instantiate this framework with a new English language dataset, ZEST, structured for task-oriented evaluation on unseen…


Learning to Generate Task-Specific Adapters from Task Description

Hypter is introduced, a framework that improves text-to-text transformer’s generalization ability to unseen tasks by training a hypernetwork to generate task-specific, light-weight adapters from task descriptions.

Cross-Task Generalization via Natural Language Crowdsourcing Instructions

This work introduces NATURAL INSTRUCTIONS, a dataset of 61 distinct tasks, their human-authored instructions, and 193k task instances, and adopts generative pre-trained language models to encode task-specific instructions along with the input and generate the task output.

What Makes Instruction Learning Hard? An Investigation and a New Challenge in a Synthetic Environment

This work uses the task of deciding whether a given string matches a regular expression to identify properties of tasks, instructions, and instances that make instruction learning challenging, and proposes Hard RegSet as a challenging instruction learning task and a controlled environment for studying instruction learning.

Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks

This work introduces a benchmark of over 1,600 diverse NLP tasks with expert-written declarative instructions, and shows that training on such instructions helps generalize NLP models to a variety of unseen tasks.

One-Shot Learning from a Demonstration with Hierarchical Latent Language

This work proposes a neural agent infused with hierarchical latent language at the level of task inference and subtask planning, and suggests that agents that form text-based inference are better equipped for the challenge under a random split of tasks.

LaSQuE: Improved Zero-Shot Classification from Explanations Through Quantifier Modeling and Curriculum Learning

This work presents LaSQuE, a method that can learn zero-shot classifiers from language explanations by using three new strategies: modeling the semantics of linguistic quantifiers in explanations, aggregating information from multiple explanations using an attention-based mechanism, and training via curriculum learning.

How Many Data Samples is an Additional Instruction Worth?

A subset of tasks in the expanded version of NATURAL INSTRUCTIONS is augmented with additional instructions and it is found that these significantly improve model performance, especially in the low-data regime.

Zero-shot Learning by Generating Task-specific Adapters

Hypter is introduced, a framework that improves zero-shot transferability by training a hypernetwork to generate task-specific adapters from task descriptions, greatly reducing the number of parameters by using light-weight adapters.

Recommendation as Language Processing (RLP): A Unified Pretrain, Personalized Prompt & Predict Paradigm (P5)

A flexible and unified text-to-text paradigm called “Pretrain, Personalized Prompt, and Predict Paradigm” (P5) for recommendation is presented, which unifies various recommendation tasks in a shared framework and moves recommender systems toward a universal recommendation engine.

Self-Instruct: Aligning Language Model with Self Generated Instructions

Self-Instruct provides an almost annotation-free method for aligning pre-trained language models with instructions, and is released to facilitate future studies on instruction tuning.



Language Models are Unsupervised Multitask Learners

It is demonstrated that language models begin to learn these tasks without any explicit supervision when trained on a new dataset of millions of webpages called WebText, suggesting a promising path towards building language processing systems which learn to perform tasks from their naturally occurring demonstrations.

Language Models are Few-Shot Learners

GPT-3 achieves strong performance on many NLP datasets, including translation, question-answering, and cloze tasks, as well as several tasks that require on-the-fly reasoning or domain adaptation, such as unscrambling words, using a novel word in a sentence, or performing 3-digit arithmetic.

Neural Module Networks for Reasoning over Text

This work extends Neural module networks by introducing modules that reason over a paragraph of text, performing symbolic reasoning over numbers and dates in a probabilistic and differentiable manner, and proposing an unsupervised auxiliary loss to help extract arguments associated with the events in text.

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

This systematic study compares pre-training objectives, architectures, unlabeled datasets, transfer approaches, and other factors on dozens of language understanding tasks and achieves state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more.

Evaluating NLP Models via Contrast Sets

A new annotation paradigm for NLP is proposed that helps to close systematic gaps in the test data, and it is recommended that after a dataset is constructed, the dataset authors manually perturb the test instances in small but meaningful ways that change the gold label, creating contrast sets.

DROP: A Reading Comprehension Benchmark Requiring Discrete Reasoning Over Paragraphs

A new reading comprehension benchmark, DROP, which requires Discrete Reasoning Over the content of Paragraphs, and presents a new model that combines reading comprehension methods with simple numerical reasoning to achieve 51% F1.

Looking Beyond the Surface: A Challenge Set for Reading Comprehension over Multiple Sentences

The dataset is the first to study multi-sentence inference at scale, with an open-ended set of question types that requires reasoning skills, and finds human solvers to achieve an F1-score of 88.1%.

Zero-Shot Relation Extraction via Reading Comprehension

It is shown that relation extraction can be reduced to answering simple reading comprehension questions, by associating one or more natural-language questions with each relation slot, and that zero-shot generalization to unseen relation types is possible, at lower accuracy levels.

A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference

The Multi-Genre Natural Language Inference corpus is introduced, a dataset designed for use in the development and evaluation of machine learning models for sentence understanding and shows that it represents a substantially more difficult task than does the Stanford NLI corpus.

Annotation Artifacts in Natural Language Inference Data

It is shown that a simple text categorization model can correctly classify the hypothesis alone in about 67% of SNLI and 53% of MultiNLI, and that specific linguistic phenomena such as negation and vagueness are highly correlated with certain inference classes.