Corpus ID: 248863377

Describing Differences between Text Distributions with Natural Language

Authors: Ruiqi Zhong, Charles Burton Snell, Dan Klein, Jacob Steinhardt
Venue: International Conference on Machine Learning
How do two distributions of text differ? Humans are slow at answering this, since discovering patterns might require tediously reading through hundreds of samples. We propose to automatically describe the differences by “learning a natural language hypothesis”: given two distributions D_0 and D_1, we search for a description that is more often true for D_1, e.g., “is military-related.” To tackle this problem, we fine-tune GPT-3 to propose descriptions with the prompt: “[samples of D_0…
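The search objective in the abstract (a description that is true more often for D_1 than for D_0) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the function names, the toy samples, and especially the keyword check, which merely stands in for whatever verifier decides whether a description holds for a sample. It is not the authors' implementation.

```python
from typing import Callable, Sequence


def truth_rate(hypothesis: Callable[[str], bool], samples: Sequence[str]) -> float:
    """Fraction of samples for which the hypothesis holds."""
    if not samples:
        raise ValueError("samples must be non-empty")
    return sum(1 for s in samples if hypothesis(s)) / len(samples)


def score(hypothesis: Callable[[str], bool],
          d0: Sequence[str], d1: Sequence[str]) -> float:
    """How much more often the hypothesis is true on D_1 than on D_0."""
    return truth_rate(hypothesis, d1) - truth_rate(hypothesis, d0)


def is_military_related(text: str) -> bool:
    # Hypothetical keyword stand-in for a real verifier (assumption).
    keywords = ("army", "troops", "navy")
    lowered = text.lower()
    return any(k in lowered for k in keywords)


# Toy samples, invented for this example only.
d0_samples = [
    "The bakery opens at seven.",
    "Our team won the match.",
    "Rain is expected tomorrow.",
    "She adopted a kitten.",
]
d1_samples = [
    "The army advanced overnight.",
    "Troops were deployed to the border.",
    "The navy launched a new ship.",
    "The mayor cut the ribbon.",
]

result = score(is_military_related, d0_samples, d1_samples)
print(result)
```

A score near 1 would mean the description holds almost only for D_1, a score near 0 would mean it does not separate the two distributions, and a negative score would mean it describes D_0 better.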


Unsupervised Explanation Generation via Correct Instantiations

NEON is proposed, a two-phase, unsupervised explanation generation framework that first generates corrected instantiations of the statement and then uses them to prompt large PLMs to complete the explanation. It is also shown that NEON remains effective when generalizing to different scenarios.

Explaining Patterns in Data with Language Models via Interpretable Autoprompting

Experiments on a wide range of datasets show that iPrompt can yield meaningful insights by accurately identifying ground-truth dataset descriptions and by producing prompts that improve upon existing prompts for generalization.



BoolQ: Exploring the Surprising Difficulty of Natural Yes/No Questions

It is found that transferring from entailment data is more effective than transferring from paraphrase or extractive QA data, and that it, surprisingly, continues to be very beneficial even when starting from massive pre-trained language models such as BERT.

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

This systematic study compares pre-training objectives, architectures, unlabeled datasets, transfer approaches, and other factors on dozens of language understanding tasks and achieves state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more.

Annotation Artifacts in Natural Language Inference Data

It is shown that a simple text categorization model can correctly classify the hypothesis alone in about 67% of SNLI and 53% of MultiNLI, and that specific linguistic phenomena such as negation and vagueness are highly correlated with certain inference classes.

Finetuned Language Models Are Zero-Shot Learners

It is shown that instruction tuning (finetuning language models on a collection of datasets described via instructions) substantially improves zero-shot performance on unseen tasks and outperforms few-shot GPT-3 by a large margin on ANLI, RTE, BoolQ, AI2-ARC, OpenbookQA, and StoryCloze.

Cross-Task Generalization via Natural Language Crowdsourcing Instructions

This work introduces NATURAL INSTRUCTIONS, a dataset of 61 distinct tasks, their human-authored instructions, and 193k task instances, and adopts generative pre-trained language models to encode task-specific instructions along with input and generate task output.

Adapting Language Models for Zero-shot Learning by Meta-tuning on Dataset and Prompt Collections

Meta-tuning is proposed, which directly optimizes the zero-shot learning objective by finetuning pre-trained language models on a collection of datasets. The collection is built by aggregating 43 existing datasets and annotating 441 label descriptions in a question-answering (QA) format.

Unsupervised Domain Clusters in Pretrained Language Models

It is shown that massive pre-trained language models implicitly learn sentence representations that cluster by domains without supervision – suggesting a simple data-driven definition of domains in textual data and proposing domain data selection methods based on such models, which require only a small set of in-domain monolingual data.

Stress Test Evaluation for Natural Language Inference

This work proposes an evaluation methodology consisting of automatically constructed “stress tests” that allow us to examine whether systems have the ability to make real inferential decisions, and reveals strengths and weaknesses of these models with respect to challenging linguistic phenomena.

Eliciting Knowledge from Language Models Using Automatically Generated Prompts

The remarkable success of pretrained language models has motivated the study of what kinds of knowledge these models learn during pretraining. Reformulating tasks as fill-in-the-blanks problems

Language Models are Few-Shot Learners

GPT-3 achieves strong performance on many NLP datasets, including translation, question-answering, and cloze tasks, as well as several tasks that require on-the-fly reasoning or domain adaptation, such as unscrambling words, using a novel word in a sentence, or performing 3-digit arithmetic.