Corpus ID: 229156229

Extracting Training Data from Large Language Models

@inproceedings{carlini2021extracting,
  title={Extracting Training Data from Large Language Models},
  author={Nicholas Carlini and Florian Tram{\`e}r and Eric Wallace and Matthew Jagielski and Ariel Herbert-Voss and Katherine Lee and Adam Roberts and Tom B. Brown and Dawn Xiaodong Song and {\'U}lfar Erlingsson and Alina Oprea and Colin Raffel},
  booktitle={USENIX Security Symposium},
  year={2021}
}
It has become common to publish large (billion-parameter) language models that have been trained on private datasets. This paper demonstrates that in such settings, an adversary can perform a training data extraction attack to recover individual training examples by querying the language model. We demonstrate our attack on GPT-2, a language model trained on scrapes of the public Internet, and are able to extract hundreds of verbatim text sequences from the model's training data.
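The attack recipe the abstract describes (sample many generations from the model, then rank them by the likelihood the model assigns) can be sketched with a toy character-bigram model standing in for GPT-2. The corpus, the "secret" string, and all names below are illustrative, not the paper's actual setup:

```python
import math

# Toy stand-in for a language model: a character bigram model built from a
# tiny "training corpus". A real attack would query a model like GPT-2.
TRAIN = "the secret key is 1234. " * 5 + "common words appear often. " * 20

def bigram_probs(text):
    counts, totals = {}, {}
    for a, b in zip(text, text[1:]):
        counts[(a, b)] = counts.get((a, b), 0) + 1
        totals[a] = totals.get(a, 0) + 1
    # Add-one smoothing so unseen bigrams get small nonzero probability.
    return lambda a, b: (counts.get((a, b), 0) + 1) / (totals.get(a, 0) + 27)

prob = bigram_probs(TRAIN)

def perplexity(text):
    # Lower perplexity = the model assigns the text higher likelihood,
    # which the extraction attack uses as a signal of memorization.
    logp = sum(math.log(prob(a, b)) for a, b in zip(text, text[1:]))
    return math.exp(-logp / max(len(text) - 1, 1))

candidates = ["the secret key is 1234.", "zqxj vvkw pf gm",
              "common words appear often."]
ranked = sorted(candidates, key=perplexity)
# Likely-memorized strings rank first; gibberish ranks last.
```

Ranking candidate generations by perplexity is only the filtering step; the paper then verifies which low-perplexity strings actually occur in the training data.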


Privacy Analysis in Language Models via Training Data Leakage Report

Introduces a methodology for identifying user content in training data that could be leaked under a strong, realistic threat model, and proposes two metrics that quantify user-level data leakage by measuring a model's ability to reproduce unique sentence fragments from its training data.
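A fragment-overlap metric of the kind this summary describes can be sketched as the fraction of a generation's word n-grams that appear verbatim in one user's training contributions. The function, names, and data below are hypothetical illustrations, not the paper's exact definitions:

```python
def ngrams(text, n=3):
    # All word n-grams of the text, as a set for fast intersection.
    words = text.split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def leakage_rate(generated, user_data, n=3):
    # Fraction of the generation's n-grams found verbatim in the user's data.
    gen, user = ngrams(generated, n), ngrams(user_data, n)
    return len(gen & user) / len(gen) if gen else 0.0

user_data = "my home address is 10 main street and i like tea"
leaky = "the model says my home address is 10 main street today"
clean = "the weather is nice and sunny in the park today"
print(leakage_rate(leaky, user_data))   # > 0: verbatim fragments leaked
print(leakage_rate(clean, user_data))   # 0.0: no fragments shared
```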

Membership Inference on Word Embedding and Beyond

Shows that word embeddings are vulnerable to black-box membership inference attacks under realistic assumptions, and that this leakage persists through two other major NLP applications, classification and text generation, even when the embedding layer is not exposed to the attacker.
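A black-box membership inference attack of the kind studied here is often sketched as a simple loss threshold: training members tend to receive lower loss than held-out points. The losses below are simulated with a fixed seed for illustration; a real attack would query the target model:

```python
import random

random.seed(0)
# Simulated per-example losses: members (seen in training) cluster lower
# than non-members (held out). Real attacks observe these via queries.
member_losses = [random.gauss(0.5, 0.2) for _ in range(100)]
nonmember_losses = [random.gauss(1.5, 0.4) for _ in range(100)]

def infer_member(loss, threshold=1.0):
    # Predict "member" when the model's loss on the example is low.
    return loss < threshold

tp = sum(infer_member(l) for l in member_losses)       # true positives
tn = sum(not infer_member(l) for l in nonmember_losses)  # true negatives
accuracy = (tp + tn) / 200
```

The threshold would normally be calibrated on shadow models or a held-out split rather than fixed by hand.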

How Private is Machine Learning?

To assess the extent to which current networks are private, develops an attack that extracts rare training examples from GPT-2, a language model trained on gigabytes of text from the Internet, and finds that standard models are not private.

Documenting the English Colossal Clean Crawled Corpus

This work provides some of the first documentation of the English Colossal Clean Crawled Corpus (C4), one of the largest corpora of text available, and hosts an indexed version of C4 at https://c4-search.allenai.org/, allowing anyone to search it.

Hidden Backdoors in Human-Centric Language Models

The proposed hidden backdoors are shown to be effective across three security-critical downstream NLP tasks representative of modern human-centric systems: toxic comment detection, neural machine translation (NMT), and question answering (QA).

A Survey on Data Augmentation for Text Classification

This survey is concerned with data augmentation methods for textual classification and aims to provide a concise and comprehensive overview for researchers and practitioners.

Meta-tuning Language Models to Answer Prompts Better

This work proposes meta-tuning, which trains the model to specialize in answering prompts but still generalize to unseen tasks, and outperforms a same-sized QA model for most labels on unseen tasks.

When is memorization of irrelevant training data necessary for high-accuracy learning?

This paper describes natural prediction problems in which every sufficiently accurate training algorithm must encode essentially all the information about a large subset of its training examples, which remains true even when the examples are high-dimensional and have entropy much higher than the sample size.

The NLP Cookbook: Modern Recipes for Transformer Based Deep Learning Architectures

This paper summarizes and examines the current state-of-the-art (SOTA) NLP models employed across numerous NLP tasks for optimal performance and efficiency, and provides a detailed account of the different architectures, a taxonomy of NLP designs, comparative evaluations, and future directions in NLP.

Benchmarking Modern Named Entity Recognition Techniques for Free-text Health Record De-identification

This project explores several deep learning-based named entity recognition (NER) methods to determine which method(s) perform better on the de-identification task, and finds that BiLSTM-CRF represents the best-performing encoder/decoder combination.

Auditing Data Provenance in Text-Generation Models

A new model auditing technique is developed that helps users check if their data was used to train a machine learning model, and it is empirically shown that the method can successfully audit well-generalized models that are not overfitted to the training data.

Privacy Risks of General-Purpose Language Models

Presents the first systematic study of the privacy risks of 8 state-of-the-art language models through 4 diverse case studies, demonstrating that these risks exist and can pose practical threats when general-purpose language models are applied to sensitive data covering identity, genome, healthcare, and location.

Language Models are Unsupervised Multitask Learners

It is demonstrated that language models begin to learn these tasks without any explicit supervision when trained on a new dataset of millions of webpages called WebText, suggesting a promising path towards building language processing systems which learn to perform tasks from their naturally occurring demonstrations.

Training Production Language Models without Memorizing User Data

This paper presents the first consumer-scale next-word prediction (NWP) model trained with Federated Learning (FL) while leveraging the Differentially Private Federated Averaging (DP-FedAvg) technique, and demonstrates the deployment of a differentially private mechanism for the training of a production neural network in FL.
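The DP-FedAvg idea named above (clip each client update to a norm bound, average, and add Gaussian noise calibrated to that bound) can be sketched as follows. The clip bound, noise multiplier, and client updates are illustrative values, not the paper's production settings:

```python
import math
import random

random.seed(0)

def l2_norm(v):
    return math.sqrt(sum(x * x for x in v))

def clip(update, bound):
    # Scale the update down so its L2 norm is at most `bound`.
    n = l2_norm(update)
    scale = min(1.0, bound / n) if n > 0 else 1.0
    return [x * scale for x in update]

def dp_fedavg(updates, clip_bound=1.0, noise_mult=0.5):
    # Clip each client's update, average, then add Gaussian noise whose
    # scale is tied to the clip bound (the per-client sensitivity).
    clipped = [clip(u, clip_bound) for u in updates]
    dim = len(updates[0])
    avg = [sum(u[i] for u in clipped) / len(updates) for i in range(dim)]
    sigma = noise_mult * clip_bound / len(updates)
    return [x + random.gauss(0, sigma) for x in avg]

client_updates = [[0.2, -0.1, 0.4], [5.0, 5.0, 5.0], [0.1, 0.0, -0.2]]
noisy_avg = dp_fedavg(client_updates)
# The outlier client's contribution is bounded by clipping before averaging.
```

Clipping is what bounds any single user's influence on the aggregate; the noise then makes that bounded influence differentially private.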

Language Models as Knowledge Bases?

An in-depth analysis of the relational knowledge already present (without fine-tuning) in a wide range of state-of-the-art pretrained language models finds that BERT contains relational knowledge competitive with traditional NLP methods that have some access to oracle knowledge.

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

This systematic study compares pre-training objectives, architectures, unlabeled datasets, transfer approaches, and other factors on dozens of language understanding tasks and achieves state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more.

Understanding Membership Inferences on Well-Generalized Learning Models

Demonstrates that even a well-generalized model contains instances vulnerable to a new generalized membership inference attack (GMIA), using novel techniques to select vulnerable instances and detect their subtle influences, which overfitting metrics ignore.

Information Leakage in Embedding Models

This work develops three classes of attacks to systematically study information that might be leaked by embeddings, and extensively evaluates the attacks on various state-of-the-art embedding models in the text domain.

Improving Language Understanding by Generative Pre-Training

The general task-agnostic model outperforms discriminatively trained models that use architectures specifically crafted for each task, improving upon the state of the art in 9 out of the 12 tasks studied.

Better language models and their implications

OpenAI Blog, 2019