Locating and Editing Factual Associations in GPT
@inproceedings{Meng2022LocatingAE,
  title  = {Locating and Editing Factual Associations in GPT},
  author = {Kevin Meng and David Bau and Alex Andonian and Yonatan Belinkov},
  year   = {2022}
}
We analyze the storage and recall of factual associations in autoregressive transformer language models, finding evidence that these associations correspond to localized, directly-editable computations. We first develop a causal intervention for identifying neuron activations that are decisive in a model’s factual predictions. This reveals a distinct set of steps in middle-layer feed-forward modules that mediate factual predictions while processing subject tokens. To test our hypothesis that…
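The causal intervention described in the abstract (often called causal tracing) can be sketched in miniature: run the model on a corrupted input, then restore a single clean hidden state inside the corrupted run and measure how much of the original prediction is recovered. The toy two-layer "model" below is an illustrative assumption of mine, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "model": two layers of hidden states; prediction = softmax over final state.
W1 = rng.normal(size=(8, 8))
W2 = rng.normal(size=(8, 8))

def run(x, patch=None):
    """Forward pass; optionally overwrite the layer-1 hidden state."""
    h1 = np.tanh(x @ W1)
    if patch is not None:
        h1 = patch                       # restore a hidden state from the clean run
    h2 = np.tanh(h1 @ W2)
    e = np.exp(h2 - h2.max())
    return e / e.sum(), h1

clean_x = rng.normal(size=8)
clean_probs, clean_h1 = run(clean_x)
target = int(clean_probs.argmax())       # the "fact" the clean run predicts

corrupt_x = clean_x + rng.normal(scale=3.0, size=8)   # corrupt the subject
corrupt_probs, _ = run(corrupt_x)

# Patch the clean layer-1 state into the corrupted run.
patched_probs, _ = run(corrupt_x, patch=clean_h1)

# Indirect effect: probability restored by reinstating this one hidden state.
effect = patched_probs[target] - corrupt_probs[target]
print(f"clean={clean_probs[target]:.3f} corrupt={corrupt_probs[target]:.3f} "
      f"patched={patched_probs[target]:.3f} effect={effect:+.3f}")
```

In the real method, this restore-and-measure loop is repeated over every layer and token position; states whose restoration recovers the prediction are the "decisive" activations.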
One Citation
Memory-Based Model Editing at Scale
- 2022
This work proposes Semi-Parametric Editing with a Retrieval-Augmented Counterfactual Model (SERAC), which stores edits in an explicit memory and learns to reason over them to modulate the base model’s predictions as needed.
References
Showing 1–10 of 48 references
Fast Model Editing at Scale
- ArXiv, 2021
MEND is the only approach to model editing that effectively edits the behavior of models with more than 10 billion parameters; it uses a low-rank decomposition of the gradient to make the parameterization of the editing transformation tractable.
Editing Factual Knowledge in Language Models
- EMNLP, 2021
This work presents KnowledgeEditor, a method which can be used to edit factual knowledge and, thus, fix ‘bugs’ or unexpected predictions without the need for expensive re-training or fine-tuning.
GPT-J-6B: A 6 Billion Parameter Autoregressive Language Model. https://github.com/kingoflolz/mesh-transformer-jax
- 2021
Knowledge Neurons in Pretrained Transformers
- ACL, 2022
This paper examines the fill-in-the-blank cloze task for BERT and proposes a knowledge attribution method to identify the neurons that express the fact, finding that the activation of such knowledge neurons is positively correlated to the expression of their corresponding facts.
Transformer Feed-Forward Layers Are Key-Value Memories
- EMNLP, 2021
This work shows that feed-forward layers in transformer-based language models operate as key-value memories, where each key correlates with textual patterns in the training examples, and each value induces a distribution over the output vocabulary.
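The key-value view summarized above can be written down directly: in a feed-forward layer, each row of the first weight matrix acts as a key matched against the input, and the corresponding row of the second matrix is a value written to the output when that key fires. The dimensions and hand-built weights below are my own illustration, not the paper's:

```python
import numpy as np

# Keys: each row of W_k is a pattern detector over the 4-dim input.
W_k = np.array([[1., 0., 0., 0.],    # key 0 fires on feature 0
                [0., 1., 0., 0.],    # key 1 fires on feature 1
                [0., 0., 1., 1.]])   # key 2 fires on features 2 and 3
# Values: each row of W_v is what a fired key writes into the 5-dim output.
W_v = np.array([[5., 0., 0., 0., 0.],
                [0., 5., 0., 0., 0.],
                [0., 0., 5., 0., 0.]])

def ffn(x):
    m = np.maximum(x @ W_k.T, 0.0)   # key-match scores (ReLU coefficients)
    return m @ W_v                   # weighted sum of value vectors

x = np.array([0.9, 0.1, 0.0, 0.0])  # input resembling key 0's pattern
out = ffn(x)
print(out)                           # value 0 dominates: key 0's memory slot was retrieved
```

Under this reading, editing a fact amounts to rewriting the value associated with one key, which is the intuition the main paper builds on.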
Modifying Memories in Transformer Models
- ArXiv, 2020
This paper proposes the new task of explicitly modifying specific factual knowledge in Transformer models while ensuring performance does not degrade on unmodified facts, and benchmarks several approaches that provide natural baselines for this task.
Adam: A Method for Stochastic Optimization
- ICLR, 2015
This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.
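The "adaptive estimates of lower-order moments" mentioned above are exponential moving averages of the gradient (first moment) and its elementwise square (second moment), each bias-corrected for their zero initialization. A minimal sketch on a 1-D quadratic, using the paper's default decay rates (the learning rate and toy objective are my own choices):

```python
import math

def adam_step(theta, grad, m, v, t, lr=0.1, b1=0.9, b2=0.999, eps=1e-8):
    m = b1 * m + (1 - b1) * grad           # first-moment (mean) estimate
    v = b2 * v + (1 - b2) * grad ** 2      # second-moment (uncentered variance) estimate
    m_hat = m / (1 - b1 ** t)              # bias correction for zero initialization
    v_hat = v / (1 - b2 ** t)
    theta -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v

# Minimize f(theta) = (theta - 3)^2 starting from theta = 0.
theta, m, v = 0.0, 0.0, 0.0
for t in range(1, 501):
    grad = 2 * (theta - 3)
    theta, m, v = adam_step(theta, grad, m, v, t)
print(f"theta = {theta:.3f}")              # approaches the minimizer at 3.0
```

The division by the bias-corrected second moment is what makes the effective step size roughly invariant to the gradient's scale.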
Measuring and Improving Consistency in Pretrained Language Models
- Transactions of the Association for Computational Linguistics, 2021
This work introduces PARAREL, a high-quality resource of cloze-style English query paraphrases; analysis of the representational spaces of PLMs suggests that they are poorly structured and currently unsuitable for representing knowledge in a robust way.
Amnesic Probing: Behavioral Explanation with Amnesic Counterfactuals
- Transactions of the Association for Computational Linguistics, 2021
This work points out the inability to infer behavioral conclusions from probing results, and offers an alternative method that focuses on how the information is being used rather than on what information is encoded.
How Much Knowledge Can You Pack into the Parameters of a Language Model?
- EMNLP, 2020
It is shown that storing knowledge in model parameters scales surprisingly well with model size and outperforms models that explicitly look up knowledge on the open-domain variants of Natural Questions and WebQuestions.