• Corpus ID: 249282155

Locating and Editing Factual Associations in GPT

@inproceedings{Meng2022LocatingAE,
  title={Locating and Editing Factual Associations in GPT},
  author={Kevin Meng and David Bau and Alex Andonian and Yonatan Belinkov},
  booktitle={Advances in Neural Information Processing Systems},
  year={2022}
}
We analyze the storage and recall of factual associations in autoregressive transformer language models, finding evidence that these associations correspond to localized, directly-editable computations. We first develop a causal intervention for identifying neuron activations that are decisive in a model’s factual predictions. This reveals a distinct set of steps in middle-layer feed-forward modules that mediate factual predictions while processing subject tokens. To test our hypothesis that… 
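The causal intervention described above (often called causal tracing) can be illustrated with a minimal numpy sketch: run the model cleanly, run it with a corrupted subject embedding, then restore one layer's clean hidden state into the corrupted run and measure how much of the original prediction is recovered. The toy two-token feed-forward stack below stands in for a real transformer; all names and the model itself are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
n_layers, d = 4, 8
# Per-layer weights for a "subject token" stream and a "last token" stream.
layers = [(rng.normal(size=(d, d)) / np.sqrt(d),
           rng.normal(size=(d, d)) / np.sqrt(d)) for _ in range(n_layers)]
w_out = rng.normal(size=d)  # toy readout in place of a vocabulary head

def run(x_subj, x_last, patch=None):
    """Forward pass. patch=(l, state) overwrites the subject-token
    hidden state entering layer l with a stored clean state."""
    hs, hl = x_subj, x_last
    states = []
    for l, (Ws, Wl) in enumerate(layers):
        if patch is not None and patch[0] == l:
            hs = patch[1]
        states.append(hs)
        # The last-token stream mixes in the subject state ("attention").
        hs, hl = np.tanh(Ws @ hs), np.tanh(Wl @ (hl + hs))
    return float(w_out @ hl), states

x_subj, x_last = rng.normal(size=d), rng.normal(size=d)
y_clean, clean_states = run(x_subj, x_last)

x_noisy = x_subj + rng.normal(scale=3.0, size=d)  # corrupt the subject embedding
y_corr, _ = run(x_noisy, x_last)

# Indirect effect per layer: restore that layer's clean subject state
# into the corrupted run and see how much of y_clean comes back.
recovery = []
for l in range(n_layers):
    y_patch, _ = run(x_noisy, x_last, patch=(l, clean_states[l]))
    recovery.append(abs(y_patch - y_clean))
```

In the real method the restoration is applied per token and per layer of a trained GPT, and the recovered probability of the correct answer localizes the decisive computations.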
Memory-Based Model Editing at Scale
TLDR
This work proposes Semi-Parametric Editing with a Retrieval-Augmented Counterfactual Model (SERAC), which stores edits in an explicit memory and learns to reason over them to modulate the base model’s predictions as needed.
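The idea of storing edits outside the base model can be sketched in a few lines, with an exact-match lookup standing in for SERAC's learned scope classifier and counterfactual model; the class and names below are illustrative, not the paper's API.

```python
class EditedModel:
    """Hedged sketch of memory-based editing: edits live in an explicit
    store, and queries in scope are answered from the store instead of
    the frozen base model."""

    def __init__(self, base_model):
        self.base = base_model
        self.memory = {}  # explicit edit memory

    def edit(self, prompt, new_answer):
        self.memory[prompt] = new_answer

    def __call__(self, prompt):
        # Scope check: exact match here; SERAC learns this decision.
        if prompt in self.memory:
            return self.memory[prompt]
        return self.base(prompt)

base = lambda prompt: "Seattle"  # stand-in base model
model = EditedModel(base)
model.edit("Where is the Space Needle?", "Paris")
```

The base model's parameters are never touched, so unedited queries behave exactly as before.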

References

SHOWING 1-10 OF 48 REFERENCES
Fast Model Editing at Scale
TLDR
MEND is the only approach to model editing that effectively edits the behavior of models with more than 10 billion parameters, using a low-rank decomposition of the gradient to make the parameterization of this transformation tractable.
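The low-rank structure MEND exploits is easy to verify in isolation: for a linear layer under a squared-error loss, the gradient with respect to the weight matrix is the rank-1 outer product of the output error and the input. The sketch below checks this directly; it is a minimal illustration of the underlying fact, not MEND itself.

```python
import numpy as np

rng = np.random.default_rng(1)
d_out, d_in = 5, 7
W = rng.normal(size=(d_out, d_in))
x = rng.normal(size=d_in)
target = rng.normal(size=d_out)

loss = lambda Wm: 0.5 * np.sum((Wm @ x - target) ** 2)

delta = W @ x - target        # dL/dy for the squared-error loss
grad_W = np.outer(delta, x)   # dL/dW = delta x^T, a rank-1 matrix

# Central-difference check of one entry (exact for a quadratic loss).
eps = 1e-6
Wp, Wm = W.copy(), W.copy()
Wp[0, 0] += eps
Wm[0, 0] -= eps
num_grad = (loss(Wp) - loss(Wm)) / (2 * eps)
```

Because the per-example gradient factors this way, a hypernetwork can operate on the two factors instead of the full weight-sized matrix, which is what makes editing 10B+ parameter models tractable.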
Editing Factual Knowledge in Language Models
TLDR
This work presents KnowledgeEditor, a method which can be used to edit factual knowledge and, thus, fix ‘bugs’ or unexpected predictions without the need for expensive re-training or fine-tuning.
GPT-J-6B: A 6 Billion Parameter Autoregressive Language Model, 2021. https://github.com/kingoflolz/mesh-transformer-jax
Knowledge Neurons in Pretrained Transformers
TLDR
This paper examines the fill-in-the-blank cloze task for BERT and proposes a knowledge attribution method to identify the neurons that express the fact, finding that the activation of such knowledge neurons is positively correlated to the expression of their corresponding facts.
Transformer Feed-Forward Layers Are Key-Value Memories
TLDR
This work shows that feed-forward layers in transformer-based language models operate as key-value memories, where each key correlates with textual patterns in the training examples, and each value induces a distribution over the output vocabulary.
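The key-value view can be made concrete: a transformer FFN computes an activation against a bank of "keys" and emits the matching weighted sum of "value" vectors. The numpy sketch below shows the two forms are the same computation; dimensions and names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
d, m = 6, 10                  # hidden size, number of memory slots
K = rng.normal(size=(m, d))   # each row is a key (first FFN matrix)
V = rng.normal(size=(m, d))   # each row is a value vector (second FFN matrix)
x = rng.normal(size=d)

def gelu(z):
    # tanh approximation of GELU
    return 0.5 * z * (1 + np.tanh(np.sqrt(2 / np.pi) * (z + 0.044715 * z**3)))

coeffs = gelu(K @ x)          # how strongly x matches each key
out = coeffs @ V              # FFN output as matrix product

# Equivalent reading: a weighted sum of value vectors, one per key match.
out_as_memory = sum(c * v for c, v in zip(coeffs, V))
```

Under this reading, editing a fact amounts to changing the value vector retrieved by the keys that fire on the relevant textual pattern.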
Modifying Memories in Transformer Models
TLDR
This paper proposes a new task of explicitly modifying specific factual knowledge in Transformer models while ensuring the model performance does not degrade on the unmodified facts, and benchmarked several approaches that provide natural baseline performances on this task.
Adam: A Method for Stochastic Optimization
TLDR
This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.
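The update rule is compact enough to state directly: Adam keeps exponential moving averages of the gradient and its square, bias-corrects both, and scales the step by their ratio. A minimal sketch on a 1-D quadratic, with the standard default hyperparameters:

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update (t is the 1-based step count)."""
    m = b1 * m + (1 - b1) * grad          # first-moment estimate
    v = b2 * v + (1 - b2) * grad**2       # second-moment estimate
    m_hat = m / (1 - b1**t)               # bias correction
    v_hat = v / (1 - b2**t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Minimize f(x) = x^2 starting from x = 5.
theta, m, v = np.array([5.0]), np.zeros(1), np.zeros(1)
for t in range(1, 2001):
    theta, m, v = adam_step(theta, 2 * theta, m, v, t, lr=0.05)
```

In the model-editing papers above, Adam is the optimizer typically used for fine-tuning baselines and for fitting edit hypernetworks.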
Measuring and Improving Consistency in Pretrained Language Models
TLDR
The creation of PARAREL, a high-quality resource of cloze-style English query paraphrases, and analysis of the representational spaces of PLMs suggest that these spaces have a poor structure and are currently not suitable for representing knowledge in a robust way.
Amnesic Probing: Behavioral Explanation with Amnesic Counterfactuals
TLDR
The inability to infer behavioral conclusions from probing results is pointed out, and an alternative method is offered that focuses on how the information is being used, rather than on what information is encoded.
How Much Knowledge Can You Pack into the Parameters of a Language Model?
TLDR
It is shown that this approach scales surprisingly well with model size and outperforms models that explicitly look up knowledge on the open-domain variants of Natural Questions and WebQuestions.