Rank-One Editing of Encoder-Decoder Models

Vikas Raunak and Arul Menezes

Large sequence-to-sequence models for tasks such as Neural Machine Translation (NMT) are usually trained over hundreds of millions of samples. However, training is just the origin of a model’s life-cycle. Real-world deployments of models require further behavioral adaptations as new requirements emerge or shortcomings become known. Typically, in the space of model behaviors, behavior deletion requests are addressed through model retrainings whereas model finetuning is done to address behavior…

Editing Implicit Assumptions in Text-to-Image Diffusion Models

TIMED (TIME Dataset), containing 147 source and destination prompt pairs from various domains, is introduced, and the TIME editing method is shown to succeed at model editing, generalize well to related prompts unseen during editing, and have minimal effect on unrelated generations.



Attention is All you Need

A new simple network architecture, the Transformer, is proposed. It is based solely on attention mechanisms, dispensing with recurrence and convolutions entirely, and it generalizes well to other tasks, as shown by applying it successfully to English constituency parsing with both large and limited training data.

Sequence to Sequence Learning with Neural Networks

This paper presents a general end-to-end approach to sequence learning that makes minimal assumptions about the sequence structure, and finds that reversing the order of the words in all source sentences markedly improved the LSTM's performance, because doing so introduced many short-term dependencies between the source and the target sentence, which made the optimization problem easier.

Rewriting a Deep Generative Model

This paper introduces a new problem setting: manipulation of specific rules encoded by a deep generative model, and proposes a formulation in which the desired rule is changed by manipulating a layer of a deep network as a linear associative memory.
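
The linear-associative-memory view mentioned above can be illustrated with a minimal sketch. It assumes a layer is a plain weight matrix `W` that maps a key vector `k` to a value `W @ k`, and that an edit should make one chosen key `k_star` map to a new value `v_star` through a rank-one change. The function name `rank_one_edit`, the optional key-covariance matrix `C`, and the identity default are illustrative assumptions, not code from any of the listed papers.

```python
import numpy as np

def rank_one_edit(W, k_star, v_star, C=None):
    """Return W' = W + outer(lam, d) such that W' @ k_star == v_star.

    W:      (out_dim, in_dim) weight matrix, viewed as an associative memory.
    k_star: (in_dim,) key whose stored value should change.
    v_star: (out_dim,) new value to associate with k_star.
    C:      (in_dim, in_dim) assumed key covariance; identity if omitted.
    """
    if C is None:
        C = np.eye(W.shape[1])
    d = np.linalg.solve(C, k_star)       # update direction d = C^{-1} k_star
    residual = v_star - W @ k_star       # what the memory currently gets wrong
    lam = residual / (d @ k_star)        # scale so the constraint holds exactly
    return W + np.outer(lam, d)          # rank-one update
```

With the identity default, any key orthogonal to `k_star` keeps its original value, which is one way to see why such an edit can leave unrelated behavior largely untouched.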

Overcoming Catastrophic Forgetting During Domain Adaptation of Neural Machine Translation

This work interprets the drop in general-domain performance during domain adaptation as catastrophic forgetting of general-domain knowledge, and adapts Elastic Weight Consolidation (EWC), a machine learning method for learning a new task without forgetting previous tasks, to mitigate it.
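
As a rough illustration of the EWC idea, the sketch below computes the standard quadratic penalty that discourages parameters from drifting away from their previous values, weighted by a per-parameter importance estimate. It assumes parameters are stored as dictionaries of NumPy arrays and that a diagonal Fisher estimate is already available. The names `ewc_penalty`, `theta_star`, `fisher`, and `lam` are hypothetical and do not come from the listed paper.

```python
import numpy as np

def ewc_penalty(theta, theta_star, fisher, lam):
    """Compute (lam / 2) * sum_i F_i * (theta_i - theta_star_i)^2.

    theta:      dict name -> current parameters (np.ndarray)
    theta_star: dict name -> parameters after the previous task
    fisher:     dict name -> assumed diagonal Fisher importance estimates
    lam:        strength of the regularizer
    """
    total = 0.0
    for name in theta:
        diff = theta[name] - theta_star[name]
        total += float(np.sum(fisher[name] * diff ** 2))
    return 0.5 * lam * total
```

In training, this value would be added to the new task's loss, so that parameters with high importance for the earlier task are moved less.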

Neural Machine Translation by Jointly Learning to Align and Translate

It is conjectured that the use of a fixed-length vector is a bottleneck in improving the performance of the basic encoder-decoder architecture, and it is proposed to extend this architecture by allowing a model to automatically (soft-)search for parts of a source sentence that are relevant to predicting a target word, without having to form these parts as a hard segment explicitly.

SALTED: A Framework for SAlient Long-Tail Translation Error Detection

SALTED is introduced, a specifications-based framework for behavioral testing of MT models that provides fine-grained views of salient long-tail errors, permitting trustworthy visibility into previously invisible problems.

Patching open-vocabulary models by interpolating weights

PAINT, a patching method that uses interpolations between the weights of a model before fine-tuning and the weights after fine-tuning on a task to be patched, is introduced, demonstrating that it is possible to expand the set of tasks on which open-vocabulary models achieve high accuracy without re-training them from scratch.
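
Weight interpolation of this kind can be sketched in a few lines. The example assumes both checkpoints are dictionaries of NumPy arrays with identical names and shapes, and that a single mixing coefficient `alpha` in [0, 1] is applied to every parameter. The function name `interpolate_weights` is hypothetical, and how `alpha` would be chosen (for example, on held-out data) is left out.

```python
import numpy as np

def interpolate_weights(theta_before, theta_after, alpha):
    """Return (1 - alpha) * theta_before + alpha * theta_after, per parameter.

    alpha = 0 recovers the original model, alpha = 1 the fine-tuned one.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if theta_before.keys() != theta_after.keys():
        raise ValueError("checkpoints must share the same parameter names")
    return {
        name: (1.0 - alpha) * theta_before[name] + alpha * theta_after[name]
        for name in theta_before
    }
```

Intermediate values of `alpha` trade accuracy on the patched task against accuracy on tasks the original model already handled well.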

Continual Lifelong Learning in Natural Language Processing: A Survey

This work looks at the problem of continual learning (CL) through the lens of various NLP tasks, and discusses major challenges in CL and current methods applied in neural network models.

Transformer Feed-Forward Layers Are Key-Value Memories

This work shows that feed-forward layers in transformer-based language models operate as key-value memories, where each key correlates with textual patterns in the training examples, and each value induces a distribution over the output vocabulary.
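
The key-value reading of a feed-forward layer can be made concrete with a small sketch. It assumes a two-matrix layer without biases and a ReLU activation, where each row of `K` is a key and the matching row of `V` is its value. The function name `feed_forward_as_memory` and the toy shapes are illustrative assumptions.

```python
import numpy as np

def feed_forward_as_memory(x, K, V):
    """Compute ReLU(K @ x) @ V and return the output with the coefficients.

    x: (d,) input hidden state.
    K: (m, d) key matrix; row i scores how strongly x matches pattern i.
    V: (m, d) value matrix; row i is the vector that key i contributes.
    """
    coeffs = np.maximum(K @ x, 0.0)   # non-negative memory coefficients
    return coeffs @ V, coeffs         # output is a weighted sum of values
```

Under this view, changing what a layer returns for one key amounts to changing the corresponding stored value, which is the intuition behind editing such layers directly.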

Finding Memo: Extractive Memorization in Constrained Sequence Generation Tasks

It is demonstrated that extractive memorization poses a serious threat to NMT reliability by qualitatively and quantitatively characterizing the memorized samples as well as the model behavior in their vicinity.