Incorporating Copying Mechanism in Sequence-to-Sequence Learning

Jiatao Gu, Zhengdong Lu, Hang Li, Victor O. K. Li
We address an important problem in sequence-to-sequence (Seq2Seq) learning referred to as copying, in which certain segments in the input sequence are selectively replicated in the output sequence. [...] CopyNet can nicely integrate the regular way of word generation in the decoder with the new copying mechanism, which can choose sub-sequences in the input sequence and put them at proper places in the output sequence. Our empirical study on both synthetic data sets and real world data sets…
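
To make the idea of mixing generation and copying concrete, here is a minimal sketch of one way the two could share a single probability distribution. It is an illustration under stated assumptions, not the paper's exact formulation: the function name `copy_generate_distribution`, the raw scores, and the toy vocabulary are all hypothetical, and a real model would compute the scores from decoder and encoder states.

```python
import math

def copy_generate_distribution(gen_scores, copy_scores, source_tokens):
    """Combine generate-mode and copy-mode scores in one softmax.

    gen_scores:    dict mapping each vocabulary word to a raw score (assumed given)
    copy_scores:   list of raw scores, one per source position (assumed given)
    source_tokens: list of source words aligned with copy_scores
    Returns a dict mapping each candidate word to its probability.
    """
    all_scores = list(gen_scores.values()) + list(copy_scores)
    shift = max(all_scores)  # subtract the max for numerical stability
    norm = sum(math.exp(s - shift) for s in all_scores)

    probs = {}
    # Probability mass contributed by generating a word from the vocabulary.
    for word, score in gen_scores.items():
        probs[word] = probs.get(word, 0.0) + math.exp(score - shift) / norm
    # Probability mass contributed by copying a word from the source;
    # a word that occurs in both places accumulates mass from both modes.
    for word, score in zip(source_tokens, copy_scores):
        probs[word] = probs.get(word, 0.0) + math.exp(score - shift) / norm
    return probs

# Toy usage: "Tony" is not in the vocabulary but can still be produced by copying.
probs = copy_generate_distribution(
    gen_scores={"the": 1.0, "<unk>": 0.0},
    copy_scores=[2.0, 0.5],
    source_tokens=["Tony", "the"],
)
```

Because both kinds of score are normalized together, an out-of-vocabulary source word such as "Tony" receives probability mass without being added to the vocabulary.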

Sequential Copying Networks

A novel copying framework, named Sequential Copying Networks (SeqCopyNet), which not only learns to copy single words, but also copies sequences from the input sentence and leverages the pointer networks to explicitly select a sub-span from the source side to target side.

CopyNext: Explicit Span Copying and Alignment in Sequence to Sequence Models

This work presents a model with an explicit token-level copy operation and extends it to copying entire spans, allowing for nontraditional applications of seq2seq, like information extraction.

Lexicon-constrained Copying Network for Chinese Abstractive Summarization

A lexicon-constrained copying network is proposed that models multi-granularity in both the encoder and the decoder; it can outperform previous character-based models and achieve competitive performance.

Copy that! Editing Sequences by Copying Spans

This paper presents an extension of seq2seq models capable of copying entire spans of the input to the output in one step, greatly reducing the number of decisions required during inference.

Improving Grapheme-to-Phoneme Conversion by Investigating Copying Mechanism in Recurrent Architectures

This work proposes a copy-augmented bidirectional Long Short-Term Memory encoder-decoder architecture for grapheme-to-phoneme conversion, demonstrates the applicability of the approach on a Hindi lexicon, and shows that the model outperforms all recent state-of-the-art results.

Joint Copying and Restricted Generation for Paraphrase

A novel Seq2Seq model to fuse a copying decoder and a restricted generative decoder that outperforms the state-of-the-art approaches in terms of both informativeness and language quality.

Sequence-to-Sequence Learning with Latent Neural Grammars

This work develops a neural parameterization of the grammar which enables parameter sharing over the combinatorial space of derivation rules without the need for manual feature engineering, and applies it to a diagnostic language navigation task and to small-scale machine translation.

Deep Reinforcement Learning for Sequence-to-Sequence Models

This work provides source code implementing most of the RL models it discusses, to support the complex task of abstractive text summarization, and reports targeted experiments on these models in terms of both performance and training time.

Incorporating Copying Mechanism in Image Captioning for Learning Novel Objects

A new architecture that incorporates copying into the convolutional neural network (CNN) plus recurrent neural network (RNN) image captioning framework in order to describe novel objects in captions; superior results are reported compared with state-of-the-art deep models.

Efficient Summarization with Read-Again and Copy Mechanism

A simple mechanism that first reads the input sequence before committing to a representation of each word is introduced and a simple copy mechanism is proposed that is able to exploit very small vocabularies and handle out-of-vocabulary words.



Sequence to Sequence Learning with Neural Networks

This paper presents a general end-to-end approach to sequence learning that makes minimal assumptions on the sequence structure, and finds that reversing the order of the words in all source sentences improved the LSTM's performance markedly, because doing so introduced many short term dependencies between the source and the target sentence which made the optimization problem easier.

Pointer Networks

A new neural architecture, called Ptr-Net, that learns the conditional probability of an output sequence whose elements are discrete tokens corresponding to positions in an input sequence, using a recently proposed mechanism of neural attention; it improves over sequence-to-sequence with input attention and also generalizes to variable-size output dictionaries.
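
As a rough illustration of output elements that "correspond to positions in an input sequence", the sketch below turns attention scores into a distribution over input positions. It rests on assumptions: the function name `pointer_distribution` is hypothetical, and a plain dot product stands in for whatever learned scoring function the actual model uses.

```python
import math

def pointer_distribution(encoder_states, decoder_state):
    """Return one probability per input position.

    encoder_states: list of vectors (lists of floats), one per input token
    decoder_state:  a vector of the same dimension
    The dot-product score is a simplifying assumption for this sketch.
    """
    scores = [
        sum(e * d for e, d in zip(state, decoder_state))
        for state in encoder_states
    ]
    shift = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    norm = sum(exps)
    return [x / norm for x in exps]

# The output size follows the input length, so no fixed output dictionary is needed.
dist = pointer_distribution(
    encoder_states=[[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]],
    decoder_state=[0.0, 3.0],
)
pointed_position = max(range(len(dist)), key=dist.__getitem__)
```

The distribution always has exactly as many entries as the input has positions, which is what lets this kind of output layer handle inputs of varying length.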

Neural Machine Translation by Jointly Learning to Align and Translate

It is conjectured that the use of a fixed-length vector is a bottleneck in improving the performance of this basic encoder-decoder architecture, and it is proposed to extend this by allowing a model to automatically (soft-)search for parts of a source sentence that are relevant to predicting a target word, without having to form these parts as a hard segment explicitly.

Addressing the Rare Word Problem in Neural Machine Translation

This paper proposes and implements an effective technique to address the problem of end-to-end neural machine translation's inability to correctly translate very rare words, and is the first to surpass the best result achieved on a WMT’14 contest task.

Pointing the Unknown Words

A novel way to deal with rare and unseen words in neural network models using attention is proposed, which uses two softmax layers to predict the next word in conditional language models.
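
One way two softmax layers could be combined is sketched below: one distribution over a shortlist vocabulary, one over source positions, blended by a scalar gate. This is an assumption-laden illustration rather than the paper's implementation; the function name `mix_with_switch`, the gate `p_shortlist`, and the toy probabilities are all hypothetical.

```python
def mix_with_switch(vocab_probs, location_probs, source_tokens, p_shortlist):
    """Blend two already-normalized distributions with a gate.

    vocab_probs:    dict mapping shortlist words to probabilities (sums to 1)
    location_probs: list of probabilities over source positions (sums to 1)
    source_tokens:  list of source words aligned with location_probs
    p_shortlist:    gate in [0, 1]; weight given to the shortlist softmax
    """
    # Mass assigned by the shortlist softmax, scaled by the gate.
    mixed = {word: p_shortlist * p for word, p in vocab_probs.items()}
    # Mass assigned by the location softmax, scaled by the complement.
    for word, p in zip(source_tokens, location_probs):
        mixed[word] = mixed.get(word, 0.0) + (1.0 - p_shortlist) * p
    return mixed

# Toy usage: the rare word "Tony" is reachable only through the location softmax.
mixed = mix_with_switch(
    vocab_probs={"the": 0.7, "<unk>": 0.3},
    location_probs=[0.9, 0.1],
    source_tokens=["Tony", "the"],
    p_shortlist=0.5,
)
```

Unlike a single shared softmax, each distribution here is normalized on its own, and the gate decides how much weight each one gets at a given step.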

Long Short-Term Memory

A novel, efficient, gradient based method called long short-term memory (LSTM) is introduced, which can learn to bridge minimal time lags in excess of 1000 discrete-time steps by enforcing constant error flow through constant error carousels within special units.

Neural Responding Machine for Short-Text Conversation

Empirical study shows that the Neural Responding Machine (NRM) can generate grammatically correct and content-wise appropriate responses to over 75% of the input text, outperforming state-of-the-art methods in the same setting, including retrieval-based and SMT-based models.

A Neural Attention Model for Abstractive Sentence Summarization

This work proposes a fully data-driven approach to abstractive sentence summarization by utilizing a local attention-based model that generates each word of the summary conditioned on the input sentence.

LCSTS: A Large Scale Chinese Short Text Summarization Dataset

A large-scale Chinese short text summarization dataset constructed from the Chinese microblogging website Sina Weibo is introduced, and a recurrent neural network is applied to summary generation with promising results.

Neural Random Access Machines

The proposed model can learn to solve algorithmic tasks that require pointer manipulation and dereferencing, can operate on simple data structures like linked lists and binary trees, and can generalize to sequences of arbitrary length.