Insertion-based Decoding with Automatically Inferred Generation Order

@article{Gu2019InsertionbasedDW,
  title={Insertion-based Decoding with Automatically Inferred Generation Order},
  author={Jiatao Gu and Qi Liu and Kyunghyun Cho},
  journal={Transactions of the Association for Computational Linguistics},
  year={2019},
  volume={7},
  pages={661-676}
}
Abstract
Conventional neural autoregressive decoding commonly assumes a fixed left-to-right generation order, which may be sub-optimal. In this work, we propose a novel decoding algorithm, InDIGO, which supports flexible sequence generation in arbitrary orders through insertion operations. We extend Transformer, a state-of-the-art sequence generation model, to efficiently implement the proposed approach, enabling it to be trained with either a pre-defined generation order or adaptive orders…
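The decoding loop the abstract describes, choosing a slot and a token at every step instead of always appending on the right, can be sketched in a few lines. The scorer below is a toy stand-in (not InDIGO's relative-position Transformer), used only to show how insertion makes arbitrary generation orders possible:

from typing import List, Tuple

def score_insertions(partial: List[str]) -> Tuple[int, str, bool]:
    """Toy stand-in for the trained model: returns (slot, token, stop).
    A real decoder would score every slot x vocabulary entry, conditioned
    on the source sentence."""
    demo = {
        (): (0, "brown", False),
        ("brown",): (1, "fox", False),
        ("brown", "fox"): (0, "the", False),
        ("the", "brown", "fox"): (3, "jumps", False),
    }
    return demo.get(tuple(partial), (len(partial), "<eos>", True))

def insertion_decode(max_steps: int = 10) -> List[str]:
    hyp: List[str] = []
    for _ in range(max_steps):
        slot, token, stop = score_insertions(hyp)
        if stop:
            break
        hyp.insert(slot, token)   # slot i means "insert before position i"
    return hyp

print(insertion_decode())         # ['the', 'brown', 'fox', 'jumps']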
Citations

Towards More Efficient Insertion Transformer with Fractional Positional Encoding
TLDR
A novel incremental positional encoding scheme for insertion transformers called Fractional Positional Encoding (FPE), which allows representations calculated in previous steps to be reused, reducing floating-point operations and improving latency in batched decoding.
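The idea can be pictured with ordinary fractions: if an inserted token receives the midpoint of its neighbours' positions, no existing position ever changes, so states computed in earlier steps stay valid. This is only a conceptual sketch of that property, not the paper's exact FPE formulation:

from fractions import Fraction

def insert_with_fractional_position(seq, slot, token):
    """seq is a list of (position, token) pairs; slot i inserts between
    entries i-1 and i. Existing positions are never touched."""
    left = seq[slot - 1][0] if slot > 0 else Fraction(0)
    right = seq[slot][0] if slot < len(seq) else left + 2
    seq.insert(slot, ((left + right) / 2, token))
    return seq

seq = [(Fraction(1), "the"), (Fraction(2), "fox")]
insert_with_fractional_position(seq, 1, "brown")   # new token gets position 3/2
print(seq)   # [(Fraction(1, 1), 'the'), (Fraction(3, 2), 'brown'), (Fraction(2, 1), 'fox')]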
On Efficient Training, Controllability and Compositional Generalization of Insertion-based Language Generators
TLDR
The proposed InsNet is an insertion-based sequence model that can be trained as efficiently as traditional transformer decoders while matching the performance of a bi-directional context encoder; it is evaluated on story generation and CLEVR-CoGenT captioning.
Levenshtein Transformer
TLDR
Levenshtein Transformer is developed, a new partially autoregressive model devised for more flexible sequence generation via insertion and deletion operations, together with a set of dedicated training techniques that exploit the two operations' complementary nature, using one as the other's learning signal.
POINTER: Constrained Progressive Text Generation via Insertion-based Generative Pre-training
TLDR
POINTER (PrOgressive INsertion-based TransformER), a simple yet novel insertion-based approach for hard-constrained text generation, which achieves state-of-the-art performance on constrained text generation.
Insertion-based Tree Decoding
TLDR
This work presents a novel general-purpose partially autoregressive tree decoder that uses tree-based insertion operations to generate trees in sub-linear time; the approach is evaluated on semantic parsing and compared against strong baselines, including an insertion-based sequence decoder.
Insertion Transformer: Flexible Sequence Generation via Insertion Operations
TLDR
The Insertion Transformer outperforms many prior non-autoregressive approaches to translation at comparable or better levels of parallelism, and successfully recovers the performance of the original Transformer while requiring only logarithmically many iterations during decoding.
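The logarithmic iteration count comes from inserting into every open slot in parallel with a balanced-binary-tree order, so the number of placed tokens roughly doubles each pass. A toy oracle (standing in for the model's predictions) makes the arithmetic concrete:

import math

def parallel_insertion_passes(n: int) -> int:
    """Count passes needed to place n tokens when every pass fills the
    middle position of each remaining gap (balanced-binary-tree order)."""
    placed = []                                   # sorted positions already filled
    passes = 0
    while len(placed) < n:
        passes += 1
        bounds = [-1] + placed + [n]
        new = [(lo + hi) // 2 for lo, hi in zip(bounds, bounds[1:]) if hi - lo > 1]
        placed = sorted(placed + new)
    return passes

n = 1000
print(parallel_insertion_passes(n), "passes for length", n,
      "vs ceil(log2(n+1)) =", math.ceil(math.log2(n + 1)))   # 10 vs 10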
Sequence Modeling with Unconstrained Generation Order
TLDR
This model learns the decoding order as a result of its training procedure and is superior to fixed-order models on a number of sequence generation tasks, such as machine translation, image-to-LaTeX, and image captioning.
Fast Interleaved Bidirectional Sequence Generation
TLDR
This work takes inspiration from bidirectional sequence generation and introduces a decoder that generates target words from the left-to-right and right-to-left directions simultaneously, achieving a decoding speedup of ~2x compared to autoregressive decoding with comparable quality.
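The roughly 2x speedup follows from filling the sequence from both ends at once, one token per direction per step. A toy sketch with oracle "models" for the two directions (hypothetical stand-ins for the interleaved decoder):

TARGET = "a b c d e f g".split()

def l2r_next(left):                   # toy forward model
    return TARGET[len(left)]

def r2l_next(right):                  # toy backward model, predicts from the end
    return TARGET[len(TARGET) - 1 - len(right)]

def bidirectional_decode():
    left, right, steps = [], [], 0
    while len(left) + len(right) < len(TARGET):
        steps += 1
        left.append(l2r_next(left))                   # both directions fire
        if len(left) + len(right) < len(TARGET):      # within the same step
            right.insert(0, r2l_next(right))
    return left + right, steps

print(bidirectional_decode())   # (['a', 'b', 'c', 'd', 'e', 'f', 'g'], 4)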
Learning and Analyzing Generation Order for Undirected Sequence Models
TLDR
A policy that learns the generation order for a pre-trained, undirected translation model is trained via reinforcement learning; it outperforms all heuristic generation orders on four out of six tasks and usually predicts positions within a single syntactic constituent in consecutive steps.
Non-autoregressive Machine Translation with Disentangled Context Transformer
State-of-the-art neural machine translation models generate a translation from left to right, and every step is conditioned on the previously generated tokens. The sequential nature of this generation…

References

Showing 1-10 of 106 references
Non-Monotonic Sequential Text Generation
TLDR
This work proposes a framework for training models of text generation that operate in non-monotonic orders, and demonstrates that using the proposed method, it is possible to learn policies which generate text without pre-specifying a generation order while achieving competitive performance with conventional left-to-right generation.
Insertion Transformer: Flexible Sequence Generation via Insertion Operations
TLDR
The Insertion Transformer outperforms many prior non-autoregressive approaches to translation at comparable or better levels of parallelism, and successfully recovers the performance of the original Transformer while requiring only logarithmically many iterations during decoding.
Sequence Generation: From Both Sides to the Middle
TLDR
A synchronous bidirectional sequence generation (SBSG) model is proposed that predicts its outputs from both sides to the middle simultaneously, significantly speeding up decoding while improving generation quality compared to the autoregressive Transformer.
Middle-Out Decoding
TLDR
This paper proposes a novel middle-out decoder architecture that begins from an initial middle word and simultaneously expands the sequence in both directions, and introduces a dual self-attention mechanism that models complex dependencies between the outputs.
Noisy Parallel Approximate Decoding for Conditional Recurrent Language Model
TLDR
A novel decoding strategy is proposed, motivated by an earlier observation that the nonlinear hidden layers of a deep neural network stretch the data manifold; it is embarrassingly parallelizable without any communication overhead while improving on an existing decoding algorithm.
Blockwise Parallel Decoding for Deep Autoregressive Models
TLDR
This work proposes a novel blockwise parallel decoding scheme that makes predictions for multiple time steps in parallel and then backs off to the longest prefix validated by a scoring model, allowing substantial theoretical improvements in generation speed when applied to architectures that can process output sequences in parallel.
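The predict-then-verify loop can be sketched with toy models: a proposer guesses a block of k tokens in one shot, and the base scorer keeps the longest prefix it agrees with, plus one corrected token. Both functions below are hypothetical stand-ins, not the paper's actual heads:

TARGET = "the quick brown fox jumps over the lazy dog".split()

def base_greedy_next(prefix):
    """Toy autoregressive scorer: the token it would emit after `prefix`."""
    return TARGET[len(prefix)] if len(prefix) < len(TARGET) else "<eos>"

def propose_block(prefix, k):
    """Toy parallel proposer: mostly right, with an error in the last slot."""
    block = TARGET[len(prefix):len(prefix) + k]
    if len(block) == k:
        block[-1] = "???"
    return block or ["<eos>"]

def blockwise_decode(k=4):
    out, steps = [], 0
    while len(out) < len(TARGET):
        steps += 1
        for tok in propose_block(out, k):
            if tok != base_greedy_next(out):          # first disagreement:
                out.append(base_greedy_next(out))     # take the verified token
                break
            out.append(tok)                           # verified, keep it
    return out, steps

print(blockwise_decode())   # 9 tokens produced in 3 block steps instead of 9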
Synchronous Bidirectional Neural Machine Translation
TLDR
A synchronous bidirectional neural machine translation (SB-NMT) model is proposed that predicts its outputs using left-to-right and right-to-left decoding simultaneously and interactively, in order to leverage both history and future information at the same time.
Semi-Autoregressive Neural Machine Translation
TLDR
A novel model for fast sequence generation, the semi-autoregressive Transformer (SAT), which keeps the autoregressive property globally but relaxes it locally and is thus able to produce multiple successive words in parallel at each time step.
Incorporating Copying Mechanism in Sequence-to-Sequence Learning
TLDR
This paper incorporates copying into neural network-based Seq2Seq learning and proposes a new encoder-decoder model called CopyNet, which integrates the decoder's regular word generation with a copying mechanism that can select sub-sequences of the input sequence and place them at the proper positions in the output sequence.
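The core of a copy mechanism is a mixture of two distributions: a vocabulary (generate) distribution and the attention weights over source positions, which lets an out-of-vocabulary source token be emitted by copying. The numbers and the mixing weight below are illustrative, not CopyNet's exact parameterization:

vocab = ["<unk>", "hello", "world", "my", "name", "is"]
source = ["my", "name", "is", "Skytree"]                  # "Skytree" is out of vocabulary

p_vocab = [0.05, 0.05, 0.05, 0.10, 0.05, 0.70]            # generate distribution
attn = [0.05, 0.05, 0.10, 0.80]                           # copy distribution over source
p_gen = 0.3                                               # weight of the generate mode

mixed = {w: p_gen * p for w, p in zip(vocab, p_vocab)}
for w, a in zip(source, attn):                            # add copy mass per source token
    mixed[w] = mixed.get(w, 0.0) + (1 - p_gen) * a

best = max(mixed, key=mixed.get)
print(best, round(mixed[best], 3))                        # Skytree 0.56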
The Importance of Generation Order in Language Modeling
TLDR
This paper studies the influence of token generation order on model quality via a novel two-pass language model that produces partially filled sentence “templates” and then fills in missing tokens; the most effective strategy generates function words in the first pass followed by content words in the second.
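A two-pass scheme of this kind is easy to picture: the first pass emits a skeleton of function words with blanks, and the second pass fills the blanks with content words. Both passes below are toy lookups standing in for the paper's learned models:

def first_pass():
    """Toy first pass: a function-word template with content-word blanks."""
    return ["the", "___", "is", "on", "the", "___"]

def second_pass(template):
    """Toy second pass; a learned model would condition on the full template."""
    fills = iter(["cat", "mat"])
    return [next(fills) if tok == "___" else tok for tok in template]

print(" ".join(second_pass(first_pass())))   # the cat is on the mat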