Corpus ID: 210698654

Parallel Machine Translation with Disentangled Context Transformer

@article{Kasai2020ParallelMT,
  title={Parallel Machine Translation with Disentangled Context Transformer},
  author={Jungo Kasai and James Cross and Marjan Ghazvininejad and Jiatao Gu},
  journal={ArXiv},
  year={2020},
  volume={abs/2001.05136}
}
State-of-the-art neural machine translation models generate a translation from left to right and every step is conditioned on the previously generated tokens. The sequential nature of this generation process causes fundamental latency in inference since we cannot generate multiple tokens in each sentence in parallel. We propose an attention-masking based model, called Disentangled Context (DisCo) transformer, that simultaneously generates all tokens given different contexts. The DisCo…
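To make the attention-masking idea from the abstract concrete, here is a minimal, illustrative sketch in PyTorch. It is not the authors' released code; the function names sample_disco_mask and disco_self_attention are hypothetical. It shows one way to build training-time context masks in which every target position predicts its own token while attending only to a randomly sampled subset of the other reference tokens, so all positions are supervised in parallel under different contexts.

import torch
import torch.nn.functional as F

def sample_disco_mask(n: int) -> torch.Tensor:
    """Boolean [n, n] mask; mask[i, j] = True lets position i attend to token j."""
    mask = torch.zeros(n, n, dtype=torch.bool)
    for i in range(n):
        others = torch.tensor([j for j in range(n) if j != i])
        if len(others) == 0:
            continue
        # Sample a context size k_i >= 1 for position i. (The paper also
        # allows an empty context, where prediction relies on the source
        # alone; keeping at least one token here keeps the softmax below
        # well-defined in this self-attention-only sketch.)
        k = int(torch.randint(1, len(others) + 1, (1,)))
        chosen = others[torch.randperm(len(others))[:k]]
        mask[i, chosen] = True
    return mask

def disco_self_attention(q, k, v, mask):
    """Scaled dot-product attention with a per-query DisCo context mask."""
    scores = (q @ k.transpose(-2, -1)) / q.size(-1) ** 0.5  # [n, n]
    scores = scores.masked_fill(~mask, float("-inf"))
    return F.softmax(scores, dim=-1) @ v  # [n, d]

# Toy usage: 5 target positions, hidden size 8.
n, d = 5, 8
q, k, v = (torch.randn(n, d) for _ in range(3))
out = disco_self_attention(q, k, v, sample_disco_mask(n))
print(out.shape)  # torch.Size([5, 8])

Because every row of the mask is sampled independently, a single forward pass trains each token under its own observed context, which is what lets the model predict all positions simultaneously at inference time.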
16 Citations

  • Deep Encoder, Shallow Decoder: Reevaluating the Speed-Quality Tradeoff in Machine Translation (22 citations)
  • Aligned Cross Entropy for Non-Autoregressive Machine Translation (18 citations)
  • Context-Aware Cross-Attention for Non-Autoregressive Translation (5 citations)
  • Understanding and Improving Lexical Choice in Non-Autoregressive Translation (3 citations; highly influenced)
  • Span Pointer Networks for Non-Autoregressive Task-Oriented Semantic Parsing
  • SlotRefine: A Fast Non-Autoregressive Model for Joint Intent Detection and Slot Filling (4 citations)
