Transformer++

@article{Thapak2020Transformer,
  title={Transformer++},
  author={Prakhar Thapak and Prodip Hore},
  journal={ArXiv},
  year={2020},
  volume={abs/2003.04974}
}
  • Computer Science, Mathematics
  • Recent advances in attention mechanisms have displaced recurrent neural networks and their variants for machine translation tasks. The Transformer, which relies on attention alone, achieved state-of-the-art results in sequence modeling. Neural machine translation based on attention is parallelizable and handles long-range dependencies among words in a sentence more effectively than recurrent neural networks. One of the key concepts in attention is to learn…
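Since the abstract centers on the attention mechanism the Transformer builds on, the following is a minimal NumPy sketch of standard scaled dot-product attention as introduced by Vaswani et al. ("Attention is All you Need"). It illustrates only the baseline mechanism the paper starts from, not the specific modification Transformer++ proposes; the function name and toy shapes are illustrative assumptions.

import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Standard attention: softmax(Q K^T / sqrt(d_k)) V.

    Q: (len_q, d_k) queries; K: (len_k, d_k) keys; V: (len_k, d_v) values.
    """
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # query-key similarities, scaled
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability for softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax over keys
    return weights @ V                              # attention-weighted sum of values

# Toy usage: 4 query positions attend over 6 key/value positions.
rng = np.random.default_rng(0)
Q = rng.standard_normal((4, 8))
K = rng.standard_normal((6, 8))
V = rng.standard_normal((6, 16))
out = scaled_dot_product_attention(Q, K, V)         # shape (4, 16)

Because every query attends to all keys in a single matrix product, the computation is parallel across positions, which is the property the abstract contrasts with sequential recurrent networks.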
