Highway Transformer: Self-Gating Enhanced Self-Attentive Networks

@article{Chai2020HighwayTS,
  title={Highway Transformer: Self-Gating Enhanced Self-Attentive Networks},
  author={Yekun Chai and Shuo Jin and Xinwen Hou},
  journal={ArXiv},
  year={2020},
  volume={abs/2004.08178}
}
Self-attention mechanisms have made striking state-of-the-art (SOTA) progress in various sequence learning tasks, building on multi-headed dot-product attention that attends to all global contexts at different locations. Through a pseudo information highway, we introduce a gated component, Self-Dependency Units (SDU), that incorporates LSTM-styled gating units to replenish internal semantic importance within the multi-dimensional latent space of individual representations. The subsidiary…
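
To make the gating idea concrete, below is a minimal PyTorch sketch of an LSTM-style self-gating unit placed alongside a standard residual self-attention sublayer, in the spirit of the SDU component described in the abstract. The module and parameter names (SelfDependencyUnit, GatedSublayer, the gate and cand projections) are illustrative assumptions, not the authors' reference implementation; the exact gating form and its placement in the Highway Transformer may differ.

import torch
import torch.nn as nn

class SelfDependencyUnit(nn.Module):
    """Sketch of an LSTM-style self-gating unit (SDU):
    gate = sigmoid(W_g x + b_g)   # per-dimension gate over the latent space
    cand = tanh(W_c x + b_c)      # gated candidate transformation
    out  = gate * cand            # element-wise self-gating
    """
    def __init__(self, d_model: int):
        super().__init__()
        self.gate = nn.Linear(d_model, d_model)
        self.cand = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.sigmoid(self.gate(x)) * torch.tanh(self.cand(x))

class GatedSublayer(nn.Module):
    """Adds the SDU output next to a residual self-attention sublayer,
    forming a pseudo information highway alongside the attention path."""
    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.sdu = SelfDependencyUnit(d_model)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        attn_out, _ = self.attn(x, x, x)
        # residual + attention + self-gating highway path
        return self.norm(x + attn_out + self.sdu(x))

An input of shape (batch, seq_len, d_model) passes through with its shape unchanged; the sigmoid gate re-weights each latent dimension of the token representation before the gated signal joins the residual and attention paths.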