Corpus ID: 211096588

GLU Variants Improve Transformer

@article{Shazeer2020GLUVI,
  title={GLU Variants Improve Transformer},
  author={Noam Shazeer},
  journal={ArXiv},
  year={2020},
  volume={abs/2002.05202}
}
  • Computer Science, Mathematics
Gated Linear Units (arXiv:1612.08083) consist of the component-wise product of two linear projections, one of which is first passed through a sigmoid function. Variations on GLU are possible, using different nonlinear (or even linear) functions in place of sigmoid. We test these variants in the feed-forward sublayers of the Transformer (arXiv:1706.03762) sequence-to-sequence model, and find that some of them yield quality improvements over the typically used ReLU or GELU activations.
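The gating construction described in the abstract is easy to express in code. Below is a minimal sketch, in PyTorch, of a GLU-style feed-forward sublayer with the gate activation left as a parameter, so the same module covers the original sigmoid GLU as well as variants such as ReGLU (ReLU gate), GEGLU (GELU gate), and SwiGLU (Swish gate). The names `GLUFeedForward`, `d_model`, and `d_ff` are illustrative, not taken from the paper's implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class GLUFeedForward(nn.Module):
    """Transformer feed-forward sublayer with a GLU-style gate.

    Computes FFN(x) = (act(x W) * (x V)) W2, where `act` is the gating
    nonlinearity: sigmoid for the original GLU, or e.g. ReLU (ReGLU),
    GELU (GEGLU), Swish/SiLU (SwiGLU) for the variants discussed above.
    """

    def __init__(self, d_model: int, d_ff: int, activation=torch.sigmoid):
        super().__init__()
        self.w = nn.Linear(d_model, d_ff, bias=False)   # gated projection
        self.v = nn.Linear(d_model, d_ff, bias=False)   # plain linear projection
        self.w2 = nn.Linear(d_ff, d_model, bias=False)  # output projection
        self.activation = activation

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Component-wise product of the activated projection and the plain one.
        return self.w2(self.activation(self.w(x)) * self.v(x))


# Example: a SwiGLU feed-forward block (Swish/SiLU gate), one of the tested variants.
ffn = GLUFeedForward(d_model=512, d_ff=2048, activation=F.silu)
y = ffn(torch.randn(2, 10, 512))  # (batch, seq_len, d_model)
```

Note that a GLU-style sublayer has three weight matrices where the standard two-layer FFN has two; the paper compensates by reducing the hidden dimension d_ff so that the total parameter count stays roughly comparable to the baseline.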

    References

    Publications referenced by this paper.

    • GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding (Wang et al., 2018, arXiv:1804.07461)

    • Attention is All you Need (Vaswani et al., 2017, arXiv:1706.03762)