Adaptively Sparse Transformers

@article{Correia2019AdaptivelyST,
  title={Adaptively Sparse Transformers},
  author={Gonçalo M. Correia and Vlad Niculae and André F. T. Martins},
  journal={ArXiv},
  year={2019},
  volume={abs/1909.00015}
}
Attention mechanisms have become ubiquitous in NLP. Recent architectures, notably the Transformer, learn powerful context-aware word representations through layered, multi-headed attention. The multiple heads learn diverse types of word relationships. However, with standard softmax attention, all attention heads are dense, assigning a non-zero weight to all context words. In this work, we introduce the adaptively sparse Transformer, wherein attention heads have flexible, context-dependent…
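
The abstract's key contrast is that softmax attention always assigns some probability mass to every context word, whereas a sparse replacement can give low-scoring words exactly zero weight. As a minimal illustration only (not the paper's full adaptive, per-head mechanism), the NumPy sketch below compares dense softmax against sparsemax, a known sparse projection onto the probability simplex; the function names and example scores here are our own.

import numpy as np

def softmax(z):
    """Dense normalization: every score receives a strictly positive weight."""
    e = np.exp(z - z.max())
    return e / e.sum()

def sparsemax(z):
    """Sparse projection onto the simplex (Martins & Astudillo, 2016).

    Low-scoring entries receive exactly zero probability, illustrating the
    kind of sparsity pattern the abstract contrasts with dense softmax.
    """
    z_sorted = np.sort(z)[::-1]              # scores in descending order
    k = np.arange(1, len(z) + 1)
    cumsum = np.cumsum(z_sorted)
    support = 1 + k * z_sorted > cumsum      # entries kept in the support
    k_z = k[support][-1]                     # support size
    tau = (cumsum[support][-1] - 1) / k_z    # threshold subtracted from scores
    return np.maximum(z - tau, 0.0)

scores = np.array([2.0, 1.2, 0.1, -1.5])
print("softmax  :", np.round(softmax(scores), 3))    # all weights > 0
print("sparsemax:", np.round(sparsemax(scores), 3))  # trailing weights exactly 0

On these sample scores, softmax yields roughly [0.61, 0.28, 0.09, 0.02], while sparsemax yields exactly [0.9, 0.1, 0.0, 0.0], dropping the two lowest-scoring words from the attention distribution entirely.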

