Corpus ID: 198895166

DropAttention: A Regularization Method for Fully-Connected Self-Attention Networks

@article{Lin2019DropAttentionAR,
  title={DropAttention: A Regularization Method for Fully-Connected Self-Attention Networks},
  author={Zehui Lin and Pengfei Liu and Luyao Huang and Junkun Chen and Xipeng Qiu and X. Huang},
  journal={ArXiv},
  year={2019},
  volume={abs/1907.11065}
}
Various dropout methods have been designed for the fully-connected, convolutional, and recurrent layers of neural networks, and have been shown to be effective at avoiding overfitting. [...] Key Result: Experiments on a wide range of tasks show that DropAttention can improve performance and reduce overfitting.
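The abstract describes applying dropout to the attention-weight matrix itself rather than to hidden activations. Below is a minimal Python/PyTorch sketch of that idea, assuming element-wise dropout on the softmax output followed by row re-normalization; the class name AttentionWithDrop and the drop probability p are illustrative assumptions, not the authors' reference implementation.

    # Sketch: dropout applied to attention weights (in the spirit of DropAttention).
    # Assumptions: element-wise masking of the softmax output, then re-normalizing
    # each row so the surviving weights sum to one.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class AttentionWithDrop(nn.Module):
        def __init__(self, d_model: int, p: float = 0.1):
            super().__init__()
            self.q_proj = nn.Linear(d_model, d_model)
            self.k_proj = nn.Linear(d_model, d_model)
            self.v_proj = nn.Linear(d_model, d_model)
            self.p = p  # probability of dropping an attention weight

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # x: (batch, seq_len, d_model)
            q, k, v = self.q_proj(x), self.k_proj(x), self.v_proj(x)
            scores = q @ k.transpose(-2, -1) / (x.size(-1) ** 0.5)
            attn = F.softmax(scores, dim=-1)  # (batch, seq_len, seq_len)
            if self.training and self.p > 0:
                # Zero out individual attention weights at random, then
                # re-normalize each row (an assumed rescaling choice).
                mask = torch.bernoulli(torch.full_like(attn, 1.0 - self.p))
                attn = attn * mask
                attn = attn / attn.sum(dim=-1, keepdim=True).clamp_min(1e-9)
            return attn @ v

In this sketch the dropout mask acts on the attention distribution, so a dropped weight removes one key-value pair from a query's aggregation for that forward pass, which is the regularization target the abstract points to.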
3 Citations

Scheduled DropHead: A Regularization Method for Transformer Models (2 citations)
Spatial Temporal Transformer Network for Skeleton-based Action Recognition (5 citations)

References

Showing 1-10 of 28 references

Recurrent Dropout without Memory Loss (154 citations)
DropBlock: A regularization method for convolutional networks (279 citations; highly influential)
Regularization of Neural Networks using DropConnect (1,807 citations)
Improved Regularization of Convolutional Neural Networks with Cutout (916 citations)
Shake-Shake regularization (241 citations; highly influential)
Multi-Head Attention with Disagreement Regularization (61 citations; highly influential)
Attention is All you Need (16,989 citations; highly influential)
ShakeDrop Regularization for Deep Residual Learning (73 citations)