Adversarial Self-Supervised Data-Free Distillation for Text Classification

@inproceedings{Ma2020AdversarialSD,
  title={Adversarial Self-Supervised Data-Free Distillation for Text Classification},
  author={Xinyin Ma and Yongliang Shen and Gongfan Fang and Chen Chen and Chenghao Jia and Weiming Lu},
  booktitle={EMNLP},
  year={2020}
}
Large pre-trained transformer-based language models have achieved impressive results on a wide range of NLP tasks. In the past few years, Knowledge Distillation (KD) has become a popular paradigm for compressing a computationally expensive model into a resource-efficient, lightweight one. However, most KD algorithms, especially in NLP, rely on access to the original training dataset, which may be unavailable due to privacy issues. To tackle this problem, we propose a novel two-stage data-free distillation approach.
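
For context on the distillation objective the abstract refers to: standard KD trains a small student to match the teacher's temperature-softened output distribution. Below is a minimal sketch of that generic soft-target loss in PyTorch; the function name, temperature value, and toy tensors are illustrative assumptions, and this is the standard data-dependent setting, not the data-free procedure proposed in the paper.

import torch
import torch.nn.functional as F

def soft_target_kd_loss(student_logits: torch.Tensor,
                        teacher_logits: torch.Tensor,
                        temperature: float = 2.0) -> torch.Tensor:
    # Temperature-scaled distributions expose the teacher's relative
    # class similarities ("dark knowledge") to the student.
    log_p_student = F.log_softmax(student_logits / temperature, dim=-1)
    p_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    # KL(teacher || student), averaged over the batch; the T^2 factor keeps
    # gradient magnitudes comparable across temperatures.
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * temperature ** 2

# Hypothetical usage: logits from a frozen teacher (e.g., a fine-tuned BERT)
# and a lightweight student on a binary text-classification batch.
teacher_logits = torch.randn(8, 2)
student_logits = torch.randn(8, 2, requires_grad=True)
loss = soft_target_kd_loss(student_logits, teacher_logits)
loss.backward()

Note that computing both sets of logits presupposes access to the original training inputs; the data-free setting studied in this paper removes exactly that assumption.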
