Adversarial Self-Supervised Data-Free Distillation for Text Classification

@inproceedings{Ma2020AdversarialSD,
  title={Adversarial Self-Supervised Data-Free Distillation for Text Classification},
  author={Xinyin Ma and Yongliang Shen and Gongfan Fang and Chen Chen and Chenghao Jia and Weiming Lu},
  booktitle={EMNLP},
  year={2020}
}
Large pre-trained transformer-based language models have achieved impressive results on a wide range of NLP tasks. In the past few years, Knowledge Distillation (KD) has become a popular paradigm for compressing a computationally expensive model into a resource-efficient lightweight model. However, most KD algorithms, especially in NLP, rely on access to the original training dataset, which may be unavailable due to privacy issues. To tackle this problem, we propose a novel two-stage data…
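
As a quick illustration of the knowledge-distillation objective the abstract refers to (a generic soft-target KD loss, not the authors' adversarial data-free procedure, whose details are truncated above), the following PyTorch sketch assumes hypothetical teacher and student classifiers that produce logits over the same label set:

import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, temperature=2.0):
    # KL divergence between temperature-softened teacher and student
    # distributions, scaled by T^2 as is conventional in soft-target KD.
    t = temperature
    soft_targets = F.softmax(teacher_logits / t, dim=-1)
    log_student = F.log_softmax(student_logits / t, dim=-1)
    return F.kl_div(log_student, soft_targets, reduction="batchmean") * (t * t)

# Usage: given a batch of inputs x (in the data-free setting these would be
# synthesized rather than drawn from the original training set), distill the
# teacher's predictions into the student without ground-truth labels.
# with torch.no_grad():
#     teacher_logits = teacher(x)
# loss = kd_loss(student(x), teacher_logits)
# loss.backward()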

