Adversarial Self-Supervised Data-Free Distillation for Text Classification
@inproceedings{Ma2020AdversarialSD,
  title     = {Adversarial Self-Supervised Data-Free Distillation for Text Classification},
  author    = {Xinyin Ma and Yongliang Shen and Gongfan Fang and Chen Chen and Chenghao Jia and Weiming Lu},
  booktitle = {EMNLP},
  year      = {2020}
}
Large pre-trained transformer-based language models have achieved impressive results on a wide range of NLP tasks. In the past few years, Knowledge Distillation (KD) has become a popular paradigm for compressing a computationally expensive model into a resource-efficient lightweight one. However, most KD algorithms, especially in NLP, rely on access to the original training dataset, which may be unavailable due to privacy issues. To tackle this problem, we propose a novel two-stage data-free distillation method.
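For context, the distillation paradigm the abstract refers to trains a small student to match the teacher's softened output distribution. Below is a minimal sketch of the standard soft-target KD loss, assuming a PyTorch setup; it is generic background rather than the paper's data-free procedure, and the `temperature` and `alpha` parameters are illustrative choices, not values from the paper.

```python
# Illustrative sketch of a standard knowledge-distillation loss (not the
# paper's data-free method). The student mimics the teacher's softened
# predictions while optionally fitting hard labels.
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, temperature=2.0, alpha=0.5):
    """Blend a soft KL term (teacher -> student) with hard-label cross-entropy."""
    # Soften both distributions with the temperature before comparing them.
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    log_soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    # KL divergence, scaled by T^2 to keep gradient magnitudes comparable.
    distill = F.kl_div(log_soft_student, soft_teacher,
                       reduction="batchmean") * temperature ** 2
    # Hard-label term; in the data-free setting the paper targets, original
    # labels are unavailable, so this term is shown only for the generic case.
    ce = F.cross_entropy(student_logits, labels)
    return alpha * distill + (1.0 - alpha) * ce

if __name__ == "__main__":
    # Random tensors stand in for teacher/student forward passes on a batch.
    student_logits = torch.randn(8, 3, requires_grad=True)
    teacher_logits = torch.randn(8, 3)
    labels = torch.randint(0, 3, (8,))
    loss = kd_loss(student_logits, teacher_logits, labels)
    loss.backward()
    print(float(loss))
```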