Compressing Visual-linguistic Model via Knowledge Distillation

@article{Fang2021CompressingVM,
  title={Compressing Visual-linguistic Model via Knowledge Distillation},
  author={Zhiyuan Fang and Jianfeng Wang and Xiaowei Hu and Lijuan Wang and Yezhou Yang and Zicheng Liu},
  journal={ArXiv},
  year={2021},
  volume={abs/2104.02096}
}
Despite exciting progress in pre-training for visual-linguistic (VL) representations, very few aspire to a small VL model. In this paper, we study knowledge distillation (KD) to effectively compress a transformer-based large VL model into a small VL model. The major challenge arises from the inconsistent regional visual tokens extracted from different detectors of Teacher and Student, resulting in the misalignment of hidden representations and attention distributions. To address the problem, we…
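For context, a minimal sketch of the standard transformer distillation objectives the paper builds on: soft-label matching on logits plus alignment of hidden representations and attention distributions between Teacher and Student. The tensor layout, dictionary keys, and weighting here are assumptions for illustration, not the authors' method; in particular, the paper's specific mechanism for handling mismatched regional visual tokens from different detectors is not shown.

```python
# Sketch of standard transformer KD losses (assumptions: PyTorch; teacher and
# student outputs are dicts with "logits", "hidden", and "attn" tensors of
# matching shapes; these names are hypothetical, not from the paper).
import torch
import torch.nn.functional as F


def distillation_loss(student_out, teacher_out, temperature=2.0, alpha=0.5):
    """Combine soft-label KD with hidden-state and attention alignment."""
    # Soft-label distillation on the output logits.
    kd = F.kl_div(
        F.log_softmax(student_out["logits"] / temperature, dim=-1),
        F.softmax(teacher_out["logits"] / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2

    # Hidden-representation matching. This assumes token positions correspond
    # one-to-one between Teacher and Student, which is exactly what breaks
    # when their visual tokens come from different detectors.
    hid = F.mse_loss(student_out["hidden"], teacher_out["hidden"])

    # Attention-distribution matching across heads and layers.
    attn = F.mse_loss(student_out["attn"], teacher_out["attn"])

    return alpha * kd + (1 - alpha) * (hid + attn)
```

The hidden-state and attention terms are the ones affected by the token misalignment the abstract describes, since both compare tensors position by position.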
