WAVPROMPT: Towards Few-Shot Spoken Language Understanding with Frozen Language Models

Authors: Heting Gao, Junrui Ni, Kaizhi Qian, Yang Zhang, Shiyu Chang, and Mark A. Hasegawa-Johnson
Large-scale auto-regressive language models pretrained on massive text have demonstrated an impressive ability to perform new natural language tasks from only a few text examples, without the need for fine-tuning. Recent studies further show that this few-shot learning ability can be extended to the text-image setting by training an encoder that maps images into embeddings that function like the text embeddings of the language model. Interested in exploring the possibility of…
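The idea of an encoder whose outputs "function like" a language model's text embeddings can be illustrated with a small NumPy sketch. This is an assumption-laden illustration, not the method of this paper or of the text-image work it cites. The sizes, the single linear projection used as the encoder, and every name (`W_enc`, `encode_to_prefix`, `build_lm_input`) are hypothetical. The training step is omitted; in that step only `W_enc` would be updated through the frozen model's loss. The sketch shows only the input side: encoder outputs are prepended as pseudo-token embeddings to embeddings looked up from a frozen table.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: vocabulary, LM hidden size, number of prefix slots, input feature size.
VOCAB, D_MODEL, N_PREFIX, FEAT_DIM = 100, 16, 4, 32

# Stand-in for the frozen language model's token-embedding table (never updated).
frozen_token_embeddings = rng.normal(size=(VOCAB, D_MODEL))

# Hypothetical trainable encoder: one linear projection from features to N_PREFIX embeddings.
W_enc = rng.normal(scale=0.02, size=(FEAT_DIM, N_PREFIX * D_MODEL))


def encode_to_prefix(features: np.ndarray) -> np.ndarray:
    """Map a pooled feature vector to N_PREFIX pseudo-token embeddings of size D_MODEL."""
    return (features @ W_enc).reshape(N_PREFIX, D_MODEL)


def build_lm_input(features: np.ndarray, token_ids: np.ndarray) -> np.ndarray:
    """Prepend encoder outputs to frozen text embeddings, forming one input sequence."""
    prefix = encode_to_prefix(features)
    text = frozen_token_embeddings[token_ids]
    return np.concatenate([prefix, text], axis=0)


features = rng.normal(size=FEAT_DIM)
token_ids = np.array([5, 17, 42])
x = build_lm_input(features, token_ids)
print(x.shape)  # (7, 16): 4 prefix slots followed by 3 text tokens
```

The frozen model would consume `x` exactly as it consumes ordinary token embeddings. That property is what allows the encoder to be trained while leaving the language model unchanged.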

