Unified Language Model Pre-training for Natural Language Understanding and Generation
A new Unified pre-trained Language Model (UniLM) that can be fine-tuned for both natural language understanding and generation tasks, and that compares favorably with BERT on the GLUE benchmark and on the SQuAD 2.0 and CoQA question answering tasks.
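Unified pre-training of this kind can be realized with a single Transformer whose objective is switched by the self-attention mask. A minimal sketch in PyTorch, with illustrative mask construction rather than the authors' implementation:

```python
# Illustrative sketch: one Transformer supports NLU-style and generation-style
# objectives by changing which positions each token may attend to.
import torch

def bidirectional_mask(seq_len: int) -> torch.Tensor:
    # NLU-style: every token attends to every other token.
    return torch.ones(seq_len, seq_len, dtype=torch.bool)

def seq2seq_mask(src_len: int, tgt_len: int) -> torch.Tensor:
    # Generation-style: source tokens attend bidirectionally within the source;
    # target tokens attend to the full source and to earlier target tokens only.
    n = src_len + tgt_len
    mask = torch.zeros(n, n, dtype=torch.bool)
    mask[:, :src_len] = True                                 # everyone sees the source
    mask[src_len:, src_len:] = torch.tril(                   # causal within the target
        torch.ones(tgt_len, tgt_len, dtype=torch.bool))
    return mask

print(seq2seq_mask(3, 4).int())
```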
MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers
- Wenhui Wang, Furu Wei, Li Dong, Hangbo Bao, Nan Yang, Ming Zhou
- Computer Science, NeurIPS
- 25 February 2020
This work presents a simple and effective approach, termed deep self-attention distillation, to compress large Transformer-based (Vaswani et al., 2017) pre-trained models, and demonstrates that the monolingual model outperforms state-of-the-art baselines across different parameter sizes of student models.
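A minimal sketch of what deep self-attention distillation can look like, assuming PyTorch; the tensor names, shapes, and equal loss weighting below are assumptions, not the paper's implementation:

```python
# Sketch: the student mimics the teacher's last-layer self-attention
# distributions and value-relations via KL divergence.
import torch
import torch.nn.functional as F

def attention_distill_loss(t_attn, s_attn, t_value, s_value):
    """t_attn/s_attn: [batch, heads, seq, seq] attention probabilities.
    t_value/s_value: [batch, heads, seq, head_dim] value vectors (assumed shapes)."""
    # KL between teacher and student attention distributions.
    l_at = F.kl_div(s_attn.clamp_min(1e-9).log(), t_attn, reduction="batchmean")
    # Value-relation: scaled dot-product among value vectors, then KL.
    def value_relation(v):
        return F.softmax(v @ v.transpose(-1, -2) / v.size(-1) ** 0.5, dim=-1)
    l_vr = F.kl_div(value_relation(s_value).clamp_min(1e-9).log(),
                    value_relation(t_value), reduction="batchmean")
    return l_at + l_vr
```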
UniLMv2: Pseudo-Masked Language Models for Unified Language Model Pre-Training
The experiments show that the unified language models pre-trained using PMLM achieve new state-of-the-art results on a wide range of natural language understanding and generation tasks across several widely used benchmarks.
InfoXLM: An Information-Theoretic Framework for Cross-Lingual Language Model Pre-Training
An information-theoretic framework is presented that formulates cross-lingual language model pre-training as maximizing mutual information between multilingual multi-granularity texts, and a new pre-training task based on contrastive learning is proposed.
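Contrastive objectives of this kind are commonly instantiated as an InfoNCE loss, which lower-bounds the mutual information between paired views. A hedged sketch in PyTorch; the encoder outputs, variable names, and temperature are assumptions, not InfoXLM's actual code:

```python
# Sketch: embeddings of parallel (translation-pair) sentences are pulled
# together against in-batch negatives.
import torch
import torch.nn.functional as F

def infonce_loss(src_emb, tgt_emb, temperature=0.05):
    """src_emb, tgt_emb: [batch, dim] embeddings of parallel sentences."""
    src = F.normalize(src_emb, dim=-1)
    tgt = F.normalize(tgt_emb, dim=-1)
    logits = src @ tgt.t() / temperature       # [batch, batch] similarities
    labels = torch.arange(src.size(0))         # i-th source matches i-th target
    return F.cross_entropy(logits, labels)
```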
Cross-Lingual Natural Language Generation via Pre-Training
- Zewen Chi, Li Dong, Furu Wei, Wenhui Wang, Xian-Ling Mao, Heyan Huang
- Computer Science, AAAI
- 23 September 2019
Experimental results on question generation and abstractive summarization show that the model outperforms the machine-translation-based pipeline methods for zero-shot cross-lingual generation and improves NLG performance of low-resource languages by leveraging rich-resource language data.
MiniLMv2: Multi-Head Self-Attention Relation Distillation for Compressing Pretrained Transformers
This work defines multi-head self-attention relations as the scaled dot-products between pairs of query, key, and value vectors within each self-attention module, and employs this relational knowledge to train the student model.
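A minimal sketch of the relation distillation described above, assuming PyTorch; the tuple layout and shapes are assumptions:

```python
# Sketch: form scaled dot-product "relation" matrices for queries, keys, and
# values separately, then align student to teacher with KL divergence.
import torch
import torch.nn.functional as F

def relation(x):
    """x: [batch, relation_heads, seq, head_dim] -> [batch, heads, seq, seq]."""
    return F.softmax(x @ x.transpose(-1, -2) / x.size(-1) ** 0.5, dim=-1)

def relation_distill_loss(teacher_qkv, student_qkv):
    """teacher_qkv/student_qkv: 3-tuples of (Q, K, V) tensors."""
    loss = 0.0
    for t, s in zip(teacher_qkv, student_qkv):   # Q-Q, K-K, V-V relations
        loss = loss + F.kl_div(relation(s).clamp_min(1e-9).log(),
                               relation(t), reduction="batchmean")
    return loss
```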
VLMo: Unified Vision-Language Pre-Training with Mixture-of-Modality-Experts
VLMo, a pre-trained vision-language model that jointly learns a dual encoder and a fusion encoder with a modular Transformer network, is proposed, together with a stagewise pre-training strategy that effectively leverages large-scale image-only and text-only data in addition to image-text pairs.
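The modular Transformer rests on Mixture-of-Modality-Experts blocks: shared self-attention followed by a modality-specific feed-forward expert. A hedged sketch in PyTorch; layer sizes, expert names, and the routing argument are illustrative assumptions:

```python
import torch
import torch.nn as nn

class MoMEBlock(nn.Module):
    def __init__(self, dim=768, heads=12):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)
        # One FFN expert per modality; attention parameters are shared.
        self.experts = nn.ModuleDict({
            name: nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(),
                                nn.Linear(4 * dim, dim))
            for name in ("vision", "language", "vision_language")
        })

    def forward(self, x, modality="language"):
        h = self.norm1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]
        return x + self.experts[modality](self.norm2(x))

block = MoMEBlock()
tokens = torch.randn(2, 16, 768)
print(block(tokens, modality="vision").shape)    # torch.Size([2, 16, 768])
```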
Harvesting and Refining Question-Answer Pairs for Unsupervised QA
This work introduces two approaches to improve unsupervised QA: harvesting lexically and syntactically divergent questions from Wikipedia to automatically construct a corpus of question-answer pairs, and taking advantage of the QA model to extract more appropriate answers.
Consistency Regularization for Cross-Lingual Fine-Tuning
This work uses example consistency regularization to penalize prediction sensitivity to four types of data augmentation, i.e., subword sampling, Gaussian noise, code-switch substitution, and machine translation, thereby improving cross-lingual fine-tuning.
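A minimal sketch of such a consistency penalty, assuming PyTorch; the symmetric-KL form is one plausible instantiation rather than the paper's exact objective:

```python
# Sketch: penalize divergence between a model's predictions on an example
# and on its augmented counterpart.
import torch
import torch.nn.functional as F

def consistency_loss(logits_orig, logits_aug):
    """logits_*: [batch, num_classes] predictions for paired inputs."""
    p = F.log_softmax(logits_orig, dim=-1)
    q = F.log_softmax(logits_aug, dim=-1)
    # Symmetric KL keeps the two views' predictions close to each other.
    return 0.5 * (F.kl_div(q, p, log_target=True, reduction="batchmean")
                  + F.kl_div(p, q, log_target=True, reduction="batchmean"))
```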
Adapt-and-Distill: Developing Small, Fast and Effective Pretrained Language Models for Domains
This paper proposes domain-specific vocabulary expansion in the adaptation stage, employs corpus-level occurrence probability to choose the size of the incremental vocabulary automatically, and systematically explores different strategies to compress large pre-trained models for specific domains.
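One way to read the vocabulary-expansion step is as frequency-thresholded selection over the domain corpus. A hedged Python sketch; the whitespace tokenization, threshold rule, and function name are assumptions, not the paper's method:

```python
# Sketch: add candidate tokens whose corpus-level occurrence probability in
# the domain corpus clears a threshold and that the base vocabulary lacks.
from collections import Counter

def expand_vocab(domain_corpus, base_vocab, min_prob=1e-5):
    counts = Counter(tok for sent in domain_corpus for tok in sent.split())
    total = sum(counts.values())
    return sorted(tok for tok, c in counts.items()
                  if tok not in base_vocab and c / total >= min_prob)

corpus = ["the ribosome binds mrna", "mrna encodes the protein"]
print(expand_vocab(corpus, base_vocab={"the", "protein"}, min_prob=0.05))
# ['binds', 'encodes', 'mrna', 'ribosome']
```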