Unified BERT for Few-shot Natural Language Understanding

@article{Lu2022UnifiedBF,
  title={Unified BERT for Few-shot Natural Language Understanding},
  author={JunYu Lu and Ping Yang and Jiaxing Zhang and Ruyi Gan and Jing Yang},
  journal={ArXiv},
  year={2022},
  volume={abs/2206.12094}
}
Even though pre-trained language models share a semantic encoder, natural language understanding still suffers from the diversity of output schemas across tasks. In this paper, we propose UBERT, a unified bidirectional language understanding model based on the BERT framework, which can universally model the training objectives of different NLU tasks through a biaffine network. Specifically, UBERT encodes prior knowledge from various aspects, uniformly constructing learning representations across multiple NLU tasks, which is…
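
As a rough illustration of the unified formulation the abstract describes, the sketch below casts two different NLU tasks (sentence classification and NER) into a single span-extraction format. The prompt layout, function names, and offsets are purely illustrative assumptions, not the paper's actual schema or code.

```python
# Illustrative sketch (not the authors' code): casting heterogeneous NLU tasks
# into one shared span-extraction format, in the spirit of the UBERT abstract.

def build_unified_input(task, labels, text, sep=" [SEP] "):
    """Verbalize the task name and candidate labels into a prompt, then append the text.
    Every downstream task then reduces to extracting (start, end) spans from this string."""
    prompt = f"[{task}] " + " ".join(labels)
    return prompt + sep + text


def label_as_span(unified_input, label):
    """For classification-style tasks, the 'answer' is the label's own span in the prompt."""
    start = unified_input.index(label)
    return (start, start + len(label))


# Sentence classification becomes span extraction over the verbalized labels.
x = build_unified_input("sentiment", ["positive", "negative"], "The movie was a delight.")
print(x)                             # "[sentiment] positive negative [SEP] The movie was a delight."
print(label_as_span(x, "positive"))  # gold target span for a positive example

# NER stays span extraction over the original text, with entity types in the prompt.
y = build_unified_input("ner", ["person", "location"], "Alan Turing was born in London.")
start = y.index("London")
print((start, start + len("London")))  # gold span for a location mention
```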

Fengshenbang 1.0: Being the Foundation of Chinese Cognitive Intelligence

TLDR
Fengshenbang aims to re-evaluate the open-source community of Chinese pre-trained large-scale models, prompting the development of the entire Chinese large-scale model community, and invites companies, colleges, and research institutions to collaborate in building the large-scale open-source model-based ecosystem.

References

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

TLDR
This systematic study compares pre-training objectives, architectures, unlabeled datasets, transfer approaches, and other factors on dozens of language understanding tasks and achieves state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more.
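
As a concrete illustration of the text-to-text framing, the minimal sketch below assumes the HuggingFace `transformers` library and the public `t5-small` checkpoint; the prompts use task prefixes of the kind T5 relies on.

```python
# Minimal sketch of the text-to-text framing, assuming the HuggingFace
# `transformers` library and the public `t5-small` checkpoint.
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# Every task is expressed as plain text with a task prefix; the model always
# emits text, so one architecture and one loss cover all of the tasks.
prompts = [
    "translate English to German: The house is wonderful.",
    "summarize: studies have shown that owning a dog is good for you ...",
]
for prompt in prompts:
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=40)
    print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```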

Muppet: Massive Multi-task Representations with Pre-Finetuning

TLDR
It is shown that pre-finetuning consistently improves performance for pretrained discriminators and generation models on a wide range of tasks while also significantly improving sample efficiency during fine-tuning, and that large-scale multi-tasking is crucial.
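
The pre-finetuning recipe amounts to large-scale multi-task learning on top of a shared encoder. The sketch below is an illustrative stand-in, not Muppet's code: a toy encoder, one small head per task, and mixed-task batches whose losses are summed.

```python
# Illustrative sketch of multi-task pre-finetuning (not Muppet's actual code):
# one shared encoder, one head per task, batches mixed across tasks.
import torch
import torch.nn as nn

class MultiTaskModel(nn.Module):
    def __init__(self, encoder, hidden_size, task_num_labels):
        super().__init__()
        self.encoder = encoder  # any sentence encoder returning (batch, hidden)
        self.heads = nn.ModuleDict(
            {task: nn.Linear(hidden_size, n) for task, n in task_num_labels.items()}
        )

    def forward(self, task, features):
        pooled = self.encoder(features)   # shared representation
        return self.heads[task](pooled)   # task-specific classifier

# Toy encoder standing in for a pretrained transformer's pooled output.
encoder = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 64))
model = MultiTaskModel(encoder, hidden_size=64,
                       task_num_labels={"nli": 3, "sentiment": 2})
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

# Each step mixes heterogeneous task batches and sums their losses, pushing the
# shared encoder toward representations useful for many tasks at once.
batches = {
    "nli": (torch.randn(8, 32), torch.randint(0, 3, (8,))),
    "sentiment": (torch.randn(8, 32), torch.randint(0, 2, (8,))),
}
optimizer.zero_grad()
total_loss = sum(loss_fn(model(task, x), y) for task, (x, y) in batches.items())
total_loss.backward()
optimizer.step()
```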

RoBERTa: A Robustly Optimized BERT Pretraining Approach

TLDR
It is found that BERT was significantly undertrained and, when carefully optimized, can match or exceed the performance of every model published after it; the best resulting model achieves state-of-the-art results on GLUE, RACE and SQuAD.

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

TLDR
A new language representation model, BERT, is introduced; it pre-trains deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers, and can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.
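
A minimal sketch of the "one additional output layer" fine-tuning setup follows, assuming the HuggingFace `transformers` library and the public `bert-base-uncased` checkpoint; the two-class sentiment head is illustrative.

```python
# Minimal sketch of "one additional output layer" fine-tuning, assuming the
# HuggingFace `transformers` library and the `bert-base-uncased` checkpoint.
import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
encoder = BertModel.from_pretrained("bert-base-uncased")
classifier = nn.Linear(encoder.config.hidden_size, 2)  # the single added layer

inputs = tokenizer("A delightfully watchable film.", return_tensors="pt")
outputs = encoder(**inputs)
logits = classifier(outputs.pooler_output)   # [CLS]-based sentence representation

loss = nn.functional.cross_entropy(logits, torch.tensor([1]))
loss.backward()   # gradients flow into both the new classifier and the encoder
```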

Unified Structure Generation for Universal Information Extraction

TLDR
A unified text-to-structure generation framework, namely UIE, which can universally model different IE tasks, adaptively generate targeted structures, and collaboratively learn general IE abilities from different knowledge sources is proposed.

LUKE: Deep Contextualized Entity Representations with Entity-aware Self-attention

TLDR
New pretrained contextualized representations of words and entities based on the bidirectional transformer are proposed, together with an entity-aware self-attention mechanism that considers the types of tokens (words or entities) when computing attention scores.
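
The sketch below illustrates the entity-aware attention idea as summarized above: the query projection is selected by the (query, key) token types. It is a toy re-creation under that description, not LUKE's implementation, and all names and dimensions are made up.

```python
# Sketch of entity-aware self-attention as described above (not LUKE's actual
# code): the query projection depends on whether the query/key tokens are
# words or entities, so attention scores can reflect token types.
import math
import torch
import torch.nn as nn

class EntityAwareAttention(nn.Module):
    def __init__(self, hidden):
        super().__init__()
        # One query matrix per (query type, key type) pair: w->w, w->e, e->w, e->e.
        self.q = nn.ModuleDict({p: nn.Linear(hidden, hidden) for p in ("ww", "we", "ew", "ee")})
        self.k = nn.Linear(hidden, hidden)
        self.v = nn.Linear(hidden, hidden)
        self.hidden = hidden

    def forward(self, x, is_entity):
        # x: (seq, hidden); is_entity: (seq,) bool mask marking entity tokens.
        keys, values = self.k(x), self.v(x)
        scores = torch.empty(x.size(0), x.size(0))
        for i in range(x.size(0)):
            for j in range(x.size(0)):
                pair = ("e" if is_entity[i] else "w") + ("e" if is_entity[j] else "w")
                q_ij = self.q[pair](x[i])                      # type-dependent query
                scores[i, j] = q_ij @ keys[j] / math.sqrt(self.hidden)
        return torch.softmax(scores, dim=-1) @ values

attn = EntityAwareAttention(hidden=16)
tokens = torch.randn(5, 16)   # e.g. 3 word tokens followed by 2 entity tokens
out = attn(tokens, is_entity=torch.tensor([False, False, False, True, True]))
print(out.shape)              # torch.Size([5, 16])
```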

Named Entity Recognition as Dependency Parsing

TLDR
Ideas from graph-based dependency parsing are used to provide the model with a global view of the input via a biaffine model; evaluation on 8 corpora shows that the approach works well for both nested and flat NER, achieving SoTA performance on all of them.
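
The biaffine span scorer at the heart of this approach, the same kind of biaffine network the UBERT abstract refers to, can be sketched as follows; the module names and dimensions are illustrative, not the authors' code.

```python
# Minimal sketch of biaffine span scoring for NER (illustrative, not the
# authors' implementation): every (start, end) token pair gets a score per label.
import torch
import torch.nn as nn

class BiaffineSpanScorer(nn.Module):
    def __init__(self, hidden, head_dim, num_labels):
        super().__init__()
        self.start_mlp = nn.Sequential(nn.Linear(hidden, head_dim), nn.GELU())
        self.end_mlp = nn.Sequential(nn.Linear(hidden, head_dim), nn.GELU())
        # Biaffine tensor: one (head_dim+1) x (head_dim+1) form per label
        # (the +1 adds a bias term on each side).
        self.U = nn.Parameter(torch.randn(num_labels, head_dim + 1, head_dim + 1) * 0.01)

    def forward(self, h):
        # h: (seq, hidden) contextual embeddings from an encoder such as BERT.
        ones = h.new_ones(h.size(0), 1)
        s = torch.cat([self.start_mlp(h), ones], dim=-1)   # (seq, head_dim+1)
        e = torch.cat([self.end_mlp(h), ones], dim=-1)     # (seq, head_dim+1)
        # scores[l, i, j] = s_i^T U_l e_j  -> label scores for span (i, j).
        return torch.einsum("id,ldk,jk->lij", s, self.U, e)

scorer = BiaffineSpanScorer(hidden=768, head_dim=128, num_labels=4)
h = torch.randn(6, 768)   # stand-in for BERT outputs on a 6-token sentence
scores = scorer(h)
print(scores.shape)       # torch.Size([4, 6, 6]); argmax over labels per candidate span
```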

Exploiting Cloze-Questions for Few-Shot Text Classification and Natural Language Inference

TLDR
This work introduces Pattern-Exploiting Training (PET), a semi-supervised training procedure that reformulates input examples as cloze-style phrases to help language models understand a given task.
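
The cloze reformulation can be sketched in a few lines, assuming the HuggingFace `transformers` library and a masked-LM checkpoint such as `bert-base-uncased`; the pattern and verbalizer below are illustrative choices, not PET's official ones.

```python
# Sketch of PET-style cloze reformulation, assuming the HuggingFace
# `transformers` library and the `bert-base-uncased` masked-LM checkpoint.
# The pattern and verbalizer below are illustrative, not PET's official ones.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

def classify(text):
    # Pattern: turn the classification input into a cloze question.
    prompt = f"{text} It was {tokenizer.mask_token}."
    # Verbalizer: map each class to a single word the LM can predict.
    verbalizer = {"positive": "great", "negative": "terrible"}

    inputs = tokenizer(prompt, return_tensors="pt")
    mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero().item()
    logits = model(**inputs).logits[0, mask_pos]

    # Compare the LM's scores for the verbalized class words only.
    scores = {label: logits[tokenizer.convert_tokens_to_ids(word)].item()
              for label, word in verbalizer.items()}
    return max(scores, key=scores.get)

print(classify("Best pizza I have had in years."))
```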

BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension

TLDR
BART is presented, a denoising autoencoder for pretraining sequence-to-sequence models, which matches the performance of RoBERTa on GLUE and SQuAD, and achieves new state-of-the-art results on a range of abstractive dialogue, question answering, and summarization tasks.
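
The denoising objective pairs a corrupted input with the clean original. The snippet below illustrates one of BART's noising functions, text infilling, in a deliberately simplified form (a single span replaced by one mask token); it is not the official noising code.

```python
# Simplified illustration of BART-style text infilling (not the official
# noising code): a contiguous span is replaced by one mask token, and the
# sequence-to-sequence model is trained to reconstruct the original text.
import random

def text_infilling(tokens, mask_token="<mask>", max_span=3, seed=0):
    """Replace one random span of up to `max_span` tokens with a single mask."""
    rng = random.Random(seed)
    start = rng.randrange(len(tokens))
    length = rng.randint(0, max_span)   # a length-0 span just inserts a bare mask
    return tokens[:start] + [mask_token] + tokens[start + length:]

original = "the quick brown fox jumps over the lazy dog".split()
corrupted = text_infilling(original)
print(" ".join(corrupted))   # encoder input (noised)
print(" ".join(original))    # decoder target (clean) -- the denoising objective
```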