GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding

@article{Wang2018GLUEAM,
  title={GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding},
  author={Alex Wang and Amanpreet Singh and Julian Michael and Felix Hill and Omer Levy and Samuel R. Bowman},
  journal={ArXiv},
  year={2018},
  volume={abs/1804.07461}
}
For natural language understanding (NLU) technology to be maximally useful, both practically and as a scientific object of study, it must be general: it must be able to process language in a way that is not exclusively tailored to any one specific task or dataset. [...] We further provide a hand-crafted diagnostic test suite that enables detailed linguistic analysis of NLU models. We evaluate baselines based on current methods for multi-task and transfer learning and find that they do not immediately…
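The paper itself does not prescribe an access API, but as a purely illustrative sketch (not from the paper), the snippet below shows how the GLUE tasks are commonly loaded today with the Hugging Face datasets library; the library choice, the package name, and the field names are assumptions about present-day tooling rather than anything specified by the authors.

# Minimal sketch (assumes `pip install datasets` and access to the dataset hub; not part of the paper).
from datasets import load_dataset

# CoLA: single-sentence acceptability judgments, one of the nine GLUE tasks.
cola = load_dataset("glue", "cola")
example = cola["train"][0]
print(example["sentence"])   # raw input sentence
print(example["label"])      # 1 = acceptable, 0 = unacceptable

# Sentence-pair tasks such as MNLI expose two text fields plus a label.
mnli = load_dataset("glue", "mnli")
pair = mnli["train"][0]
print(pair["premise"], "||", pair["hypothesis"], "->", pair["label"])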
Citations

Effects and Mitigation of Out-of-vocabulary in Universal Language Models
TLDR: The adverse effects of OOV are demonstrated in the context of transfer learning in CJK languages, and a novel approach is proposed to maximize the utility of a pre-trained model suffering from OOV.
GLGE: A New General Language Generation Evaluation Benchmark
TLDR: The General Language Generation Evaluation (GLGE), a new multi-task benchmark for evaluating the generalization capabilities of NLG models across eight language generation tasks, is presented, and a leaderboard with strong baselines including MASS, BART, and ProphetNet is built.
On the Role of Corpus Ordering in Language Modeling
Language models pretrained on vast corpora of unstructured text using a self-supervised learning framework are used in numerous natural language understanding and generation tasks. Many studies show…
Generalizing Natural Language Analysis through Span-relation Representations
TLDR: This paper provides the simple insight that a great variety of tasks can be represented in a single unified format consisting of labeled spans and relations between spans, so a single task-independent model can be used across different tasks.
SILT: Efficient transformer training for inter-lingual inference
TLDR: Evidence is found that SILT drastically reduces the number of trainable parameters while allowing for inter-lingual NLI and achieving state-of-the-art performance on common benchmarks.
Language Models are Unsupervised Multitask Learners
TLDR: It is demonstrated that language models begin to learn these tasks without any explicit supervision when trained on a new dataset of millions of webpages called WebText, suggesting a promising path towards building language processing systems which learn to perform tasks from their naturally occurring demonstrations.
FlauBERT: Unsupervised Language Model Pre-training for French
TLDR: This paper introduces and shares FlauBERT, a model learned on a very large and heterogeneous French corpus, applies it to diverse NLP tasks, and shows that it outperforms other pre-training approaches most of the time.
Unified Language Model Pre-training for Natural Language Understanding and Generation
TLDR: A new Unified pre-trained Language Model (UniLM) is presented that can be fine-tuned for both natural language understanding and generation tasks and compares favorably with BERT on the GLUE benchmark and on the SQuAD 2.0 and CoQA question answering tasks.
Infusing Finetuning with Semantic Dependencies
TLDR: This approach applies novel probes to recent language models and finds that, unlike syntax, semantics is not brought to the surface by today's pretrained models; it then uses convolutional graph encoders to explicitly incorporate semantic parses into task-specific finetuning, yielding benefits to natural language understanding tasks in the GLUE benchmark.

References

Showing 1-10 of 72 references
Recurrent Neural Network-Based Sentence Encoder with Gated Attention for Natural Language Inference
TLDR: This paper describes a model (alpha) that is ranked among the top in the Shared Task, on both the in-domain test set and the cross-domain test set, demonstrating that the model generalizes well to cross-domain data.
A Joint Many-Task Model: Growing a Neural Network for Multiple NLP Tasks
TLDR: A joint many-task model is presented, together with a strategy for successively growing its depth to solve increasingly complex tasks; it uses a simple regularization term to allow optimizing all model weights to improve one task's loss without catastrophic interference with the other tasks.
A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference
TLDR: The Multi-Genre Natural Language Inference corpus, a dataset designed for use in the development and evaluation of machine learning models for sentence understanding, is introduced and shown to represent a substantially more difficult task than the Stanford NLI corpus.
Sluice networks: Learning what to share between loosely related tasks
TLDR: Sluice networks are introduced, a general framework for multi-task learning where trainable parameters control the amount of sharing, and it is shown that (a) label entropy is predictive of gains in sluice networks, confirming findings for hard parameter sharing, and (b) while sluice networks easily fit noise, they are robust across domains in practice.
Towards AI-Complete Question Answering: A Set of Prerequisite Toy Tasks
TLDR: This work argues for the usefulness of a set of proxy tasks that evaluate reading comprehension via question answering, and classifies these tasks into skill sets so that researchers can identify (and then rectify) the failings of their systems.
AllenNLP: A Deep Semantic Natural Language Processing Platform
TLDR: AllenNLP is designed to support researchers who want to build novel language understanding models quickly and easily, and provides a flexible data API that handles intelligent batching and padding, along with a modular and extensible experiment framework that makes doing good science easy.
A large annotated corpus for learning natural language inference
TLDR: The Stanford Natural Language Inference corpus is introduced, a new, freely available collection of labeled sentence pairs, written by humans doing a novel grounded task based on image captioning, which allows a neural network-based model to perform competitively on natural language inference benchmarks for the first time.
One billion word benchmark for measuring progress in statistical language modeling
TLDR: A new benchmark corpus with almost one billion words of training data is proposed for measuring progress in statistical language modeling; it is useful for quickly evaluating novel language modeling techniques and for comparing their contribution when combined with other advanced techniques.
Learning General Purpose Distributed Sentence Representations via Large Scale Multi-task Learning
TLDR: This work presents a simple, effective multi-task learning framework for sentence representations that combines the inductive biases of diverse training objectives in a single model and demonstrates that sharing a single recurrent sentence encoder across weakly related tasks leads to consistent improvements over previous methods.
DisSent: Sentence Representation Learning from Explicit Discourse Relations
TLDR: It is demonstrated that the automatically curated corpus allows a bidirectional LSTM sentence encoder to yield high-quality sentence embeddings and can serve as a supervised fine-tuning dataset for larger models such as BERT.