CLUE: A Chinese Language Understanding Evaluation Benchmark

@article{Xu2020CLUEAC,
  title={CLUE: A Chinese Language Understanding Evaluation Benchmark},
  author={Liang Xu and Xuanwei Zhang and Lu Li and Hai Hu and Chenjie Cao and Weitang Liu and Junyi Li and Yudong Li and Kai Sun and Yechen Xu and Yiming Cui and Cong Yu and Qianqian Dong and Yin Tian and Dian Yu and Bo Shi and Jun-jie Zeng and Rongzhao Wang and Weijian Xie and Yanting Li and Yina Patterson and Zuoyu Tian and Yiwen Zhang and He Zhou and Shaoweihua Liu and Quanbei Zhao and Cong Yue and Xinrui Zhang and Zhen-Yi Yang and Kyle Richardson and Zhenzhong Lan},
  journal={ArXiv},
  year={2020},
  volume={abs/2004.05986}
}
The advent of natural language understanding (NLU) benchmarks for English, such as GLUE and SuperGLUE, allows new NLU models to be evaluated across a diverse set of tasks. These comprehensive benchmarks have facilitated a broad range of research and applications in natural language processing (NLP). The problem, however, is that most such benchmarks are limited to English, which has made it difficult to replicate many of the successes in English NLU for other languages. To help remedy this issue…
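As an illustration of how such a benchmark is typically consumed, here is a minimal Python sketch that scores a trivial majority-class baseline (standing in for a real NLU model) on two CLUE classification tasks; it assumes the Hugging Face datasets hub exposes these tasks under the dataset name "clue" with configurations such as "tnews" and "afqmc", so the names are assumptions to adjust to the actual hub layout.

from collections import Counter
from datasets import load_dataset

TASKS = ["tnews", "afqmc"]  # assumed CLUE classification configs on the hub

for task in TASKS:
    data = load_dataset("clue", task)
    train_labels = data["train"]["label"]
    dev_labels = data["validation"]["label"]
    # Majority-class baseline standing in for a real NLU model.
    majority = Counter(train_labels).most_common(1)[0][0]
    acc = sum(label == majority for label in dev_labels) / len(dev_labels)
    print(f"{task}: majority-class dev accuracy = {acc:.3f}")

A real submission would replace the majority baseline with a fine-tuned model and report the official metric for each task.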

Citations

IndoNLU: Benchmark and Resources for Evaluating Indonesian Natural Language Understanding

TLDR
The first large-scale resource for training, evaluating, and benchmarking Indonesian natural language understanding (IndoNLU) tasks is introduced, along with baseline models for all twelve tasks and a framework for benchmark evaluation, enabling anyone to benchmark their system's performance.

OCNLI: Original Chinese Natural Language Inference

TLDR
This paper presents the first large-scale NLI dataset for Chinese called the Original Chinese Natural Language Inference dataset (OCNLI), which follows closely the annotation protocol used for MNLI, but creates new strategies for eliciting diverse hypotheses.

Does Chinese BERT Encode Word Structure?

TLDR
This work investigates Chinese BERT using both attention weight distribution statistics and probing tasks, finding that word information is captured by BERT; word-level features are mostly in the middle representation layers; and downstream tasks make different use of word features in BERT.

AMBERT: A Pre-trained Language Model with Multi-Grained Tokenization

TLDR
This paper proposes a novel pre-trained language model, referred to as AMBERT (A Multi-grained BERT), built on both fine-grained and coarse-grained tokenizations, which outperforms the existing best-performing models in almost all cases.

Improving Machine Reading Comprehension with Contextualized Commonsense Knowledge

TLDR
This paper aims to extract a new kind of structured knowledge from scripts and use it to improve MRC, and designs a teacher-student paradigm with multiple teachers to facilitate the transfer of knowledge in weakly-labeled MRC data.
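For intuition only, a generic multi-teacher distillation loss in PyTorch, not this paper's exact recipe: the student is trained to match the averaged softened predictions of several teachers, with temperature and shapes chosen arbitrarily here.

import torch
import torch.nn.functional as F

def multi_teacher_kd_loss(student_logits, teacher_logits_list, temperature=2.0):
    # Average the softened probability distributions of all teachers.
    teacher_probs = torch.stack(
        [F.softmax(t / temperature, dim=-1) for t in teacher_logits_list]
    ).mean(dim=0)
    # KL divergence between the student and the averaged teacher distribution.
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean") * temperature ** 2

# Toy usage: batch of 4 questions, 3 answer options, 2 teachers.
loss = multi_teacher_kd_loss(torch.randn(4, 3), [torch.randn(4, 3) for _ in range(2)])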

Cross-lingual Inference with A Chinese Entailment Graph

TLDR
This paper presents the first pipeline for building Chinese entailment graphs, which involves a novel high-recall open relation extraction (ORE) method and the first Chinese fine-grained entity typing dataset under the FIGER type ontology.

WeLM: A Well-Read Pre-trained Language Model for Chinese

TLDR
WeLM is a well-read pre-trained language model for Chinese that can seamlessly perform different types of tasks with zero or few-shot demonstrations, and it shows basic ability to explain and calibrate its own decisions, which are promising directions for future research.

CLOWER: A Pre-trained Language Model with Contrastive Learning over Word and Character Representations

TLDR
This work proposes a simple yet effective PLM, CLOWER, which adopts Contrastive Learning Over Word and charactER representations, implicitly encoding coarse-grained (word-level) information into fine-grained (character-level) representations through contrastive learning on multi-grained information.
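For intuition, a generic InfoNCE-style contrastive loss between word-level and character-level sentence representations, sketched in PyTorch; this illustrates the general idea only, not CLOWER's actual implementation, and the tensor shapes and temperature are assumptions.

import torch
import torch.nn.functional as F

def word_char_contrastive_loss(word_repr, char_repr, temperature=0.07):
    # word_repr, char_repr: (batch, dim) encodings of the same sentences at word
    # and character granularity; matching rows are the positive pairs.
    w = F.normalize(word_repr, dim=-1)
    c = F.normalize(char_repr, dim=-1)
    logits = w @ c.t() / temperature        # pairwise cosine similarities
    targets = torch.arange(w.size(0))       # positives sit on the diagonal
    return F.cross_entropy(logits, targets)

loss = word_char_contrastive_loss(torch.randn(8, 256), torch.randn(8, 256))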

JGLUE: Japanese General Language Understanding Evaluation

TLDR
A Japanese NLU benchmark is built from scratch without translation to measure the general NLU ability in Japanese, and it is hoped that JGLUE will facilitate NLU research in Japanese.

Pretraining without Wordpieces: Learning Over a Vocabulary of Millions of Words

TLDR
This work explores developing a BERT-style pretrained model over a vocabulary of words instead of wordpieces; the resulting word-level BERT model, called WordBERT, yields significant improvements on cloze tests and machine reading comprehension.
...

References

SHOWING 1-10 OF 40 REFERENCES

ChID: A Large-scale Chinese IDiom Dataset for Cloze Test

TLDR
A large-scale Chinese cloze test dataset, ChID, is proposed, which studies the comprehension of idioms, a unique linguistic phenomenon in Chinese; the idioms in a passage are replaced by blank symbols, and the correct answer must be chosen from well-designed candidate idioms.
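For illustration, a hypothetical ChID-style cloze item written as a Python dict; the field names and the "#idiom#" placeholder are illustrative, not the dataset's official schema.

example = {
    # "He always works #idiom#, never dragging things out."
    "passage": "他做事一向 #idiom# ，从不拖泥带水。",
    "candidates": ["雷厉风行", "画蛇添足", "守株待兔", "亡羊补牢"],
    "answer": 0,  # index of the correct idiom: 雷厉风行 ("swift and decisive")
}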

CLUECorpus2020: A Large-scale Chinese Corpus for Pre-training Language Model

TLDR
CLUECorpus2020, a large-scale Chinese corpus from the CLUE organization that can be used directly for self-supervised learning, such as language-model pre-training or language generation, is introduced.

A Span-Extraction Dataset for Chinese Machine Reading Comprehension

TLDR
This paper introduces a span-extraction dataset for Chinese machine reading comprehension to add language diversity in this area; the authors also hosted the Second Evaluation Workshop on Chinese Machine Reading Comprehension (CMRC 2018).

Probing Prior Knowledge Needed in Challenging Chinese Machine Reading Comprehension

TLDR
Experimental results demonstrate that linguistic and general world knowledge may help improve the performance of the baseline reader in both general and domain-specific tasks.

SentEval: An Evaluation Toolkit for Universal Sentence Representations

We introduce SentEval, a toolkit for evaluating the quality of universal sentence representations. SentEval encompasses a variety of tasks, including binary and multi-class classification, natural language inference, and sentence similarity…

SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing

TLDR
SentencePiece, a language-independent subword tokenizer and detokenizer designed for Neural-based text processing, finds that it is possible to achieve comparable accuracy to direct subword training from raw sentences.
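A minimal usage sketch with the sentencepiece Python package, training a small model directly on raw text and round-tripping a sentence; the file paths, vocabulary size, and sample sentence are placeholders.

import sentencepiece as spm

# Train a small unigram model on a raw-text file (no pre-tokenization needed).
spm.SentencePieceTrainer.train(
    input="corpus.txt", model_prefix="sp_demo", vocab_size=8000, model_type="unigram"
)

# Load the trained model, then encode and decode a sentence.
sp = spm.SentencePieceProcessor(model_file="sp_demo.model")
pieces = sp.encode("自然语言处理很有趣。", out_type=str)  # "NLP is fun."
print(pieces)
print(sp.decode(pieces))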

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

TLDR
This systematic study compares pre-training objectives, architectures, unlabeled datasets, transfer approaches, and other factors on dozens of language understanding tasks and achieves state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more.

Pre-Training With Whole Word Masking for Chinese BERT

TLDR
The whole word masking (wwm) strategy for Chinese BERT is introduced, along with a series of Chinese pre-trained language models, and a simple but effective model called MacBERT is proposed, which improves upon RoBERTa in several ways.
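A simplified Python illustration of the whole word masking idea (not the authors' code): once any character of a segmented word is selected for masking, all characters of that word are masked together. The segmenter output, masking rate, and sample words are assumptions.

import random

def whole_word_mask(words, mask_prob=0.15, mask_token="[MASK]"):
    # words: output of a Chinese word segmenter, e.g. ["使用", "语言", "模型", "来", "预测"]
    # ("use", "language", "model", "to", "predict"). If a word is chosen, every
    # character in it is replaced by the mask token.
    tokens = []
    for word in words:
        chars = list(word)
        if random.random() < mask_prob:
            tokens.extend([mask_token] * len(chars))  # mask the whole word
        else:
            tokens.extend(chars)
    return tokens

print(whole_word_mask(["使用", "语言", "模型", "来", "预测"], mask_prob=0.5))

A full implementation would also cap the total fraction of masked tokens per sequence, as in BERT-style pre-training.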

LCQMC:A Large-scale Chinese Question Matching Corpus

TLDR
A search engine is used to collect large-scale question pairs related to high-frequency words from various domains; irrelevant pairs are then filtered using the Wasserstein distance, and finally three annotators manually check the remaining pairs, demonstrating the good quality of LCQMC.

GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding

TLDR
A benchmark of nine diverse NLU tasks, an auxiliary dataset for probing models for understanding of specific linguistic phenomena, and an online platform for evaluating and comparing models, which favors models that can represent linguistic knowledge in a way that facilitates sample-efficient learning and effective knowledge-transfer across tasks.