Corpus ID: 234790338

KLUE: Korean Language Understanding Evaluation

  • Sungjoon Park, Jihyung Moon, Sung-Dong Kim, Won Ik Cho, Jiyoon Han, Jangwon Park, Chisung Song, Junseong Kim, Yongsook Song, Tae Hwan Oh, Joohong Lee, Juhyun Oh, Sungwon Lyu, Young-kuk Jeong, Inkwon Lee, Sang-gyu Seo, Dongjun Lee, Hyunwoo Kim, Myeonghwa Lee, Seongbo Jang, Seungwon Do, Sunkyoung Kim, Kyungtae Lim, Jongwon Lee, Kyumin Park, Jamin Shin, Seonghyun Kim, Lucy Park, Alice H. Oh, Jung-Woo Ha, Kyunghyun Cho
We introduce the Korean Language Understanding Evaluation (KLUE) benchmark. KLUE is a collection of 8 Korean natural language understanding (NLU) tasks, including Topic Classification, Semantic Textual Similarity, Natural Language Inference, Named Entity Recognition, Relation Extraction, Dependency Parsing, Machine Reading Comprehension, and Dialogue State Tracking. We build all of the tasks from scratch from diverse source corpora while respecting copyrights, to ensure accessibility for anyone…
FewCLUE: A Chinese Few-shot Learning Evaluation Benchmark
  • Liang Xu, Xiaojing Lu, +6 authors Hai Hu
  • Computer Science
  • ArXiv
  • 2021
This work introduces the Chinese Few-shot Learning Evaluation Benchmark (FewCLUE), the first comprehensive small sample evaluation benchmark in Chinese, implements a set of state-of-the-art few-shot learning methods (including PET, ADAPET, LM-BFF, P-tuning and EFL), and compares their performance with fine-tuning and zero-shot learning schemes on the newly constructed FewCLUE benchmark.
Language Models are Few-shot Multilingual Learners
General-purpose language models have demonstrated impressive capabilities, performing on par with state-of-the-art approaches on a range of downstream natural language processing (NLP) tasks and…
What Changes Can Large-scale Language Models Bring? Intensive Study on HyperCLOVA: Billions-scale Korean Generative Pretrained Transformers
GPT-3 shows remarkable in-context learning ability of large-scale language models (LMs) trained on hundreds of billion scale data. Here we address some remaining issues less reported by the GPT-3…
AMMUS : A Survey of Transformer-based Pretrained Models in Natural Language Processing
This comprehensive survey paper explains various core concepts like pretraining, pretraining methods, pretraining tasks, embeddings and downstream adaptation methods, presents a new taxonomy of T-PTLMs and gives a brief overview of various benchmarks, both intrinsic and extrinsic.
Accurate, yet inconsistent? Consistency Analysis on Language Understanding Models
Consistency, which refers to the capability of generating the same predictions for semantically similar contexts, is a highly desirable property for a sound language understanding model. Although…


OCNLI: Original Chinese Natural Language Inference
This paper presents the first large-scale NLI dataset for Chinese called the Original Chinese Natural Language Inference dataset (OCNLI), which follows closely the annotation protocol used for MNLI, but creates new strategies for eliciting diverse hypotheses.
FlauBERT: Unsupervised Language Model Pre-training for French
This paper introduces and shares FlauBERT, a model learned on a very large and heterogeneous French corpus and applies it to diverse NLP tasks and shows that most of the time they outperform other pre-training approaches.
ParsiNLU: A Suite of Language Understanding Challenges for Persian
ParsiNLU is introduced, the first benchmark in the Persian language that includes a range of high-level tasks -- Reading Comprehension, Textual Entailment, etc. -- and is presented to compare them with human performance, which provides valuable insights into the ability to tackle natural language understanding challenges in Persian.
XNLI: Evaluating Cross-lingual Sentence Representations
This work constructs an evaluation set for XLU by extending the development and test sets of the Multi-Genre Natural Language Inference Corpus to 14 languages, including low-resource languages such as Swahili and Urdu, and finds that XNLI represents a practical and challenging evaluation suite and that directly translating the test data yields the best performance among available baselines.
iNLPSuite: Monolingual Corpora, Evaluation Benchmarks and Pre-trained Multilingual Language Models for Indian Languages
This paper introduces NLP resources for 11 major Indian languages from two major language families, and creates datasets for the following tasks: Article Genre Classification, Headline Prediction, Wikipedia Section-Title Prediction, Cloze-style Multiple choice QA, Winograd NLI and COPA.
An Empirical Study of Tokenization Strategies for Various Korean NLP Tasks
Experimental results demonstrate that a hybrid approach of morphological segmentation followed by BPE works best in Korean to/from English machine translation and natural language understanding tasks such as KorNLI, KorSTS, NSMC, and PAWS-X.
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
A benchmark of nine diverse NLU tasks, an auxiliary dataset for probing models for understanding of specific linguistic phenomena, and an online platform for evaluating and comparing models, which favors models that can represent linguistic knowledge in a way that facilitates sample-efficient learning and effective knowledge-transfer across tasks.
Asking Crowdworkers to Write Entailment Examples: The Best of Bad Options
This work investigates two alternative protocols which automatically create candidate (premise, hypothesis) pairs for annotators to label and concludes that crowdworker writing is still the best known option for entailment data.
Semi-supervised Training Data Generation for Multilingual Question Answering
This work annotates a small set of seed QA pairs for Korean and designs how such seed pairs can be combined with translated English resources to make use of those resources.
Language Models are Unsupervised Multitask Learners
It is demonstrated that language models begin to learn these tasks without any explicit supervision when trained on a new dataset of millions of webpages called WebText, suggesting a promising path towards building language processing systems which learn to perform tasks from their naturally occurring demonstrations.