Corpus ID: 234790338

KLUE: Korean Language Understanding Evaluation

@article{Park2021KLUEKL,
  title={KLUE: Korean Language Understanding Evaluation},
  author={Sungjoon Park and Jihyung Moon and Sungdong Kim and Won Ik Cho and Jiyoon Han and Jangwon Park and Chisung Song and Junseong Kim and Yongsook Song and Tae Hwan Oh and Joohong Lee and Juhyun Oh and Sungwon Lyu and Young-kuk Jeong and Inkwon Lee and Sang-gyu Seo and Dongjun Lee and Hyunwoo Kim and Myeonghwa Lee and Seongbo Jang and Seungwon Do and SunKyoung Kim and Kyungtae Lim and Jongwon Lee and Kyumin Park and Jamin Shin and Seonghyun Kim and Lucy Park and Alice H. Oh and Jung-Woo Ha and Kyunghyun Cho},
  journal={ArXiv},
  year={2021},
  volume={abs/2105.09680}
}
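For reuse of the citation above, here is a minimal sketch of extracting the top-level fields of a single BibTeX entry with Python's standard library only. It assumes one `field={value},` pair per line, as in the entry above; a dedicated parser (e.g. a BibTeX library) would be needed for nested braces or multi-line values.

```python
import re

def parse_bibtex_fields(entry: str) -> dict:
    """Extract simple one-line field={value} pairs from a BibTeX entry."""
    fields = {}
    # Match lines of the form:  key = {value},  (value without nested braces)
    for match in re.finditer(r"(\w+)\s*=\s*\{(.*?)\}\s*,?\s*$", entry, re.MULTILINE):
        key, value = match.group(1), match.group(2)
        fields[key.lower()] = value
    return fields

entry = """@article{Park2021KLUEKL,
  title={KLUE: Korean Language Understanding Evaluation},
  journal={ArXiv},
  year={2021},
  volume={abs/2105.09680}
}"""

fields = parse_bibtex_fields(entry)
print(fields["title"])   # KLUE: Korean Language Understanding Evaluation
print(fields["volume"])  # abs/2105.09680
```

The regex deliberately anchors at end-of-line so the closing `}` of the entry itself (which has no `=`) is ignored.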
We introduce the Korean Language Understanding Evaluation (KLUE) benchmark. KLUE is a collection of 8 Korean natural language understanding (NLU) tasks: Topic Classification, Semantic Textual Similarity, Natural Language Inference, Named Entity Recognition, Relation Extraction, Dependency Parsing, Machine Reading Comprehension, and Dialogue State Tracking. We build all of the tasks from scratch from diverse source corpora while respecting copyrights, to ensure accessibility for anyone…
LiRo: Benchmark and leaderboard for Romanian language tasks
TLDR: Proposes LiRo, a platform for benchmarking models on nine standard Romanian-language tasks: text classification, named entity recognition, machine translation, sentiment analysis, POS tagging, dependency parsing, language modelling, question answering, and semantic textual similarity.
KOBEST: Korean Balanced Evaluation of Significant Tasks
TLDR: Proposes KoBEST, a new benchmark of Korean-language downstream tasks that require advanced Korean linguistic knowledge; the data are annotated purely by humans and thoroughly reviewed to guarantee high quality.
A Multi-Task Benchmark for Korean Legal Language Understanding and Judgement Prediction
Recent advances in deep learning have dramatically changed how machine learning, especially natural language processing, can be applied to the legal domain. However, this shift to…
Comparative Study of Multiclass Text Classification in Research Proposals Using Pretrained Language Models
TLDR: Finds that the BERT-based model pretrained on the most recent Korean corpus performed best at Korean multiclass text classification, suggesting the necessity of optimal pretraining for specific NLU tasks, particularly in languages other than English.
BanglaBERT: Language Model Pretraining and Benchmarks for Low-Resource Language Understanding Evaluation in Bangla
TLDR: Introduces BanglaBERT, a BERT-based natural language understanding (NLU) model pretrained on Bangla, a widely spoken yet low-resource language in the NLP literature, which achieves state-of-the-art results, outperforming multilingual and monolingual models.
FewCLUE: A Chinese Few-shot Learning Evaluation Benchmark
TLDR: Introduces the Chinese Few-shot Learning Evaluation Benchmark (FewCLUE), the first comprehensive few-shot evaluation benchmark in Chinese, implements a set of state-of-the-art few-shot learning methods (including PET, ADAPET, LM-BFF, P-tuning, and EFL), and compares their performance with fine-tuning and zero-shot learning schemes on the newly constructed benchmark.
K-EPIC: Entity-Perceived Context Representation in Korean Relation Extraction
Relation Extraction (RE) aims to predict the correct relation between two entities in a given sentence. To obtain the proper relation, it is important to comprehend…
Language Models are Few-shot Multilingual Learners
TLDR: Shows that, given a few English examples as context, pre-trained language models can predict not only English test samples but also non-English ones, and that they are competitive with existing state-of-the-art cross-lingual and translation models.
JaQuAD: Japanese Question Answering Dataset for Machine Reading Comprehension
TLDR: Presents JaQuAD, the Japanese Question Answering Dataset, which is annotated by humans, along with a fine-tuned baseline model that achieves an F1 score of 78.92% and an exact match (EM) of 63.38% on the test set.
What Changes Can Large-scale Language Models Bring? Intensive Study on HyperCLOVA: Billions-scale Korean Generative Pretrained Transformers
TLDR: Discusses the possibility of materializing the No-Code AI paradigm by providing AI prototyping capabilities to ML non-experts through HyperCLOVA Studio, an interactive prompt-engineering interface, and shows the performance benefits of prompt-based learning and how it can be integrated into the prompt-engineering pipeline.

References

Showing 1–10 of 183 references
CLUE: A Chinese Language Understanding Evaluation Benchmark
TLDR: Introduces the first large-scale Chinese Language Understanding Evaluation (CLUE) benchmark, an open-ended, community-driven project that brings together nine tasks spanning several well-established single-sentence/sentence-pair classification tasks as well as machine reading comprehension, all on original Chinese text.
KorNLI and KorSTS: New Benchmark Datasets for Korean Natural Language Understanding
TLDR: Constructs and releases new datasets for Korean NLI and STS, dubbed KorNLI and KorSTS, following previous approaches that machine-translate existing English training sets and manually translate the development and test sets into Korean.
OCNLI: Original Chinese Natural Language Inference
TLDR: Presents the first large-scale NLI dataset for Chinese, the Original Chinese Natural Language Inference dataset (OCNLI), which closely follows the annotation protocol used for MNLI but creates new strategies for eliciting diverse hypotheses.
IndoNLU: Benchmark and Resources for Evaluating Indonesian Natural Language Understanding
TLDR: Introduces the first large-scale resource for training, evaluating, and benchmarking on Indonesian natural language understanding (IndoNLU) tasks, releasing baseline models for all twelve tasks as well as the framework for benchmark evaluation, enabling anyone to benchmark their system's performance.
FlauBERT: Unsupervised Language Model Pre-training for French
TLDR: Introduces and shares FlauBERT, a model learned on a very large and heterogeneous French corpus, applies it to diverse NLP tasks, and shows that it outperforms other pre-training approaches most of the time.
ParsiNLU: A Suite of Language Understanding Challenges for Persian
TLDR: Introduces ParsiNLU, the first benchmark for the Persian language covering a range of language understanding tasks (reading comprehension, textual entailment, and so on), and presents the first results of state-of-the-art monolingual and multilingual pre-trained language models on this benchmark, comparing them with human performance.
XNLI: Evaluating Cross-lingual Sentence Representations
TLDR: Constructs an evaluation set for cross-lingual understanding (XLU) by extending the development and test sets of the Multi-Genre Natural Language Inference corpus to 14 languages, including low-resource languages such as Swahili and Urdu, and finds that XNLI is a practical and challenging evaluation suite and that directly translating the test data yields the best performance among the available baselines.
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
TLDR: Presents a benchmark of nine diverse NLU tasks, an auxiliary dataset for probing models' understanding of specific linguistic phenomena, and an online platform for evaluating and comparing models, which favors models that can represent linguistic knowledge in a way that facilitates sample-efficient learning and effective knowledge transfer across tasks.
iNLPSuite: Monolingual Corpora, Evaluation Benchmarks and Pre-trained Multilingual Language Models for Indian Languages
TLDR: Introduces NLP resources for 11 major Indian languages from two major language families, creating datasets for the following tasks: Article Genre Classification, Headline Prediction, Wikipedia Section-Title Prediction, Cloze-style Multiple-choice QA, Winograd NLI, and COPA.
An Empirical Study of Tokenization Strategies for Various Korean NLP Tasks
TLDR: Demonstrates experimentally that a hybrid approach of morphological segmentation followed by BPE works best for Korean-to/from-English machine translation and for natural language understanding tasks such as KorNLI, KorSTS, NSMC, and PAWS-X.