KLEJ: Comprehensive Benchmark for Polish Language Understanding

Piotr Rybak, Robert Mroczkowski, Janusz Tracz, Ireneusz Gawlik

In recent years, a series of Transformer-based models unlocked major improvements in general natural language understanding (NLU) tasks. Such a fast pace of research would not be possible without general NLU benchmarks, which allow for a fair comparison of the proposed methods. However, such benchmarks are available only for a handful of languages. To alleviate this issue, we introduce a comprehensive multi-task benchmark for Polish language understanding, accompanied by an online…

LiRo: Benchmark and leaderboard for Romanian language tasks

LiRo, a platform for benchmarking models on the Romanian language on nine standard tasks: text classification, named entity recognition, machine translation, sentiment analysis, POS tagging, dependency parsing, language modelling, question-answering, and semantic textual similarity, is proposed.

Pre-training Polish Transformer-based Language Models at Scale

This study presents two language models for Polish based on the popular BERT architecture, one of which was trained on a dataset consisting of over 1 billion Polish sentences, or 135GB of raw text, and describes the methodology for collecting the data, preparing the corpus, and pre-training the model.

Evaluation of Transfer Learning for Polish with a Text-to-Text Model

plT5, a general-purpose text-to-text model for Polish that can be fine-tuned on various Natural Language Processing (NLP) tasks with a single training objective, is presented and shown to outperform its decoder-only equivalent.

HerBERT: Efficiently Pretrained Transformer-based Language Model for Polish

This paper designs and thoroughly evaluates a pretraining procedure that transfers knowledge from multilingual to monolingual BERT-based models, achieving state-of-the-art results on multiple downstream tasks.

Russian SuperGLUE 1.1: Revising the Lessons not Learned by Russian NLP-models

This paper presents Russian SuperGLUE 1.1, an updated GLUE-style benchmark for Russian NLP models, and improves the benchmark toolkit, based on the jiant framework, for consistent training and evaluation of NLP models of various architectures; the toolkit now supports the most recent models for Russian.

EENLP: Cross-lingual Eastern European NLP Index

This work presents a broad index of NLP resources for Eastern European languages, which, it is hoped, could be helpful for the NLP community; several new hand-crafted cross-lingual datasets focused on Eastern European languages; and a sketch evaluation of the cross-lingual transfer learning abilities of several modern multilingual Transformer-based models.

AlephBERT: Language Model Pre-training and Evaluation from Sub-Word to Sentence Level

AlephBERT is presented, a large PLM for Modern Hebrew, trained on a larger vocabulary and a larger dataset than any Hebrew PLM before, and a novel neural architecture is introduced that recovers the morphological segments encoded in contextualized embedding vectors.

Mukayese: Turkish NLP Strikes Back

This paper presents Mukayese, a set of NLP benchmarks for the Turkish language that contains several NLP tasks and presents four new benchmarking datasets in Turkish for language modeling, sentence segmentation, and spell checking.

AMMUS: A Survey of Transformer-based Pretrained Models in Natural Language Processing

This comprehensive survey paper explains core concepts such as pretraining, pretraining methods, pretraining tasks, embeddings and downstream adaptation methods, presents a new taxonomy of T-PTLMs, and gives a brief overview of various benchmarks, both intrinsic and extrinsic.



GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding

A benchmark of nine diverse NLU tasks, an auxiliary dataset for probing models for understanding of specific linguistic phenomena, and an online platform for evaluating and comparing models, which favors models that can represent linguistic knowledge in a way that facilitates sample-efficient learning and effective knowledge-transfer across tasks.

CamemBERT: a Tasty French Language Model

This paper investigates the feasibility of training monolingual Transformer-based language models for other languages, taking French as an example and evaluating their language models on part-of-speech tagging, dependency parsing, named entity recognition and natural language inference tasks.

Evaluation of Sentence Representations in Polish

This study introduces two new Polish datasets for evaluating sentence embeddings and provides a comprehensive evaluation of eight sentence representation methods including Polish and multilingual models, showing strengths and weaknesses of specific approaches.

A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference

The Multi-Genre Natural Language Inference corpus is introduced, a dataset designed for use in the development and evaluation of machine learning models for sentence understanding, and is shown to represent a substantially more difficult task than the Stanford NLI corpus.

Tuning Multilingual Transformers for Language-Specific Named Entity Recognition

Our paper addresses the problem of multilingual named entity recognition on the material of 4 languages: Russian, Bulgarian, Czech and Polish. We solve this task using the BERT model. We use a

Polish evaluation dataset for compositional distributional semantics models

The designed procedure is verified on Polish, a fusional language with a relatively free word order, and contributes to building a Polish evaluation dataset consisting of 10K sentence pairs that are human-annotated for semantic relatedness and entailment.

Universal Language Model Fine-tuning for Text Classification

This work proposes Universal Language Model Fine-tuning (ULMFiT), an effective transfer learning method that can be applied to any task in NLP, and introduces techniques that are key for fine-tuning a language model.

Deep Contextualized Word Representations

A new type of deep contextualized word representation is introduced that models both complex characteristics of word use and how these uses vary across linguistic contexts, allowing downstream models to mix different types of semi-supervision signals.

Multi-Level Sentiment Analysis of PolEmo 2.0: Extended Corpus of Multi-Domain Consumer Reviews

An extended version of PolEmo, a corpus of consumer reviews from 4 domains (medicine, hotels, products and school), is presented, and recent deep learning approaches to sentiment recognition, such as Bi-directional Long Short-Term Memory (BiLSTM) and Bidirectional Encoder Representations from Transformers (BERT), are explored.

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

A new language representation model, BERT, is introduced; it is designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers, and can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.