Corpus ID: 221802461

FarsTail: A Persian Natural Language Inference Dataset

Authors: Hossein Amirkhani, Mohammad AzariJafari, Zohreh Pourjafari, Soroush Faridan-Jahromi, Zeinab Kouhkan, Azadeh Amirak
Natural language inference (NLI) is one of the central tasks in natural language processing (NLP), encapsulating many fundamental aspects of language understanding. With the considerable achievements of data-hungry deep learning methods on NLP tasks, a great amount of effort has been devoted to developing more diverse datasets for different languages. In this paper, we present a new dataset for the NLI task in the Persian language, also known as Farsi, which is one of the dominant… 
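As a concrete illustration of the task the abstract describes, an NLI instance pairs a premise with a hypothesis under one of three labels: entailment, contradiction, or neutral. The following is a minimal sketch in Python; the field names are illustrative and not the dataset's actual schema.

```python
# Minimal sketch of the NLI data format used by datasets such as FarsTail.
# Each instance pairs a premise with a hypothesis and one of three labels.
# (Field names are illustrative, not the dataset's actual schema.)

NLI_LABELS = {"entailment", "contradiction", "neutral"}

def check_instance(instance: dict) -> bool:
    """Validate that an instance has the premise/hypothesis/label structure."""
    return (
        isinstance(instance.get("premise"), str)
        and isinstance(instance.get("hypothesis"), str)
        and instance.get("label") in NLI_LABELS
    )

example = {
    "premise": "A man is playing a guitar on stage.",
    "hypothesis": "Someone is performing music.",
    "label": "entailment",
}
```

A model for this task reads the premise-hypothesis pair and predicts which of the three labels holds.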

Citations of this paper

OCNLI: Original Chinese Natural Language Inference
This paper presents the first large-scale NLI dataset for Chinese called the Original Chinese Natural Language Inference dataset (OCNLI), which follows closely the annotation protocol used for MNLI, but creates new strategies for eliciting diverse hypotheses.
Investigating Transfer Learning in Multilingual Pre-trained Language Models through Chinese Natural Language Inference
This work investigates the cross-lingual transfer abilities of XLM-R for Chinese and English natural language inference (NLI), with a focus on the recent large-scale Chinese dataset OCNLI.
Contradiction Detection in Persian Text
A novel rule-based system for identifying semantic contradiction, along with a BERT-based deep contradiction detection system for Persian texts, is introduced; it outperforms other algorithms on Persian texts.
Persian Natural Language Inference: A Meta-learning approach
A meta-learning approach for natural language inference in Persian is proposed, which consistently outperforms the baseline approach; the role of a task augmentation strategy for forming additional high-quality tasks is also investigated.
BERT-DRE: BERT with Deep Recursive Encoder for Natural Language Sentence Matching
This paper presents a deep neural architecture for Natural Language Sentence Matching (NLSM) that adds a deep recursive encoder to BERT, called BERT with Deep Recursive Encoder (BERT-DRE).
AmericasNLI: Evaluating Zero-shot Natural Language Understanding of Pretrained Multilingual Models in Truly Low-resource Languages
Pretrained multilingual models are able to perform cross-lingual transfer in a zero-shot setting, even for languages unseen during pretraining. However, prior work evaluating performance on unseen…
SILT: Efficient transformer training for inter-lingual inference
ParsiNLU: A Suite of Language Understanding Challenges for Persian
This work introduces ParsiNLU, the first benchmark in Persian language that includes a range of language understanding tasks—reading comprehension, textual entailment, and so on, and presents the first results on state-of-the-art monolingual and multilingual pre-trained language models on this benchmark and compares them with human performance.
SML: a new Semantic Embedding Alignment Transformer for efficient cross-lingual Natural Language Inference
Evidence is found that SML allows drastically reducing the number of trainable parameters while still achieving state-of-the-art performance, as well as efficiently aligning multilingual embeddings for Natural Language Inference.
Fake News Detection on Social Media Using A Natural Language Inference Approach
The NLI approach is used to boost several classical and deep machine learning models including Decision Tree, Naïve Bayes, Random Forest, Logistic Regression, k-Nearest Neighbors, Support Vector Machine, BiGRU, and BiLSTM along with different word embedding methods.

References

XNLI: Evaluating Cross-lingual Sentence Representations
This work constructs an evaluation set for XLU by extending the development and test sets of the Multi-Genre Natural Language Inference Corpus to 14 languages, including low-resource languages such as Swahili and Urdu and finds that XNLI represents a practical and challenging evaluation suite and that directly translating the test data yields the best performance among available baselines.
Transforming Question Answering Datasets Into Natural Language Inference Datasets
This work proposes a new method for automatically deriving NLI datasets from the growing abundance of large-scale question answering datasets, and relies on learning a sentence transformation model which converts question-answer pairs into their declarative forms.
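As a rough illustration of the idea in the entry above, a question-answer pair can be rewritten into a declarative hypothesis. The cited work learns this transformation from data; the sketch below is only a crude hand-written heuristic, and the function name and rewrite rules are illustrative assumptions.

```python
def qa_to_hypothesis(question: str, answer: str) -> str:
    """Toy rule: turn a 'Who/What/Which ...?' question plus its answer into
    a declarative hypothesis by substituting the answer for the wh-word.
    (The cited work learns this transformation; this is a crude heuristic.)"""
    q = question.rstrip("?").strip()
    for wh in ("Who", "What", "Which"):
        if q.startswith(wh + " "):
            rest = q[len(wh) + 1:]
            # e.g. "Who wrote Hamlet" + "Shakespeare" -> "Shakespeare wrote Hamlet."
            return f"{answer} {rest}."
    # Fallback for questions the rule does not cover.
    return f"{q}: {answer}."
```

Pairing the original passage (as premise) with such a generated hypothesis yields an NLI-style instance, which is the construction the cited work automates at scale.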
Sentence embeddings in NLI with iterative refinement encoders
This work proposes a hierarchy of bidirectional LSTM and max pooling layers that implements an iterative refinement strategy and yields state-of-the-art results on the SciTail dataset as well as strong results for Stanford Natural Language Inference and Multi-Genre Natural Language Inference.
Natural language inference
This dissertation explores a range of approaches to NLI, beginning with methods which are robust but approximate, and proceeding to progressively more precise approaches, and greatly extends past work in natural logic to incorporate both semantic exclusion and implicativity.
Massively Multilingual Sentence Embeddings for Zero-Shot Cross-Lingual Transfer and Beyond
An architecture to learn joint multilingual sentence representations for 93 languages, belonging to more than 30 different families and written in 28 different scripts using a single BiLSTM encoder with a shared byte-pair encoding vocabulary for all languages, coupled with an auxiliary decoder and trained on publicly available parallel corpora.
A large annotated corpus for learning natural language inference
The Stanford Natural Language Inference corpus is introduced, a new, freely available collection of labeled sentence pairs, written by humans doing a novel grounded task based on image captioning, which allows a neural network-based model to perform competitively on natural language inference benchmarks for the first time.
Deep Contextualized Word Representations
A new type of deep contextualized word representation is introduced that models both complex characteristics of word use and how these uses vary across linguistic contexts, allowing downstream models to mix different types of semi-supervision signals.
Enhanced LSTM for Natural Language Inference
This paper presents a new state-of-the-art result, achieving an accuracy of 88.6% on the Stanford Natural Language Inference Dataset, and demonstrates that carefully designed sequential inference models based on chain LSTMs can outperform all previous models.
Recent Trends in Deep Learning Based Natural Language Processing [Review Article]
This paper reviews significant deep learning related models and methods that have been employed for numerous NLP tasks and provides a walk-through of their evolution.
A Survey of the Usages of Deep Learning for Natural Language Processing
An introduction to the field and a quick overview of deep learning architectures and methods is provided and a discussion of the current state of the art is provided along with recommendations for future research in the field.